Hepatitis/Fatty Liver Disease ; Biomarkers of Liver Fibrosis

Thursday, June 9, 2011

From Journal of Gastroenterology and Hepatology

Biomarkers of Liver Fibrosis
Leon A Adams

Posted: 06/09/2011; J Gastroenterol Hepatol. 2011;26(5):802-809. © 2011 Blackwell Publishing

Abstract and Introduction
Abstract
Fibrosis prediction is an essential part of the assessment and management of patients with chronic liver disease. Blood-based biomarkers offer a number of advantages over liver biopsy, the traditional standard of fibrosis assessment, including safety, cost-savings and widespread accessibility. Current biomarker algorithms include indirect surrogate measures of fibrosis, including aminotransaminases and platelet count, or direct measures of fibrogenesis or fibrinolysis such as hyaluronic acid and tissue inhibitor of metalloproteinase-1. A number of algorithms have now been validated across a range of chronic liver diseases, including chronic viral hepatitis, alcoholic and non-alcoholic fatty liver disease. Furthermore, several models have been demonstrated to be dynamic to changes in fibrosis over time and are predictive of liver-related survival and overall survival to a greater degree than liver biopsy. Current limitations of biomarker models include a significant indeterminate range, and a predictive ability that is limited to only a few stages of fibrosis. Utilization of these biomarker models requires knowledge of patient co-morbidities which may produce false positive or negative results in a small proportion of individuals. Furthermore, knowledge of the underlying prevalence of fibrosis in the patient population is required for interpretation of the positive or negative predictive values of a test result. Novel proteins identified by proteomic technology and genetic polymorphisms from genome association studies offer the possibility for further refinement and individualization of biomarker fibrosis models in the future.

Introduction
Chronic liver disease is characterised by progressive hepatic fibrosis which may accumulate and lead to the formation of cirrhosis with attendant complications of portal hypertension, hepatic synthetic impairment and hepatocellular carcinoma. Therefore, determination of the degree of hepatic fibrosis is routine for the assessment of patients with chronic liver disease, as it is the key determinant of prognosis. Subsequently, the severity of fibrosis frequently dictates the need and timing of therapy, screening and surveillance strategies, and may also define treatment response. Historically, liver biopsy was the only method of determining hepatic fibrosis. However its well-acknowledged limitations have led to the search for alternative, non-invasive methods for fibrosis assessment, including clinical and serum biomarker algorithms. The number of serum biomarker algorithms for liver fibrosis has increased significantly over the past decade, and they are beginning to be incorporated into routine clinical practice. This review will examine the development, accuracy, clinical utility and pitfalls of biomarkers as diagnostic tools to assess hepatic fibrosis.

Pitfalls of Liver Biopsy
Performing a liver biopsy to stage the degree of fibrosis has the added benefits of confirming the etiology of liver disease, assessing potential disease co-factors such as hepatic steatosis, hepatocellular iron and necro-inflammatory activity. Liver biopsy also examines the extent of architectural distortion associated with fibrosis, which is a key element of most histological scoring systems for fibrosis assessment. However, biopsy is obviously invasive with risks of pain, bleeding or perforation (1/1000) and rarely death (1/10 000).[1] Furthermore, liver biopsy is costly, inconvenient and not widely accessible to either patients or physicians. This greatly limits the frequency of utilization and thus precludes accurate fibrosis assessment in the majority of patients with chronic liver disease.

A number of inherent problems also limit the accuracy of liver biopsy in determining fibrosis stage. It has been estimated a standard liver biopsy represents 1/50 000th of the liver, and thus sampling error is a significant problem. Thus, paired biopsy studies have demonstrated discordance of fibrosis stage in 22–37% of non-alcoholic fatty liver disease (NAFLD) biopsies and 33% of hepatitis C biopsies.[2–4] Variability is increased further with small biopsies; fibrosis under-staging occurs in 10% of biopsies 1.5 cm in length compared to 3 cm biopsies.[5] Furthermore, differences in pathologist interpretation further exacerbate inaccuracy of histological fibrosis assessment. For example, the intra- and inter-observer variability for fibrosis in NAFLD is 0.68–0.85 and 0.84, respectively.[3,6,7] Despite these limitations in accuracy, liver biopsy remains the standard against which newer non-invasive methods are developed and compared. It should be noted therefore that due to the variability in liver biopsy, even the perfect non-invasive diagnostic test would not be perfectly replicative of liver biopsy findings. It has been estimated that even if biopsy was 90% sensitive and specific, a perfect biomarker could only obtain a maximum accuracy (or area under the receiver operator characteristic curve [AUC]) of 0.90.[8]

Biomarkers
Biomarkers of fibrosis are typically divided into indirect markers and direct markers of fibrogenesis and fibrinolysis (Table 1). Indirect markers include simple liver function tests, such as aminotransaminases, surrogate measures of portal hypertension, such as platelet count or measures of synthetic impairment such as albumin or pro-thrombin time.[9] Direct measures are more directly involved in the molecular pathogenesis of fibrogenesis and fibrinolysis. They include serum levels of matrix metalloproteinases and hyaluronic acid, or pro-inflammatory and pro-fibrotic cytokines, such as tumour necrosis factor-α (TNF-α) and transforming growth factor-β (TGF-β).[10] These biomarkers are commonly combined together along with clinical risk factors for hepatic fibrosis, such as age, gender or diabetes.[11–13] Recently, genomic and proteomic approaches have expanded the pool of potential biomarker candidates (reviewed below).

Biomarker Development
The development of predictive diagnostic models based upon biomarkers generally follows a similar approach. Potential biomarkers, with or without objective clinical variables, are examined for their association with fibrosis in individuals who have undergone liver biopsy (training set). The degree of fibrosis is frequently dichotomized into significant fibrosis (no/minimal fibrosis vs peri-portal fibrosis/bridging/cirrhosis), advanced fibrosis (no/minimal/peri-portal fibrosis vs bridging fibrosis/cirrhosis), or cirrhosis (absent or present). Multi-variable logistic regression modelling develops a predictive algorithm whose accuracy may be examined by the receiver operator characteristic (ROC) curve which plots sensitivity versus 1-specificity for every possible value of the regression equation. The accuracy of the model is often described by AUC values.
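
As an illustration of how a ROC curve and its AUC are obtained from a model's output, the following minimal Python sketch sweeps every possible cut-off and integrates the resulting curve with the trapezoidal rule. The function names and the six example scores are invented for illustration and do not come from any published algorithm.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for every cut-off.

    scores: model output per subject (higher = more fibrosis predicted)
    labels: 1 if biopsy showed the fibrosis end-point, else 0
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep the cut-off from the highest score downwards; a subject is
    # called "positive" when score >= cut-off.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical training set: six subjects, three with the end-point.
example_scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
example_labels = [0, 0, 1, 0, 1, 1]
example_auc = auc(roc_points(example_scores, example_labels))
print(round(example_auc, 2))
```

In this toy data set, eight of the nine possible pairs of one diseased and one non-diseased subject are ranked correctly by the score, which is what the AUC of 8/9 expresses.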

It is essential to validate the predictive model in an independent validation cohort, given that the model has been optimised in the training cohort. The validity of a biomarker model should be assessed by: examining its accuracy in different populations, including those with co-morbid conditions, and cohorts with differing etiologies of liver disease; examining its longitudinal progression over time and responsiveness to treatment; and assessing the ability to predict prognosis in terms of liver-related morbidity and mortality. Furthermore, a variety of quality control laboratory issues need to be addressed, including standardization of analytical methods, assessment of laboratory coefficient of variation, and elucidation of analytical features. The majority of biomarker panels in the literature have not undergone these rigorous steps of evaluation.

Biomarker Interpretation
Indeterminate Range
The accuracy of diagnostic biomarker tests is often reported in terms of AUC, with sensitivity, specificity and predictive values calculated at specified cut-points along the range of test values. Typically, a range of values at one end of the test result spectrum will have a high sensitivity and low specificity, whereas a range of values at the opposite end of the test result spectrum will have a low sensitivity and high specificity. Test results in between often have moderate sensitivity and specificity values which are not clinically meaningful, and thus comprise an "indeterminate range." For example, the AST to Platelet Ratio Index (APRI) typically provides a range of results from 0.1–8.0; a cut-off of ≤ 0.5 is 81% sensitive and 50% specific for a diagnosis of significant fibrosis in chronic hepatitis C (CHC), whereas a cut-off > 1.5 is 35% sensitive and 91% specific for the diagnosis of significant fibrosis.[14,15] Thus, the majority of biomarker panels will produce inconclusive results for a proportion of patients falling within the indeterminate range for a specific fibrosis end-point.
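
The indeterminate range can be made concrete with a short Python sketch using the two APRI cut-offs quoted above. The APRI formula itself is not given in this review; the commonly cited definition (AST as a multiple of its upper limit of normal, divided by platelet count in 10^9/L, multiplied by 100) is assumed here. The function names and patient values are hypothetical.

```python
def apri(ast_iu_l, ast_upper_limit_normal, platelets_10e9_l):
    """AST to Platelet Ratio Index.

    Assumed formula (not stated in the text above):
    (AST / upper limit of normal) / platelet count (10^9/L) * 100
    """
    return (ast_iu_l / ast_upper_limit_normal) / platelets_10e9_l * 100


def classify_significant_fibrosis(apri_value, low=0.5, high=1.5):
    """Apply the two cut-offs quoted in the text for CHC."""
    if apri_value <= low:
        return "significant fibrosis unlikely"
    if apri_value > high:
        return "significant fibrosis likely"
    return "indeterminate"


# Hypothetical patients: (AST IU/L, AST upper limit of normal, platelets 10^9/L)
patients = [(30, 40, 250), (80, 40, 150), (120, 40, 100)]
results = [classify_significant_fibrosis(apri(*p)) for p in patients]
for patient, result in zip(patients, results):
    print(patient, result)
```

The second hypothetical patient has an APRI of about 1.3, which falls between the two cut-offs and is therefore reported as indeterminate for significant fibrosis.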

Parkes and colleagues examined the performance of 10 serum biomarker algorithms and found that 65% of subjects had indeterminate test results for the prediction of significant fibrosis.[16] However, values in the indeterminate range for a specified fibrosis endpoint (e.g. significant fibrosis), may still be useful to diagnose other fibrosis end-points (e.g. cirrhosis). For example, APRI values between 0.5 and 1.5 may be indeterminate for significant fibrosis, however values greater than 1.0 are 89% sensitive and 75% specific for a diagnosis of cirrhosis in CHC patients.[14]

Predictive Values
Sensitivity is the likelihood of correctly determining the presence of disease in the whole population that has the disease (true positive rate) whereas specificity is the likelihood of correctly determining the absence of disease in the whole population that does not have the disease (true negative rate). In a clinical setting when faced with an individual patient, a more useful interpretation of test results is positive predictive value (PPV) and negative predictive value (NPV). These indices describe the probability of having the disease with a positive test result or the probability of not having the disease when a negative test result is obtained.

Predictive values are dependent upon the underlying disease prevalence as well as sensitivity or specificity. Thus a test may be highly specific for the diagnosis of cirrhosis, but have a low PPV if the underlying prevalence is very low (Table 2). For example, an APRI cut-point of 2.0 is 91% specific for the diagnosis of cirrhosis, however the PPV is only 50% if the underlying prevalence of cirrhosis is only 15%.[15] Thus, it is important to realize that biomarker test characteristics will vary according to the setting (and underlying fibrosis prevalence) in which they are used. Studies developing these scores necessarily suffer selection bias as they include only patients who have undergone liver biopsy. As a result, the prevalence of significant fibrosis and cirrhosis is around 45% and 15%, respectively. This is considerably higher than among the general community, where it is estimated that prevalence of significant fibrosis and cirrhosis is, respectively, 2.8% and 0.3%.[15,17,18] Therefore, for a given biomarker, the PPV will be significantly lower and the NPV significantly higher in the general community compared to a tertiary referral liver clinic (Table 2).
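
The dependence of predictive values on prevalence follows directly from Bayes' rule, and can be sketched in a few lines of Python. The specificity (91%) and the two cirrhosis prevalences (15% and 0.3%) are taken from the text above; the sensitivity of 50% is an assumption chosen for illustration only, as the text does not state it.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ASSUMED_SENSITIVITY = 0.50  # illustrative assumption, not from the text
SPECIFICITY = 0.91          # APRI cut-point of 2.0 for cirrhosis (from the text)

ppv_clinic, npv_clinic = predictive_values(ASSUMED_SENSITIVITY, SPECIFICITY, 0.15)
ppv_community, npv_community = predictive_values(ASSUMED_SENSITIVITY, SPECIFICITY, 0.003)

print(f"Liver clinic (15% prevalence):  PPV {ppv_clinic:.2f}, NPV {npv_clinic:.2f}")
print(f"Community (0.3% prevalence):    PPV {ppv_community:.3f}, NPV {npv_community:.3f}")
```

Under these assumptions the PPV is close to 50% in the referral setting but falls below 2% in the general community, while the NPV rises above 99%.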

Pitfalls in Biomarkers

The Achilles heel of non-invasive markers is the determination of mid-levels of fibrosis (e.g. METAVIR stage 2 or Ishak stages 2–3), where AUC levels are often between 0.7 and 0.8 in independent validation studies (Table 3). The minor differences in absolute fibrosis area between Ishak stages 1 to 3 (Fig. 1) mean that differences in direct fibrogenesis markers are similarly likely to be modest between these stages. In contrast, models have greater accuracy for determining advanced fibrosis and cirrhosis, with AUC levels often > 0.9. Correspondingly, fibrosis area determined by image morphometry in advanced fibrosis or cirrhosis increases dramatically compared to lower fibrosis stages (Fig. 1). Due to the relatively low prevalence of cirrhosis, the PPVs of biomarkers are generally modest. However, the NPVs are generally excellent (> 95%), allowing reliable exclusion of cirrhosis.

Figure 1. Association between Hepatic Fibrosis Area and Fibrosis Stage. The relationship between fibrosis area, as determined by image morphometry, and fibrosis stage (Ishak) is non-linear. There is minimal increase between stages 0–2, and dramatic increase between stages 3–6. Adapted from Standish et al. Gut 2006, 55: 569–578.

Few biomarkers are specific for hepatic fibrosis alone. As a result, co-morbid conditions which can lead to false positive or negative results need to be excluded prior to interpretation. Generally false-positive results are more common than false-negative ones.[18] Algorithms which utilize bilirubin (Fibrotest, Hepascore) may be elevated in the presence of hemolysis, Gilbert's syndrome or biliary obstruction. Algorithms incorporating aminotransaminases (APRI, Fibrometer, Forns test, FIB4) may be falsely positive in acute hepatitis, whereas systemic inflammation from any cause may produce false positive results in acute phase reactants, such as hyaluronate, α-2 macroglobulin, gamma globulin, platelet count, N-terminal pro-collagen peptide (European Liver Fibrosis Panel [ELF], Fibrotest, Hepascore, Fibroindex, Fibrometer, Fibrospect). A multi-variate analysis of discordant results between biopsy and Fibrotest in 537 CHC patients revealed that age, male gender, presence of hepatic steatosis and inflammation were associated with discordant results for fibrosis assessment.[52]

Hepatitis C
Biomarker models have been extensively examined in chronic hepatitis C virus (HCV) infection, and there are now over 15 different published algorithms. Models have been validated in a wide variety of chronic hepatitis C (CHC) patients, including after liver transplantation, with hemodialysis, and HIV/HCV co-infected patients.[53–55] In general, models incorporating direct measures of fibrogenesis (e.g. Fibrotest, Hepascore, ELF) tend to have higher AUC levels for the prediction of significant fibrosis than those based upon indirect measures (e.g. APRI, Forns, FIB-4), though comparative studies have often been underpowered to detect statistically significant differences. Several models including ELF and Fibrometer, have also undergone subsequent modification, limiting conclusions regarding comparative studies using the older algorithms.

The inventors of Fibrometer found their algorithm to be more accurate than Fibrotest, Hepascore, APRI and FIB4 for the determination of significant fibrosis in a large study of 1056 patients.[50] However, a further study of 356 patients failed to find a significant difference in accuracy between Fibrotest, Hepascore, APRI and Fibrometer.[25] In a population of 467 HIV/HCV co-infected patients, Hepascore, Fibrotest and Fibrometer had significantly higher correct classification rates than APRI or FIB4.[56] The accuracy of biomarker models tends to fall in CHC subjects with normal aminotransaminases, particularly those algorithms including aspartate transaminase (AST) or gamma-glutamyltranspeptidase (GGT), such as APRI, Fibroindex, and Forns.[57,58]

A number of algorithms including FIB4, Forns score, ELF, Fibrotest and Fibrospect, all show improvement in score over time in patients who have undergone anti-viral therapy and achieved a sustained virological response. Conversely, values remain unchanged in non-responders, suggesting that they may be dynamic to changes in fibrosis over time.[32,42] Score improvement in models which incorporate ALT and/or AST may be reflective of changes of inflammatory activity rather than fibrosis. Randomized treatment trials of CHC and HIV/HCV co-infection have also demonstrated biomarker models to correlate with changes in liver histology with treatment, falling significantly with fibrosis improvement.[59,60]

The Forns score, APRI, FIB4, ELF and Fibrotest have all been demonstrated to predict liver-related outcomes and/or survival in subjects with chronic liver disease. In this context, they are often more predictive than liver biopsy.[21,31,37] One French study of 537 patients found Fibrotest had greater accuracy at predicting survival without HCV-related death than APRI (AUC 0.96 vs 0.76, P = 0.02).[21]

Hepatitis B
A significant proportion of biomarker models originally developed in hepatitis C patients have subsequently been applied to patients with chronic hepatitis B (CHB) infection. In general, the accuracy of these algorithms is lower compared to that seen in validation studies of CHC patients (Table 3), which may be related to CHB-related necro-inflammatory activity leading to false-positive test results. A number of CHB specific algorithms have also been developed, although some are accurate only in hepatitis B e antigen (HBeAg) positive or HBeAg negative patients, limiting general applicability.[61,62] Similar to that seen in CHC populations, indirect biomarker tests such as APRI and Forns, tend to have lower AUC values than direct tests such as Hepascore or Fibrotest.[27,63]

There are fewer longitudinal and outcome studies for fibrosis biomarkers in CHB. Fibrotest, however, has been demonstrated to be dynamic over time with treatment; a reduction in score correlated with histological improvement in a randomized controlled trial of adefovir therapy.[64] Furthermore, Fibrotest is predictive of survival, and was more accurate than baseline viral load, ALT or APRI in a study of 1074 patients followed for four years.[65]

Nonalcoholic Fatty Liver Disease
Compared with viral hepatitis, there have been relatively few studies examining fibrosis prediction by biomarkers in subjects with NAFLD. In general, the prediction of advanced fibrosis (F3/4) and cirrhosis has been reasonable. However, the prediction of significant fibrosis (F2-4) is often poor, with AUC levels less than 0.7 in several studies.[24,33,66] A number of simple models, such as APRI, AST/ALT ratio, FIB4, BARD (Body mass index, AST/ALT ratio, Diabetes) and the NAFLD fibrosis score (age, hyperglycaemia, body mass index, platelets, albumin, AST/ALT ratio) exclude advanced fibrosis with a > 90% negative predictive value in up to 69% of subjects; they may therefore be useful screening tests.[67] Of these algorithms, FIB4 and the NAFLD fibrosis score are most accurate, with AUC levels between 0.80–0.86 and 0.75–0.81, respectively. They also have a reasonable predictive value (75–82%) for advanced fibrosis.[9,33,66,67] The NAFLD fibrosis score has been most extensively validated. A recent meta-analysis of 12 studies calculated a summary AUC of 0.85 for the determination of advanced fibrosis.[24]

FibrometerNAFLD combines other indirect markers (glucose, AST, ALT, platelet count, ferritin, body weight, age), has high AUC values (> 0.9) and is significantly more accurate for the prediction of significant fibrosis, advanced fibrosis and cirrhosis than APRI.[68] Fibrotest incorporates some direct measures of fibrogenesis. It was developed in CHC, but has similar accuracy in NAFLD, with AUC of 0.81 for significant fibrosis, and 0.88 for advanced fibrosis.[13] The European Liver Fibrosis (ELF) panel incorporates age and three direct measures of fibrogenesis (HA, TIMP-1, PIIINP), with a subsequent modification omitting age. ELF was originally developed in a cohort of 921 subjects with chronic liver disease of differing etiologies. However, the inventors have subsequently validated it in 196 NAFLD patients, with AUC values of 0.82 for moderate fibrosis and 0.90 for advanced fibrosis.[38] Further, independent validation studies are awaited for these algorithms. Currently, no algorithm has been demonstrated to be dynamic over time to changes in fibrosis, nor predictive of liver morbidity and mortality in NAFLD subjects.

Alcoholic Liver Disease (ALD)
Simple algorithms incorporating AST and platelet count are less accurate in alcoholic liver disease due to the effects of alcohol on these indices independent of fibrosis. For example, APRI had an AUC of 0.66–0.70 for significant fibrosis and AUC 0.76 for cirrhosis in a large study of over 600 Veterans Affairs patients.[69] In contrast, algorithms incorporating direct measures appear to be more accurate. A study of 218 patients with ALD demonstrated that Fibrotest, FibrometerA and Hepascore had equivalent accuracy for diagnosing significant fibrosis, and this was significantly better than APRI, Forns and FIB4.[23] Furthermore, after eight years of follow-up, Fibrotest, FibrometerA and Hepascore were similarly more accurate at predicting overall survival in this cohort compared to the other simpler algorithms. Equivalence of accuracy for Fibrotest, FibrometerA and Hepascore in determining significant fibrosis in alcoholic liver disease was confirmed in a small study of 103 subjects from France.[70]

Sequential Algorithms
The recognition that algorithms have different diagnostic strengths has led to the logical step of developing sequential combination algorithms based upon biomarkers, or the combination of biomarker and transient elastography algorithms.[71–73] The majority of these studies have been in CHC patients. A study of 2035 CHC patients using APRI followed by Fibrotest (SAFE algorithm) had AUC for significant fibrosis and cirrhosis of 0.89 and 0.92, respectively, with a NPV of 99–100% for both diagnoses but a PPV of 84% and 56%, respectively.[73] Biopsies would have been avoided in 53.5% and 81.5% for a diagnosis of significant fibrosis and cirrhosis, respectively. A recent prospective comparison of the SAFE algorithm with the combination of transient elastography and Fibrotest in 302 CHC patients found the SAFE algorithm was significantly more accurate than transient elastography/Fibrotest (accuracy 97% vs 87.7%, respectively, P < 0.001), although it avoided fewer biopsies (48% vs 72%, respectively, P < 0.001).[72] For the determination of cirrhosis, SAFE was less accurate than transient elastography/Fibrotest (87% vs 96%, respectively, P < 0.001), although the number of biopsies avoided was similar (75% vs 79%). These studies highlight that combinations of non-invasive biomarkers can reliably diagnose certain degrees of fibrosis with a very high level of accuracy, but this is at the expense of added complexity to their use. These studies also highlight that, even with multiple algorithms, a significant proportion of subjects will be "indeterminate" and require liver biopsy for accurate staging.
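
The general logic of a sequential approach can be sketched in Python as follows. This is a generic two-step illustration and not the published SAFE algorithm: the first-line cut-offs are the APRI values for significant fibrosis quoted earlier in this review, whereas the second-line cut-offs (0.3 and 0.6) and all function names are hypothetical placeholders.

```python
def classify(value, rule_out_cutoff, rule_in_cutoff):
    """Three-way reading of a single biomarker score."""
    if value <= rule_out_cutoff:
        return "ruled out"
    if value > rule_in_cutoff:
        return "ruled in"
    return "indeterminate"


def sequential_assessment(first_score, second_score,
                          first_cutoffs=(0.5, 1.5),    # APRI cut-offs from the text
                          second_cutoffs=(0.3, 0.6)):  # hypothetical placeholders
    """Return (result, deciding step) for a two-step biomarker strategy.

    The second test is consulted only when the first is indeterminate;
    subjects indeterminate on both are referred for liver biopsy.
    """
    result = classify(first_score, *first_cutoffs)
    if result != "indeterminate":
        return result, "first test"
    result = classify(second_score, *second_cutoffs)
    if result != "indeterminate":
        return result, "second test"
    return "indeterminate", "liver biopsy"


# Hypothetical subjects: (first-line score, second-line score)
for scores in [(0.4, 0.5), (1.0, 0.7), (1.0, 0.45)]:
    print(scores, sequential_assessment(*scores))
```

The third hypothetical subject is indeterminate on both tests and is referred for biopsy, which mirrors the residual indeterminate group described above.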

Proteomics
Proteomics is the systematic large scale study of all proteins in an organism.[74] Using tissue or serum-based proteomic technology, a number of novel proteins have been identified to associate with fibrosis in a range of liver diseases.[75–77] As proof of principle, blood protein peak signatures identified by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry or protein chip technology have been demonstrated to be highly predictive (AUC > 0.85) of fibrosis in CHC, CHB and NAFLD.[75–77] The cost and technology involved prohibits routine use of these methods. However, subsequent identification of unique proteins can allow novel algorithms to be created which may be more applicable clinically. For example, a proteomic analysis of microdissected cirrhotic septae and liver parenchymal cells has led to the discovery of novel proteins (microfibril-associated protein-4, Tropomyosin) associated with fibrosis in CHC patients.[78] Subsequent serum analysis demonstrated microfibril-associated protein-4 to be predictive of cirrhosis in patients with CHC and alcoholic liver disease. Confirmation and further evaluation of these novel proteins is required and will undoubtedly be performed.

Genetic Markers
Liver fibrosis results from an interaction between environmental insults, such as viruses or alcohol, and the host response, which is influenced by genetic polymorphisms. High throughput methodologies have allowed assessment of massive amounts of genetic data in relation to hepatic fibrosis. Genome wide and functional genome scans have allowed the detection of single nucleotide polymorphisms (SNPs) in specific genes which are associated with liver fibrosis.[79,80]

Huang and colleagues examined nearly 25 000 SNPs in 1020 CHC patients and found seven gene polymorphisms associated with cirrhosis.[79] A subsequent "cirrhosis risk score" algorithm had an AUC of 0.726 in a validation cohort. This was not significantly higher with the addition of recognized clinical predictors of fibrosis, such as age, gender and alcohol intake. Subsequent independent validation in a cohort of 271 CHC patients revealed the cirrhosis risk score to be significantly predictive of fibrosis progression.[81] Thus, current genetic scores are not more accurate than conventional biomarkers. In the future, the clinical utility of genetic risk scores may be to predict patients who will develop future liver fibrosis progression and resultant morbidity and mortality. Further studies are required before integration of these tools into clinical practice.

Conclusions
Prediction of hepatic fibrosis by biomarker models has developed rapidly over the past decade, with a multitude of algorithms created and subsequently commercialized. Of the plethora of models, only a few have been rigorously evaluated in terms of laboratory variation, or have undergone validation by groups independent of their development. However, it has now been demonstrated that biomarker models are accurate across a range of chronic liver diseases, are dynamic over time to changes in fibrosis, and can be used to predict liver-related morbidity, mortality and overall survival. Limitations remain. These include the significant proportion of cases which are indeterminate, and the ability to only detect binomial outcomes such as the presence or absence of cirrhosis. Despite this, biomarker models are gaining popularity in routine clinical practice as a useful tool for patient management.

What remains to be demonstrated is whether the use of biomarker models can influence patient outcomes. Interpretation of these models in the clinical setting requires careful attention to co-morbid conditions which may result in falsely positive or negative results, as well as knowledge about the population prevalence of fibrosis (pre-test probability) which affects the predictive values of test results. Proteomic techniques are being used to identify novel biomarkers which offer the potential to further increase the accuracy and clinical utility of fibrosis biomarker models. In addition, such research can provide additional insight into the pathogenesis of liver fibrosis. Application of genomic medicine to the field of fibrosis prediction has highlighted the variation in genetic susceptibility and fibrosis rates between individuals. Further refinement of genetic risk scores, and their incorporation with more routinely available fibrosis biomarkers, offers the potential to individualize fibrosis risk prediction, thereby offering powerful prognostic tools for liver morbidity and mortality.