Liver fibrosis stage based on the four factors (FIB-4) score or Forns index in adults with chronic hepatitis C.
The presence and severity of liver fibrosis are important prognostic variables when evaluating people with chronic hepatitis C (CHC). Although liver biopsy remains the reference standard, non-invasive serological markers, such as the four factors (FIB-4) score and the Forns index, can also be used to stage liver fibrosis.
To determine the diagnostic accuracy of the FIB-4 score and the Forns index in staging liver fibrosis in people with chronic hepatitis C (CHC), using liver biopsy as the reference standard (primary objective). To compare the diagnostic accuracy of these tests for staging liver fibrosis in people with CHC and to explore potential sources of heterogeneity (secondary objectives).
We used standard Cochrane search methods for diagnostic accuracy studies (search date: 13 April 2022).
We included diagnostic cross-sectional or case-control studies that evaluated the performance of the FIB-4 score, the Forns index, or both, against liver biopsy, in the assessment of liver fibrosis in participants with CHC. We imposed no language restrictions. We excluded studies in which: participants had causes of liver disease besides CHC; participants had successfully been treated for CHC; or the interval between the index test and liver biopsy exceeded six months.
Two review authors independently extracted data. We performed meta-analyses using the bivariate model and calculated summary estimates. We evaluated the performance of both tests for three target conditions: significant fibrosis or worse (METAVIR stage ≥ F2); severe fibrosis or worse (METAVIR stage ≥ F3); and cirrhosis (METAVIR stage F4). We restricted the meta-analysis to studies reporting cut-offs in a specified range (± 0.15 for FIB-4; ± 0.3 for the Forns index) around the original validated cut-offs (1.45 and 3.25 for FIB-4; 4.2 and 6.9 for the Forns index). We calculated the percentage of people who would receive an indeterminate result (i.e. above the rule-out threshold but below the rule-in threshold) for each index test/cut-off/target condition combination.
We included 84 studies (with a total of 107,583 participants) from 28 countries, published between 2002 and 2021, in the qualitative synthesis. Of the 84 studies, 82 (98%) were cross-sectional diagnostic accuracy studies with cohort-based sampling, and the remaining two (2%) were case-control studies. All studies were conducted in referral centres. Our main meta-analysis included 62 studies (100,605 participants). Overall, two studies (2%) had low risk of bias, 23 studies (27%) had unclear risk of bias, and 59 studies (73%) had high risk of bias. We judged 13 studies (15%) to have applicability concerns regarding participant selection.

FIB-4 score
The FIB-4 score's low cut-off (1.45) is designed to rule out people with at least severe fibrosis (≥ F3). Thirty-nine study cohorts (86,907 participants) yielded a summary sensitivity of 81.1% (95% confidence interval (CI) 75.6% to 85.6%), specificity of 62.3% (95% CI 57.4% to 66.9%), and negative likelihood ratio (LR-) of 0.30 (95% CI 0.24 to 0.38). The FIB-4 score's high cut-off (3.25) is designed to rule in people with at least severe fibrosis (≥ F3). Twenty-four study cohorts (81,350 participants) yielded a summary sensitivity of 41.4% (95% CI 33.0% to 50.4%), specificity of 92.6% (95% CI 89.5% to 94.9%), and positive likelihood ratio (LR+) of 5.6 (95% CI 4.4 to 7.1). Using the FIB-4 score to assess severe fibrosis and applying both cut-offs together, 30.9% of people would obtain an indeterminate result, requiring further investigations. We report the summary accuracy estimates for the FIB-4 score when used for assessing significant fibrosis (≥ F2) and cirrhosis (F4) in the main review text.

Forns index
The Forns index's low cut-off (4.2) is designed to rule out people with at least significant fibrosis (≥ F2). Seventeen study cohorts (4354 participants) yielded a summary sensitivity of 84.7% (95% CI 77.9% to 89.7%), specificity of 47.9% (95% CI 38.6% to 57.3%), and LR- of 0.32 (95% CI 0.25 to 0.41). The Forns index's high cut-off (6.9) is designed to rule in people with at least significant fibrosis (≥ F2). Twelve study cohorts (3245 participants) yielded a summary sensitivity of 34.1% (95% CI 26.4% to 42.8%), specificity of 97.3% (95% CI 92.9% to 99.0%), and LR+ of 12.5 (95% CI 5.7 to 27.2). Using the Forns index to assess significant fibrosis and applying both cut-offs together, 44.8% of people would obtain an indeterminate result, requiring further investigations. We report the summary accuracy estimates for the Forns index when used for assessing severe fibrosis (≥ F3) and cirrhosis (F4) in the main text.

Comparing FIB-4 to Forns index
There were insufficient studies to meta-analyse the performance of the Forns index for diagnosing severe fibrosis and cirrhosis. Therefore, comparisons of the two tests' performance were not possible for these target conditions. For diagnosing significant fibrosis and worse, there were no significant differences in their performance when using the high cut-off. The Forns index performed slightly better than FIB-4 when using the low/rule-out cut-off (relative sensitivity 1.12, 95% CI 1.00 to 1.25; P = 0.0573; relative specificity 0.69, 95% CI 0.57 to 0.84; P = 0.002).
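The abstract uses the FIB-4 cut-offs without stating the formula; the score itself is computed from age, AST, ALT, and platelet count (the original Sterling 2006 derivation). The sketch below shows how the dual-cut-off logic and the reported likelihood ratios fit together; the patient values are purely illustrative and not drawn from the review.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)).
    Formula from the original FIB-4 derivation (Sterling 2006); it is not
    stated in the abstract itself."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

def classify(score, rule_out=1.45, rule_in=3.25):
    """Apply the dual cut-offs for severe fibrosis (>= F3) described above."""
    if score <= rule_out:
        return "severe fibrosis ruled out"
    if score >= rule_in:
        return "severe fibrosis ruled in"
    return "indeterminate - further investigation needed"

# The likelihood ratios follow from the summary sensitivity/specificity:
#   LR- = (1 - sens) / spec ;  LR+ = sens / (1 - spec)
lr_minus = (1 - 0.811) / 0.623   # low cut-off: ~0.30, as reported
lr_plus = 0.414 / (1 - 0.926)    # high cut-off: ~5.6, as reported

# Illustrative (hypothetical) patient: 55 years, AST 80 U/L, ALT 60 U/L,
# platelets 150 x 10^9/L
score = fib4(55, 80, 60, 150)
print(round(score, 2), classify(score))
```

Scores falling between the two cut-offs are the "indeterminate" zone the abstract quantifies (30.9% of people for severe fibrosis with FIB-4).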
Both the FIB-4 score and the Forns index may be considered for the initial assessment of people with CHC. The FIB-4 score's low cut-off (1.45) can be used to rule out people with at least severe fibrosis (≥ F3) and cirrhosis (F4). The Forns index's high cut-off (6.9) can be used to diagnose people with at least significant fibrosis (≥ F2). We judged most of the included studies to be at unclear or high risk of bias. The overall quality of the body of evidence was low or very low, and more high-quality studies are needed. Our review only captured data from referral centres. Therefore, when generalising our results to a primary care population, the probability of false positives will likely be higher and false negatives will likely be lower. More research is needed in sub-Saharan Africa, since these tests may be of value in such resource-poor settings.
Huttman M, Parigi TL, Zoncapè M, Liguori A, Kalafateli M, Noel-Storr AH, Casazza G, Tsochatzis E
Cochrane Database of Systematic Reviews
The effect of sample site and collection procedure on identification of SARS-CoV-2 infection.
Sample collection is a key driver of accuracy in the diagnosis of SARS-CoV-2 infection. Viral load may vary at different anatomical sampling sites and accuracy may be compromised by difficulties obtaining specimens and the expertise of the person taking the sample. It is important to optimise sampling accuracy within cost, safety and accessibility constraints.
To compare the sensitivity of different sampling collection sites and methods for the detection of current SARS-CoV-2 infection with any molecular or antigen-based test.
Electronic searches of the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern (which includes daily updates from PubMed and Embase and preprints from medRxiv and bioRxiv) were undertaken on 22 February 2022. We included independent evaluations from national reference laboratories, FIND and the Diagnostics Global Health website. We did not apply language restrictions.
We included studies of symptomatic or asymptomatic people with suspected SARS-CoV-2 infection undergoing testing. We included studies of any design that compared results from different sample types (anatomical location, operator, collection device) collected from the same participant within a 24-hour period.
Within a sample pair, we defined a reference sample and an index sample collected from the same participant within the same clinical encounter (within 24 hours). Where the comparison was between different anatomical sites, the reference sample was defined as a nasopharyngeal or combined naso/oropharyngeal sample collected into the same sample container, and the index sample as the sample from the alternative anatomical site. Where the comparison concerned differences in the collection method from the same site, we defined the reference sample as that closest to standard practice for that sample type. Where the comparison concerned differences in the personnel collecting the sample, the sample collected by the more skilled or experienced operator was considered the reference sample. Two review authors independently assessed risk of bias and applicability concerns using the QUADAS-2 and QUADAS-C checklists, tailored to this review. We present estimates of the difference in sensitivity (reference sample sensitivity (%) minus index sample sensitivity (%)) within a pair, and as an average across studies for each index sampling method, using forest plots and tables. We examined heterogeneity between studies according to population (age, symptom status) and index sample (time post-symptom onset, operator expertise, use of transport medium) characteristics.
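The sensitivity-difference metric defined above can be illustrated with a toy calculation. The counts below are hypothetical, not from any included study.

```python
def sensitivity(true_positives, infected_total):
    """Sensitivity = detected infections / total infected, as a percentage."""
    return 100.0 * true_positives / infected_total

# Hypothetical paired data: 200 participants with confirmed SARS-CoV-2,
# each contributing one nasopharyngeal (reference) and one saliva (index)
# sample, both tested with the same RT-PCR assay.
ref_sens = sensitivity(190, 200)    # nasopharyngeal detects 190/200
index_sens = sensitivity(166, 200)  # saliva detects 166/200

# Difference as defined in the review: reference (%) minus index (%).
# A positive value means the index sample misses more infections.
diff = ref_sens - index_sens
print(f"{diff:+.0f} percentage points")
```

In the review, such per-study differences are then averaged across studies for each index sampling method.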
This review includes 106 studies reporting 154 evaluations and 60,523 sample pair comparisons, of which 11,045 had SARS-CoV-2 infection. Ninety evaluations were of saliva samples, 37 of nasal, seven of oropharyngeal, six of gargle, six of oral, and four of combined nasal/oropharyngeal samples. Four evaluations assessed the effect of operator expertise on the accuracy of three different sample types. The majority of included evaluations (146) used molecular tests, of which 140 used RT-PCR (reverse transcription polymerase chain reaction). Eight evaluations were of nasal samples used with Ag-RDTs (rapid antigen tests). The majority of studies were conducted in Europe (35/106, 33%) or the USA (27%), and in dedicated COVID-19 testing clinics or ambulatory hospital settings (53%). Targeted screening or contact tracing accounted for only 4% of evaluations. Where reported, the majority of evaluations were in adults (91/154, 59%), 28 (18%) were in mixed populations, and only seven (4%) were in children. The median prevalence of confirmed SARS-CoV-2 was 23% (interquartile range (IQR) 13% to 40%). Risk of bias and applicability assessment were hampered by poor reporting in 77% and 65% of included studies, respectively. Risk of bias was low across all domains in only 3% of evaluations; common problems were inappropriate inclusion or exclusion criteria, unclear recruitment, lack of blinding, non-randomised sampling order, or differences in the testing kit within a sample pair. Sixty-eight per cent of evaluation cohorts were judged to be of high or unclear applicability concern, either because the prevalence of SARS-CoV-2 infection in study populations was inflated by selectively including individuals with confirmed PCR-positive samples, or because there was insufficient detail to allow replication of sample collection.
When used with RT-PCR:
• There was no evidence of a difference in sensitivity between gargle and nasopharyngeal samples (on average -1 percentage point, 95% CI -5 to +2; based on 6 evaluations, 2138 sample pairs, of which 389 had SARS-CoV-2).
• There was no evidence of a difference in sensitivity between saliva collected from the deep throat and nasopharyngeal samples (on average +10 percentage points, 95% CI -1 to +21; based on 2192 sample pairs, of which 730 had SARS-CoV-2).
• There was evidence that saliva collected by spitting, drooling or salivating was, on average, 12 percentage points less sensitive (95% CI -16 to -8; based on 27,253 sample pairs, of which 4636 had SARS-CoV-2) than nasopharyngeal samples. We did not find evidence of a difference in sensitivity between these saliva collection methods (sensitivity difference: range from -13 percentage points (spit) to -21 percentage points (salivate)).
• Nasal samples (anterior and mid-turbinate collection combined) were, on average, 12 percentage points less sensitive than nasopharyngeal samples (95% CI -17 to -7; based on 9291 sample pairs, of which 1485 had SARS-CoV-2). We did not find evidence of a difference in sensitivity between nasal samples collected from the mid-turbinates (3942 sample pairs) and from the anterior nares (8272 sample pairs).
• There was evidence that oropharyngeal samples were, on average, 17 percentage points less sensitive than nasopharyngeal samples (95% CI -29 to -5; based on seven evaluations, 2522 sample pairs, of which 511 had SARS-CoV-2).
A much smaller volume of evidence was available for combined nasal/oropharyngeal samples and oral samples. Age, symptom status and use of transport media did not appear to affect the sensitivity of saliva or nasal samples.
When used with Ag-RDTs:
• There was no evidence of a difference in sensitivity between nasal and nasopharyngeal samples (on average 0 percentage points, 95% CI -0.2 to +0.2; based on 3688 sample pairs, of which 535 had SARS-CoV-2).
When used with RT-PCR, there is no evidence of a difference in sensitivity between self-collected gargle or deep-throat saliva samples and nasopharyngeal samples collected by healthcare workers. Use of these alternative, self-collected sample types has the potential to reduce cost and discomfort, and to improve the safety of sampling by reducing the risk of transmission from the aerosols generated by coughing and gagging during nasopharyngeal or oropharyngeal sample collection. This may, in turn, improve access to and uptake of testing. Other types of saliva, nasal, oral and oropharyngeal samples are, on average, less sensitive than healthcare worker-collected nasopharyngeal samples, and it is unlikely that sensitivity losses of this magnitude would be acceptable for confirming SARS-CoV-2 infection with RT-PCR. When used with Ag-RDTs, there is no evidence of a difference in sensitivity between nasal samples and healthcare worker-collected nasopharyngeal samples for detecting SARS-CoV-2. The implications for self-testing are unclear, as evaluations did not report whether nasal samples were self-collected or collected by healthcare workers. Further research is needed in asymptomatic individuals, in children, and with Ag-RDTs, and to investigate the effect of operator expertise on accuracy. Quality assessment of the evidence base underpinning these conclusions was restricted by poor reporting. There is a need for further high-quality studies adhering to reporting standards for test accuracy studies.
Davenport C, Arevalo-Rodriguez I, Mateos-Haro M, Berhane S, Dinnes J, Spijker R, Buitrago-Garcia D, Ciapponi A, Takwoingi Y, Deeks JJ, Emperador D, Leeflang MMG, Van den Bruel A, Cochrane COVID-19 Diagnostic Test Accuracy Group
Cochrane Database of Systematic Reviews
Accuracy of routine laboratory tests to predict mortality and deterioration to severe or critical COVID-19 in people with SARS-CoV-2.
Identifying patients with COVID-19 who will deteriorate can be useful for deciding whether they should receive intensive care, or whether they can be treated less intensively or through outpatient care. In clinical care, routine laboratory markers, such as C-reactive protein, are used to assess a person's health status.
To assess the accuracy of routine blood-based laboratory tests to predict mortality and deterioration to severe or critical (from mild or moderate) COVID-19 in people with SARS-CoV-2.
On 25 August 2022, we searched the Cochrane COVID-19 Study Register, encompassing searches of various databases such as MEDLINE via PubMed, CENTRAL, Embase, medRxiv, and ClinicalTrials.gov. We did not apply any language restrictions.
We included studies of all designs that produced estimates of prognostic accuracy in participants who presented to outpatient services, or were admitted to general hospital wards with confirmed SARS-CoV-2 infection, and studies that were based on serum banks of samples from people. All routine blood-based laboratory tests performed during the first encounter were included. We included any reference standard used to define deterioration to severe or critical disease that was provided by the authors.
Two review authors independently extracted data from each included study and independently assessed methodological quality using the Quality Assessment of Prognostic Accuracy Studies tool. As studies reported different thresholds for the same test, we used the hierarchical summary receiver operating characteristic (HSROC) model for meta-analysis to estimate summary curves in SAS 9.4. We estimated the sensitivity at points on the SROC curves corresponding to the median and interquartile range boundaries of the specificities in the included studies. Direct and indirect comparisons were conducted only for biomarkers with an estimated sensitivity and 95% CI of ≥ 50% at a specificity of ≥ 50%. The relative diagnostic odds ratio (RDOR) was calculated as a summary of the relative accuracy of these biomarkers.
We identified a total of 64 studies, including 71,170 participants, of whom 8169 died and 4031 deteriorated to a severe/critical condition. The studies assessed 53 different laboratory tests. For some tests, both increases and decreases relative to the normal range were included. There was important heterogeneity between tests and their cut-off values. None of the included studies had a low risk of bias or low concern for applicability across all domains. None of the tests in this review demonstrated high sensitivity or specificity, or both. The five tests with summary sensitivity and specificity above 50% were: C-reactive protein increase, neutrophil-to-lymphocyte ratio increase, lymphocyte count decrease, d-dimer increase, and lactate dehydrogenase increase.

Inflammation
For mortality, summary sensitivity of a C-reactive protein increase was 76% (95% CI 73% to 79%) at a median specificity of 59% (low-certainty evidence). For deterioration, summary sensitivity was 78% (95% CI 67% to 86%) at a median specificity of 72% (very low-certainty evidence). For the combined outcome of mortality or deterioration, or both, summary sensitivity was 70% (95% CI 49% to 85%) at a median specificity of 60% (very low-certainty evidence). For mortality, summary sensitivity of a neutrophil-to-lymphocyte ratio increase was 69% (95% CI 66% to 72%) at a median specificity of 63% (very low-certainty evidence). For deterioration, summary sensitivity was 75% (95% CI 59% to 87%) at a median specificity of 71% (very low-certainty evidence). For mortality, summary sensitivity of a lymphocyte count decrease was 67% (95% CI 56% to 77%) at a median specificity of 61% (very low-certainty evidence). For deterioration, summary sensitivity of a lymphocyte count decrease was 69% (95% CI 60% to 76%) at a median specificity of 67% (very low-certainty evidence). For the combined outcome, summary sensitivity was 83% (95% CI 67% to 92%) at a median specificity of 29% (very low-certainty evidence). For mortality, summary sensitivity of a lactate dehydrogenase increase was 82% (95% CI 66% to 91%) at a median specificity of 60% (very low-certainty evidence). For deterioration, summary sensitivity of a lactate dehydrogenase increase was 79% (95% CI 76% to 82%) at a median specificity of 66% (low-certainty evidence). For the combined outcome, summary sensitivity was 69% (95% CI 51% to 82%) at a median specificity of 62% (very low-certainty evidence).

Hypercoagulability
For mortality, summary sensitivity of a d-dimer increase was 70% (95% CI 64% to 76%) at a median specificity of 56% (very low-certainty evidence). For deterioration, summary sensitivity was 65% (95% CI 56% to 74%) at a median specificity of 63% (very low-certainty evidence). For the combined outcome, summary sensitivity was 65% (95% CI 52% to 76%) at a median specificity of 54% (very low-certainty evidence).

Comparisons between biomarkers
To predict mortality, neutrophil-to-lymphocyte ratio increase had higher accuracy than d-dimer increase (RDOR 2.05, 95% CI 1.30 to 3.24), C-reactive protein increase (RDOR 2.64, 95% CI 2.09 to 3.33), and lymphocyte count decrease (RDOR 2.63, 95% CI 1.55 to 4.46). D-dimer increase had higher accuracy than lymphocyte count decrease (RDOR 1.49, 95% CI 1.23 to 1.80), C-reactive protein increase (RDOR 1.31, 95% CI 1.03 to 1.65), and lactate dehydrogenase increase (RDOR 1.42, 95% CI 1.05 to 1.90). Additionally, lactate dehydrogenase increase had higher accuracy than lymphocyte count decrease (RDOR 1.30, 95% CI 1.13 to 1.49). To predict deterioration to severe disease, C-reactive protein increase had higher accuracy than d-dimer increase (RDOR 1.76, 95% CI 1.25 to 2.50). Neutrophil-to-lymphocyte ratio increase had higher accuracy than d-dimer increase (RDOR 2.77, 95% CI 1.58 to 4.84). Lastly, lymphocyte count decrease had higher accuracy than d-dimer increase (RDOR 2.10, 95% CI 1.44 to 3.07) and lactate dehydrogenase increase (RDOR 2.22, 95% CI 1.52 to 3.26).
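The RDOR comparisons above build on the diagnostic odds ratio (DOR), which can be computed directly from a test's sensitivity and specificity. A small illustrative sketch follows; note that the review's RDORs come from joint meta-analytic modelling, so dividing point-estimate DORs will not reproduce the reported values exactly.

```python
def diagnostic_odds_ratio(sens, spec):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec),
    equivalently LR+ / LR-. Higher values mean better overall
    discrimination between those who will and will not deteriorate/die."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)

# Summary point estimates for mortality reported above:
#   neutrophil-to-lymphocyte ratio increase: sensitivity 69% at specificity 63%
#   d-dimer increase: sensitivity 70% at specificity 56%
dor_nlr = diagnostic_odds_ratio(0.69, 0.63)
dor_ddimer = diagnostic_odds_ratio(0.70, 0.56)

# A relative DOR > 1 means the first test discriminates better overall.
# (Illustrative only: the abstract's RDOR of 2.05 for this comparison is a
# model-based estimate, not this simple ratio of point estimates.)
rdor = dor_nlr / dor_ddimer
print(round(rdor, 2))
```

The sketch makes the direction of the comparison concrete: the NLR's point estimates give a higher DOR than d-dimer's, consistent with the reported RDOR exceeding 1.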
Laboratory tests associated with hypercoagulability and a hyperinflammatory response were better at predicting severe disease and mortality in patients with SARS-CoV-2 than other laboratory tests. However, to safely rule out severe disease, a test should have high sensitivity (> 90%), and none of the identified laboratory tests met this criterion. In clinical practice, a more comprehensive assessment of a patient's health status is usually required, for example by incorporating these laboratory tests into clinical prediction rules alongside clinical symptoms, radiological findings, and patient characteristics.
De Rop L, Bos DA, Stegeman I, Holtman G, Ochodo EA, Spijker R, Otieno JA, Alkhlaileh F, Deeks JJ, Dinnes J, Van den Bruel A, McInnes MD, Leeflang MM, Cochrane COVID-19 Diagnostic Test Accuracy Group, Verbakel JY
Cochrane Database of Systematic Reviews