-
Diagnosing Bone Metastases in Breast Cancer: A Systematic Review and Network Meta-Analysis on Diagnostic Test Accuracy Studies of 2-[18F]FDG-PET/CT, 18F-NaF-PET/CT, MRI, Contrast-Enhanced CT, and Bone Scintigraphy.
This systematic review and network meta-analysis aimed to compare the diagnostic accuracy of 2-[18F]FDG-PET/CT, 18F-NaF-PET/CT, MRI, contrast-enhanced CT (CE-CT), and bone scintigraphy for diagnosing bone metastases in patients with breast cancer. Following PRISMA-DTA guidelines, we reviewed studies assessing these five modalities for diagnosing bone metastases in high-stage primary breast cancer (stage III or IV) or known primary breast cancer with suspicion of recurrence (staging or re-staging). A comprehensive search of MEDLINE/PubMed, Scopus, and Embase was conducted until February 2024. We included original studies using these imaging methods and excluded those focused on AI/machine learning, primary breast cancer without metastases, mixed cancer types, preclinical studies, and lesion-based accuracy. Preference was given to studies using biopsy or follow-up as the reference standard. Risk of bias was assessed using QUADAS-2. Screening, bias assessment, and data extraction were independently performed by two researchers, with discrepancies resolved by a third. We applied bivariate random-effects models in the meta-analysis and used network meta-analysis to estimate differences in sensitivity and specificity between the modalities. Forty studies were included, with 29 contributing to the meta-analyses. Of these, 13 studies investigated a single modality only. 2-[18F]FDG-PET/CT (sensitivity: 0.94, 95% CI: 0.89-0.97; specificity: 0.98, 95% CI: 0.96-0.99), MRI (0.94, 0.82-0.98; 0.93, 0.87-0.96), and 18F-NaF-PET/CT (0.95, 0.85-0.98; 1, 0.93-1) all outperformed the less sensitive modalities CE-CT (0.70, 0.62-0.77; 0.98, 0.97-0.99) and bone scintigraphy (0.83, 0.75-0.88; 0.96, 0.87-0.99).
The network meta-analysis of multi-modality studies supports the comparable performance of 2-[18F]FDG-PET/CT and MRI in diagnosing bone metastases (estimated differences in sensitivity and specificity, respectively: 0.01, 95% CI -0.16 to 0.18; -0.02, 95% CI -0.15 to 0.12). The results from bivariate random-effects modelling and network meta-analysis were consistent for all modalities apart from 18F-NaF-PET/CT. We concluded that 2-[18F]FDG-PET/CT and MRI have high and comparable accuracy for diagnosing bone metastases in breast cancer patients. Both outperformed CE-CT and bone scintigraphy regarding sensitivity. Future multimodality studies based on agreed-upon thresholds are warranted for further exploration, especially of the potential role of 18F-NaF-PET/CT in diagnosing bone metastases in breast cancer.
Gerke O, Naghavi-Behzad M, Nygaard ST, Sigaroudi VR, Vogsen M, Vach W, Hildebrandt MG
... -
《-》
-
Defining the optimum strategy for identifying adults and children with coeliac disease: systematic review and economic modelling.
Elwenspoek MM, Thom H, Sheppard AL, Keeney E, O'Donnell R, Jackson J, Roadevin C, Dawson S, Lane D, Stubbs J, Everitt H, Watson JC, Hay AD, Gillett P, Robins G, Jones HE, Mallett S, Whiting PF
... -
《-》
-
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Survival estimation for patients with symptomatic skeletal metastases ideally should be made before a type of local treatment has already been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided.
(1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?
Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient's survivorship status was ascertained from either their medical records or the national death registry of the Taiwanese National Health Insurance Administration. After applying a priori designated exclusion criteria, 91% (1920) were analyzed here. Among them, 48% (920) of the patients were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of less than or equal to IV and who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, and any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG-MLA with METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone.
We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm scale of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival); and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction). The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the "null model" assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score at least lower than the null-model Brier score to be considered as useful. 
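The Brier score, the null-model benchmark, and log(O:E) described above can be illustrated with a minimal Python sketch. The function names and the four-patient toy data are hypothetical and are not taken from the study; the sketch only mirrors the definitions given in the text.

```python
import math


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def null_model_brier(observed):
    """Brier score of a model that predicts the outcome prevalence for everyone."""
    prevalence = sum(observed) / len(observed)
    return brier_score([prevalence] * len(observed), observed)


def log_observed_to_expected(predicted_survival, survived):
    """log(O:E): log of observed survivors over the sum of predicted survival probabilities."""
    return math.log(sum(survived) / sum(predicted_survival))


# Hypothetical toy data: predicted survival probabilities and observed status (1 = alive).
toy_predicted = [0.9, 0.8, 0.3, 0.2]
toy_survived = [1, 1, 0, 0]

model_brier = brier_score(toy_predicted, toy_survived)
null_brier = null_model_brier(toy_survived)
log_oe = log_observed_to_expected(toy_predicted, toy_survived)

print(model_brier, null_brier, log_oe)
```

On this toy data the model's Brier score (about 0.045) is below the null-model score (0.25), so the model would count as adding information, while the slightly negative log(O:E) indicates that it overestimates survival.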
The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model's predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. Finally, we examined whether laboratory data, which were not included in the METSSS model, would have been independently associated with survival after controlling for the METSSS model's predictions by using the multivariable logistic and Cox proportional hazards regression analyses.
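The net-benefit comparison behind DCA can be sketched as follows. The abstract does not give the formula; this sketch assumes the commonly used definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold. The function names and the toy data are hypothetical.

```python
def net_benefit(risk, event, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(event)
    true_pos = sum(1 for r, e in zip(risk, event) if r >= threshold and e == 1)
    false_pos = sum(1 for r, e in zip(risk, event) if r >= threshold and e == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(event, threshold):
    """Net benefit of the default strategy of treating every patient."""
    prevalence = sum(event) / len(event)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# The other default strategy, treating no one, has a net benefit of 0 by definition.
NET_BENEFIT_TREAT_NONE = 0.0

# Hypothetical toy data: predicted risk of the event and observed event status.
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_event = [1, 1, 0, 0]

nb_model = net_benefit(toy_risk, toy_event, threshold=0.5)
nb_all = net_benefit_treat_all(toy_event, threshold=0.5)

print(nb_model, nb_all, NET_BENEFIT_TREAT_NONE)
```

A decision curve repeats this calculation over a grid of thresholds; a model is worth consulting at a given threshold when its net benefit exceeds that of both default strategies.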
Of the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient's individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS.
Based on these findings, clinicians might choose to consult SORG-MLA instead of METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient's individualized risk threshold for opting to undergo a particular treatment strategy ranges from 0.2 to 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model because these data demonstrated prognostic value independent of the predictions of the METSSS model, and future studies might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments.
Level III, diagnostic study.
Lee CC, Chen CW, Yen HK, Lin YP, Lai CY, Wang JL, Groot OQ, Janssen SJ, Schwab JH, Hsu FM, Lin WH
... -
《-》
-
Diagnostic performance of [68Ga]DOTATATE PET/CT, [18F]FDG PET/CT, MRI of the spine, and whole-body diagnostic CT and MRI in the detection of spinal bone metastases associated with pheochromocytoma and paraganglioma.
To compare the diagnostic performance of [68Ga]DOTATATE PET/CT, [18F]FDG PET/CT, MRI of the spine, and whole-body CT and MRI for the detection of pheochromocytoma/paraganglioma (PPGL)-related spinal bone metastases.
Between 2014 and 2020, PPGL participants with spinal bone metastases prospectively underwent [68Ga]DOTATATE PET/CT, [18F]FDG PET/CT, MRI of the cervical-thoracolumbar spine (MRIspine), contrast-enhanced MRI of the neck and thoraco-abdominopelvic regions (MRIWB), and contrast-enhanced CT of the neck and thoraco-abdominopelvic regions (CTWB). Per-patient and per-lesion detection rates were calculated. Counting of spinal bone metastases was limited to a maximum of one lesion per vertebra. A composite of all functional and anatomic imaging served as the imaging comparator. The McNemar test was used to compare detection rates between the scans. Two-sided p values were reported.
Forty-three consecutive participants (mean age, 41.7 ± 15.7 years; females, 22) with MRIspine were included who also underwent [68Ga]DOTATATE PET/CT (n = 43), [18F]FDG PET/CT (n = 43), MRIWB (n = 24), and CTWB (n = 33). Forty-one of 43 participants were positive for spinal bone metastases, with 382 lesions on the imaging comparator. [68Ga]DOTATATE PET/CT demonstrated a per-lesion detection rate of 377/382 (98.7%) which was superior compared to [18F]FDG (72.0%, 275/382, p < 0.001), MRIspine (80.6%, 308/382, p < 0.001), MRIWB (55.3%, 136/246, p < 0.001), and CTWB (44.8%, 132/295, p < 0.001). The per-patient detection rate of [68Ga]DOTATATE PET/CT was 41/41 (100%) which was higher compared to [18F]FDG PET/CT (90.2%, 37/41, p = 0.13), MRIspine (97.6%, 40/41, p = 1.00), MRIWB (95.7%, 22/23, p = 1.00), and CTWB (81.8%, 27/33, p = 0.03).
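The per-patient comparisons above can be illustrated with a small sketch of the exact (binomial) McNemar test. The abstract does not say which form of the test was used, so the exact form is an assumption here, and the function name is hypothetical. The discordant counts are inferred rather than reported: because [68Ga]DOTATATE PET/CT was positive in all 41 patients, each patient missed by a comparator is assumed to be a discordant pair in one direction.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant-pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as lopsided as the one observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Inferred (not reported) discordant counts for the per-patient comparisons:
# [18F]FDG PET/CT missed 4 of 41 patients; MRIspine missed 1 of 41.
p_fdg = mcnemar_exact(4, 0)
p_mri_spine = mcnemar_exact(1, 0)

print(p_fdg, p_mri_spine)
```

Under these assumptions the sketch gives 0.125 and 1.0, which round to the reported per-patient values of p = 0.13 and p = 1.00.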
[68Ga]DOTATATE PET/CT should be the modality of choice in PPGL-related spinal bone metastases due to its superior detection rate.
In a prospective study of 43 pheochromocytoma/paraganglioma participants with spinal bone metastases, [68Ga]DOTATATE PET/CT had a superior per-lesion detection rate of 98.7% (377/382), compared to [18F]FDG PET/CT (p < 0.001), MRI of the spine (p < 0.001), whole-body CT (p < 0.001), and whole-body MRI (p < 0.001).
• Data regarding head-to-head comparison between functional and anatomic imaging modalities to detect spinal bone metastases in pheochromocytoma/paraganglioma are limited.
• [68Ga]DOTATATE PET/CT had a superior per-lesion detection rate of 98.7% in the detection of spinal bone metastases associated with pheochromocytoma/paraganglioma compared to other imaging modalities: [18F]FDG PET/CT, MRI of the spine, whole-body CT, and whole-body MRI.
• [68Ga]DOTATATE PET/CT should be the modality of choice in the evaluation of spinal bone metastases associated with pheochromocytoma/paraganglioma.
Jha A, Patel M, Ling A, Shah R, Chen CC, Millo C, Nazari MA, Sinaii N, Charles K, Kuo MJM, Prodanov T, Saboury B, Talvacchio S, Derkyi A, Del Rivero J, O'Sullivan Coyne G, Chen AP, Nilubol N, Herscovitch P, Lin FI, Taieb D, Civelek AC, Carrasquillo JA, Pacak K
... -
《-》
-
The effect of sample site and collection procedure on identification of SARS-CoV-2 infection.
Sample collection is a key driver of accuracy in the diagnosis of SARS-CoV-2 infection. Viral load may vary at different anatomical sampling sites and accuracy may be compromised by difficulties obtaining specimens and the expertise of the person taking the sample. It is important to optimise sampling accuracy within cost, safety and accessibility constraints.
To compare the sensitivity of different sampling collection sites and methods for the detection of current SARS-CoV-2 infection with any molecular or antigen-based test.
Electronic searches of the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern (which includes daily updates from PubMed and Embase and preprints from medRxiv and bioRxiv) were undertaken on 22 February 2022. We included independent evaluations from national reference laboratories, FIND and the Diagnostics Global Health website. We did not apply language restrictions.
We included studies of symptomatic or asymptomatic people with suspected SARS-CoV-2 infection undergoing testing. We included studies of any design that compared results from different sample types (anatomical location, operator, collection device) collected from the same participant within a 24-hour period.
Within a sample pair, we defined a reference sample and an index sample collected from the same participant within the same clinical encounter (within 24 hours). Where the sample comparison was between different anatomical sites, the reference sample was defined as a nasopharyngeal or combined naso/oropharyngeal sample collected into the same sample container and the index sample as the alternative anatomical site. Where the sample comparison was concerned with differences in the sample collection method from the same site, we defined the reference sample as that closest to standard practice for that sample type. Where the sample pair comparison was concerned with differences in personnel collecting the sample, the sample collected by the more skilled or experienced operator was considered the reference sample. Two review authors independently assessed the risk of bias and applicability concerns using the QUADAS-2 and QUADAS-C checklists, tailored to this review. We present estimates of the difference in sensitivity (reference sample sensitivity (%) minus index sample sensitivity (%)) in a pair and as an average across studies for each index sampling method using forest plots and tables. We examined heterogeneity between studies according to population (age, symptom status) and index sample (time post-symptom onset, operator expertise, use of transport medium) characteristics.
This review includes 106 studies reporting 154 evaluations and 60,523 sample pair comparisons, of which 11,045 had SARS-CoV-2 infection. Ninety evaluations were of saliva samples, 37 nasal, seven oropharyngeal, six gargle, six oral and four combined nasal/oropharyngeal samples. Four evaluations were of the effect of operator expertise on the accuracy of three different sample types. The majority of included evaluations (146) used molecular tests, of which 140 used RT-PCR (reverse transcription polymerase chain reaction). Eight evaluations were of nasal samples used with Ag-RDTs (rapid antigen tests). The majority of studies were conducted in Europe (35/106, 33%) or the USA (27%) and conducted in dedicated COVID-19 testing clinics or in ambulatory hospital settings (53%). Targeted screening or contact tracing accounted for only 4% of evaluations. Where reported, the majority of evaluations were of adults (91/154, 59%); 28 (18%) were in mixed populations and only seven (4%) were in children. The median prevalence of confirmed SARS-CoV-2 was 23% (interquartile range (IQR) 13%-40%). Risk of bias and applicability assessment were hampered by poor reporting in 77% and 65% of included studies, respectively. Risk of bias was low across all domains in only 3% of evaluations due to inappropriate inclusion or exclusion criteria, unclear recruitment, lack of blinding, nonrandomised sampling order or differences in testing kit within a sample pair. Sixty-eight percent of evaluation cohorts were judged as being at high or unclear applicability concern either due to inflation of the prevalence of SARS-CoV-2 infection in study populations by selectively including individuals with confirmed PCR-positive samples or because there was insufficient detail to allow replication of sample collection.
When used with RT-PCR
• There was no evidence of a difference in sensitivity between gargle and nasopharyngeal samples (on average -1 percentage points, 95% CI -5 to +2, based on 6 evaluations, 2138 sample pairs, of which 389 had SARS-CoV-2).
• There was no evidence of a difference in sensitivity between saliva collection from the deep throat and nasopharyngeal samples (on average +10 percentage points, 95% CI -1 to +21, based on 2192 sample pairs, of which 730 had SARS-CoV-2).
• There was evidence that saliva collection using spitting, drooling or salivating was, on average, 12 percentage points less sensitive than nasopharyngeal samples (95% CI -16 to -8, based on 27,253 sample pairs, of which 4636 had SARS-CoV-2). We did not find any evidence of a difference in sensitivity between saliva collected using spitting, drooling or salivating (sensitivity difference: range from -13 percentage points (spit) to -21 percentage points (salivate)).
• Nasal samples (anterior and mid-turbinate collection combined) were, on average, 12 percentage points less sensitive than nasopharyngeal samples (95% CI -17 to -7), based on 9291 sample pairs, of which 1485 had SARS-CoV-2. We did not find any evidence of a difference in sensitivity between nasal samples collected from the mid-turbinates (3942 sample pairs) or from the anterior nares (8272 sample pairs).
• There was evidence that oropharyngeal samples were, on average, 17 percentage points less sensitive than nasopharyngeal samples (95% CI -29 to -5), based on seven evaluations, 2522 sample pairs, of which 511 had SARS-CoV-2.
A much smaller volume of evidence was available for combined nasal/oropharyngeal samples and oral samples. Age, symptom status and use of transport media do not appear to affect the sensitivity of saliva samples and nasal samples.
When used with Ag-RDTs
• There was no evidence of a difference in sensitivity between nasal samples and nasopharyngeal samples (sensitivity difference, on average, 0 percentage points, 95% CI -0.2 to +0.2, based on 3688 sample pairs, of which 535 had SARS-CoV-2).
When used with RT-PCR, there is no evidence of a difference in sensitivity between self-collected gargle or deep-throat saliva samples and nasopharyngeal samples collected by healthcare workers. Use of these alternative, self-collected sample types has the potential to reduce cost and discomfort and improve the safety of sampling by reducing the risk of transmission from aerosol spread, which occurs as a result of coughing and gagging during the nasopharyngeal or oropharyngeal sample collection procedure. This may, in turn, improve access to and uptake of testing. Other types of saliva, nasal, oral and oropharyngeal samples are, on average, less sensitive than healthcare worker-collected nasopharyngeal samples, and it is unlikely that sensitivities of this magnitude would be acceptable for confirmation of SARS-CoV-2 infection with RT-PCR. When used with Ag-RDTs, there is no evidence of a difference in sensitivity between nasal samples and healthcare worker-collected nasopharyngeal samples for detecting SARS-CoV-2. The implications of this for self-testing are unclear as evaluations did not report whether nasal samples were self-collected or collected by healthcare workers. Further research is needed in asymptomatic individuals, children and in Ag-RDTs, and to investigate the effect of operator expertise on accuracy. Quality assessment of the evidence base underpinning these conclusions was restricted by poor reporting. There is a need for further high-quality studies, adhering to reporting standards for test accuracy studies.
Davenport C, Arevalo-Rodriguez I, Mateos-Haro M, Berhane S, Dinnes J, Spijker R, Buitrago-Garcia D, Ciapponi A, Takwoingi Y, Deeks JJ, Emperador D, Leeflang MMG, Van den Bruel A, Cochrane COVID-19 Diagnostic Test Accuracy Group
... -
《Cochrane Database of Systematic Reviews》