-
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Bone metastasis in advanced cancer is challenging because of pain, functional issues, and reduced life expectancy. Treatment planning is complex, with consideration of factors such as location, symptoms, and prognosis. Prognostic models help guide treatment choices, with Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) showing promise in predicting survival for initial spinal metastases and extremity metastases treated with surgery or radiotherapy. Improved therapies extend patient lifespans, increasing the risk of subsequent skeletal-related events (SREs). Patients experiencing subsequent SREs often suffer from disease progression, indicating a deteriorating condition. For these patients, a thorough evaluation, including accurate survival prediction, is essential to determine the most appropriate treatment and avoid aggressive surgical treatment for patients with a poor survival likelihood. However, some variables in the SORG prediction model, such as tumor histology, visceral metastasis, and previous systemic therapies, might remain consistent between initial and subsequent SREs. Given the prognostic difference between patients with and without a subsequent SRE, the efficacy of established prognostic models (originally designed for individuals with an initial SRE) in addressing a subsequent SRE remains uncertain. Therefore, it is crucial to verify the model's utility for subsequent SREs.
We aimed to evaluate the reliability of the SORG-MLAs for survival prediction in patients undergoing surgery or radiotherapy for a subsequent SRE for whom both the initial and subsequent SREs occurred in the spine or extremities.
We retrospectively included 738 patients aged 20 years or older who received surgery or radiotherapy for initial and subsequent SREs at a tertiary referral center and local hospital in Taiwan between 2010 and 2019. We excluded 74 patients whose initial SRE was in the spine and in whom the subsequent SRE occurred in the extremities and 37 patients whose initial SRE was in the extremities and the subsequent SRE was in the spine. The rationale was that different SORG-MLAs were exclusively designed for patients who had an initial spine metastasis and those who had an initial extremity metastasis, irrespective of whether they experienced metastatic events in other areas (for example, a patient experiencing an extremity SRE before his or her spinal SRE would also be regarded as a candidate for an initial spinal SRE). Because the algorithms had already been validated in such patients in previous studies, we excluded them to avoid overestimating our results. Five patients with malignant primary bone tumors and 38 patients in whom the metastasis's origin could not be identified were excluded, leaving 584 patients for analysis. The 584 included patients were categorized into two subgroups based on the location of initial and subsequent SREs: the spine group (68% [399]) and extremity group (32% [185]). No patients were lost to follow-up. Patient data at the time they presented with a subsequent SRE were collected, and survival predictions at this timepoint were calculated using the SORG-MLAs. Multiple imputation with the missForest technique was conducted five times to impute the missing values of each predictor. The effectiveness of SORG-MLAs was gauged through several statistical measures, including discrimination (measured by the area under the receiver operating characteristic curve [AUC]), calibration, overall performance (Brier score), and decision curve analysis. Discrimination refers to the model's ability to differentiate between those with the event and those without the event.
An AUC ranges from 0.5 to 1.0, with 0.5 indicating discrimination no better than chance and 1.0 indicating perfect discrimination. An AUC of 0.7 is considered clinically acceptable discrimination. Calibration is the comparison between the frequency of observed events and the predicted probabilities. In an ideal calibration, the observed and predicted survival rates should be congruent. The logarithm of observed-to-expected survival ratio [log(O:E)] offers insight into the model's overall calibration by considering the total number of observed (O) and expected (E) events. The Brier score measures the mean squared difference between the predicted probability of possible outcomes for each individual and the observed outcomes, ranging from 0 to 1, with 0 indicating perfect overall performance and 1 indicating the worst performance. Moreover, the prevalence of the outcome should be considered, so a null-model Brier score was also calculated by assigning a probability equal to the prevalence of the outcome (in this case, the actual survival rate) to each patient. The benefit of the prediction model is determined by comparing its Brier score with that of the null model. If a prediction model's Brier score is lower than the null model's Brier score, the prediction model is deemed as having good performance. A decision curve analysis was performed to evaluate each model's "net benefit," which weighs true positives against false positives across a range of "threshold probabilities." A threshold probability reflects the ratio of risk to benefit of an intervention and is derived from a comprehensive clinical evaluation and a well-discussed shared decision-making process. A good predictive model should yield a higher net benefit than default strategies (treating all patients and treating no patients) across a range of threshold probabilities.
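The comparison between a model's Brier score and the null-model Brier score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names and the toy data are hypothetical, and outcomes are coded 1 for survival and 0 for death.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def null_model_brier(observed):
    """Brier score when every patient is assigned the outcome prevalence."""
    prevalence = sum(observed) / len(observed)
    return brier_score([prevalence] * len(observed), observed)


# Hypothetical toy data (not from the study): 1 = survived, 0 = died.
observed = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.9]

model_score = brier_score(predicted, observed)
null_score = null_model_brier(observed)

# A model adds information only if its score is below the null-model score.
print(f"model {model_score:.3f}, null {null_score:.3f}, "
      f"better than null: {model_score < null_score}")
```

Because the null model assigns the same probability to everyone, its score depends only on the outcome prevalence, which is why the two scores have to be read together.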
For the spine group, the algorithms displayed acceptable AUC results (median AUCs of 0.69 to 0.72) for 42-day, 90-day, and 1-year survival predictions after treatment for a subsequent SRE. In contrast, the extremity group showed median AUCs ranging from 0.65 to 0.73 for the corresponding survival periods. All Brier scores were lower than those of their null model, indicating the SORG-MLAs' good overall performances for both cohorts. The SORG-MLAs yielded a net benefit for both cohorts; however, they overestimated 1-year survival probabilities in patients with a subsequent SRE in the spine, with a median log(O:E) of -0.60 (95% confidence interval -0.77 to -0.42).
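To make the log(O:E) summary more concrete, the sketch below computes it from observed outcomes and predicted survival probabilities, and converts the reported median of -0.60 back to a ratio. It assumes the natural logarithm, which the abstract does not state, and the helper name and toy data are hypothetical.

```python
import math


def log_observed_expected(observed_survival, predicted_probabilities):
    """log(O:E): O = observed survivors, E = sum of predicted survival probabilities."""
    observed = sum(observed_survival)
    expected = sum(predicted_probabilities)
    if observed <= 0 or expected <= 0:
        raise ValueError("O and E must both be positive")
    return math.log(observed / expected)  # natural log is an assumption


# Hypothetical toy cohort: 3 of 10 patients survive, but the model expects 5.5.
observed_survival = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted_probabilities = [0.6] * 5 + [0.5] * 5

log_oe = log_observed_expected(observed_survival, predicted_probabilities)
# A negative value means fewer survivors than predicted (overestimated survival).
print(f"toy log(O:E) = {log_oe:.2f}")

# Reading the reported value back as a ratio, under the natural-log assumption.
implied_ratio = math.exp(-0.60)
print(f"log(O:E) of -0.60 implies O:E of about {implied_ratio:.2f}")
```

Under that assumption, a log(O:E) of -0.60 would correspond to roughly 0.55 observed survivors for every survivor the model expected.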
The SORG-MLAs maintain satisfactory discriminatory capacity and offer considerable net benefits through decision curve analysis, indicating their continued viability as prediction tools in this clinical context. However, the algorithms overestimate 1-year survival rates for patients with a subsequent SRE of the spine, warranting consideration of specific patient groups. Clinicians and surgeons should exercise caution when using the SORG-MLAs for survival prediction in these patients and remain aware of potential mispredictions when tailoring treatment plans, with a preference for less invasive treatments. Ultimately, this study emphasizes the importance of enhancing prognostic algorithms and developing innovative tools for patients with subsequent SREs as the life expectancy in patients with bone metastases continues to improve and healthcare providers will encounter these patients more often in daily practice.
Level III, prognostic study.
Pan YT, Lin YP, Yen HK, Yen HH, Huang CC, Hsieh HC, Janssen S, Hu MH, Lin WH, Groot OQ
... -
《-》
-
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Survival estimation for patients with symptomatic skeletal metastases ideally should be made before a type of local treatment has already been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided.
(1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?
Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient's survivorship status was ascertained either by their medical records or the national death registry from the Taiwanese National Health Insurance Administration. After applying a priori designated exclusion criteria, 91% (1920) were analyzed here. Among them, 48% (920) of the patients were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of IV or lower and who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, and any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG-MLA with METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone.
We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm scale of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival); and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction). The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the "null model" assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score at least lower than the null-model Brier score to be considered as useful. 
The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model's predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. Finally, we examined whether laboratory data, which were not included in the METSSS model, were independently associated with survival after controlling for the METSSS model's predictions, using multivariable logistic and Cox proportional hazards regression analyses.
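The net benefit that a decision curve plots can be illustrated with the commonly used formula: true positives per patient minus false positives per patient, with false positives weighted by the odds of the risk threshold. The sketch below is an assumption-laden illustration, not the authors' code: the function names and toy data are hypothetical, and "event" stands for whichever outcome the model predicts.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of acting on patients whose predicted probability meets the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(observed)
    true_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Default strategy: act on every patient regardless of prediction."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = event occurred, 0 = it did not.
observed = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.9]

threshold = 0.5
nb_model = net_benefit(predicted, observed, threshold)
nb_all = net_benefit_treat_all(observed, threshold)
nb_none = 0.0  # treating no one yields neither true nor false positives

print(f"model {nb_model:.3f}, treat all {nb_all:.3f}, treat none {nb_none:.3f}")
```

Repeating the calculation over a grid of thresholds and plotting the three strategies gives the decision curve; the model is worth consulting at thresholds where its curve lies above both defaults.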
Between the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient's individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS.
Based on these findings, clinicians might choose to consult SORG-MLA instead of METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient's individualized risk threshold for opting to undergo a particular treatment strategy ranges from 0.2 to 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model because these data demonstrated prognostic value independent of the predictions of the METSSS model, and future studies might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments.
Level III, diagnostic study.
Lee CC, Chen CW, Yen HK, Lin YP, Lai CY, Wang JL, Groot OQ, Janssen SJ, Schwab JH, Hsu FM, Lin WH
... -
《-》
-
Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA) was developed to predict the survival of patients with spinal metastasis. The algorithm was successfully tested in five international institutions using 1101 patients from different continents. The incorporation of 18 prognostic factors strengthens its predictive ability but limits its clinical utility because some prognostic factors might not be clinically available when a clinician wishes to make a prediction.
We performed this study to (1) evaluate the SORG-MLA's performance with missing data and (2) develop an internet-based application to impute the missing data.
A total of 2768 patients were included in this study. The data of 617 patients who were treated surgically were intentionally erased, and the data of the other 2151 patients who were treated with radiotherapy and medical treatment were used to impute the artificially missing data. Compared with those who were treated nonsurgically, patients undergoing surgery were younger (median 59 years [IQR 51 to 67 years] versus median 62 years [IQR 53 to 71 years]) and had a higher proportion of patients with at least three spinal metastatic levels (77% [474 of 617] versus 72% [1547 of 2151]), more neurologic deficit (normal American Spinal Injury Association [E] 68% [301 of 443] versus 79% [1227 of 1561]), higher BMI (23 kg/m² [IQR 20 to 25 kg/m²] versus 22 kg/m² [IQR 20 to 25 kg/m²]), higher platelet count (240 × 10³/µL [IQR 173 to 327 × 10³/µL] versus 227 × 10³/µL [IQR 165 to 302 × 10³/µL]), higher lymphocyte count (15 × 10³/µL [IQR 9 to 21 × 10³/µL] versus 14 × 10³/µL [IQR 8 to 21 × 10³/µL]), lower serum creatinine level (0.7 mg/dL [IQR 0.6 to 0.9 mg/dL] versus 0.8 mg/dL [IQR 0.6 to 1.0 mg/dL]), less previous systemic therapy (19% [115 of 617] versus 24% [526 of 2151]), fewer Charlson comorbidities other than cancer (28% [170 of 617] versus 36% [770 of 2151]), and longer median survival. The two patient groups did not differ in other regards. These findings aligned with our institutional philosophy of selecting patients for surgical intervention based on higher levels of favorable prognostic factors such as BMI or lymphocyte counts and lower levels of unfavorable prognostic factors such as white blood cell counts or serum creatinine level, as well as the degree of spinal instability and severity of neurologic deficits. This approach aims to identify patients with better survival outcomes and prioritize their surgical intervention accordingly.
Seven factors (serum albumin and alkaline phosphatase levels, international normalized ratio, lymphocyte and neutrophil counts, and the presence of visceral or brain metastases) were considered possible missing items based on five previous validation studies and clinical experience. Artificially missing data were imputed using the missForest imputation technique, which was previously applied and successfully tested to fit the SORG-MLA in validation studies. Discrimination, calibration, overall performance, and decision curve analysis were applied to evaluate the SORG-MLA's performance. The discrimination ability was measured with an area under the receiver operating characteristic curve. It ranges from 0.5 to 1.0, with 0.5 indicating the worst discrimination and 1.0 indicating perfect discrimination. An area under the curve of 0.7 is considered clinically acceptable discrimination. Calibration refers to the agreement between the predicted outcomes and actual outcomes. An ideal calibration model will yield predicted survival rates that are congruent with the observed survival rates. The Brier score measures the squared difference between the actual outcome and predicted probability, which captures calibration and discrimination ability simultaneously. A Brier score of 0 indicates perfect prediction, whereas a Brier score of 1 indicates the poorest prediction. A decision curve analysis was performed for the 6-week, 90-day, and 1-year prediction models to evaluate their net benefit across different threshold probabilities. Using the results from our analysis, we developed an internet-based application that facilitates real-time data imputation for clinical decision-making at the point of care. This tool allows healthcare professionals to efficiently and effectively address missing data, ensuring that patient care remains optimal at all times.
Generally, the SORG-MLA demonstrated good discriminatory ability, with areas under the curve greater than 0.7 in most cases, and good overall performance, with up to 25% improvement in Brier scores in the presence of one to three missing items. The only exceptions were albumin level and lymphocyte count, because the SORG-MLA's performance was reduced when these two items were missing, indicating that the SORG-MLA might be unreliable without these values. The model tended to underestimate the patient survival rate. As the number of missing items increased, the model's discriminatory ability was progressively impaired, and a marked underestimation of patient survival rates was observed. Specifically, when three items were missing, the number of actual survivors was up to 1.3 times greater than the number of expected survivors, while only 10% discrepancy was observed when only one item was missing. When either two or three items were omitted, the decision curves exhibited substantial overlap, indicating a lack of consistent disparities in performance. This finding suggests that the SORG-MLA consistently generates accurate predictions, regardless of the two or three items that are omitted. We developed an internet application (https://sorg-spine-mets-missing-data-imputation.azurewebsites.net/) that allows the use of SORG-MLA with up to three missing items.
The SORG-MLA generally performed well in the presence of one to three missing items, except for serum albumin level and lymphocyte count (which are essential for adequate predictions, even using our modified version of the SORG-MLA). We recommend that future studies should develop prediction models that allow for their use when there are missing data, or provide a means to impute those missing data, because some data are not available at the time a clinical decision must be made.
The results suggested the algorithm could be helpful when a radiologic evaluation cannot be performed in time owing to a lengthy waiting period, especially in situations when an early operation could be beneficial. It could help orthopaedic surgeons to decide whether to intervene palliatively or extensively, even when the surgical indication is clear.
Huang CC, Peng KP, Hsieh HC, Groot OQ, Yen HK, Tsai CC, Karhade AV, Lin YP, Kao YT, Yang JJ, Dai SH, Huang CC, Chen CW, Yen MH, Xiao FR, Lin WH, Verlaan JJ, Schwab JH, Hsu FM, Wong T, Yang RS, Yang SH, Hu MH
... -
《-》
-
International Validation of the SORG Machine-learning Algorithm for Predicting the Survival of Patients with Extremity Metastases Undergoing Surgical Treatment.
The Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) estimate 90-day and 1-year survival in patients with long-bone metastases undergoing surgical treatment and have demonstrated good discriminatory ability on internal validation. However, the performance of a prediction model could potentially vary by race or region, and the SORG-MLA must be externally validated in an Asian cohort. Furthermore, the authors of the original developmental study did not consider the Eastern Cooperative Oncology Group (ECOG) performance status, a survival prognosticator repeatedly validated in other studies, in their algorithms because of missing data.
(1) Is the SORG-MLA generalizable to Taiwanese patients for predicting 90-day and 1-year mortality? (2) Is the ECOG score an independent factor associated with 90-day and 1-year mortality while controlling for SORG-MLA predictions?
All 356 patients who underwent surgery for long-bone metastases between 2014 and 2019 at one tertiary care center in Taiwan were included. Ninety-eight percent (349 of 356) of patients were of Han Chinese descent. The median (range) patient age was 61 years (25 to 95), 52% (184 of 356) were women, and the median BMI was 23 kg/m² (13 to 39 kg/m²). The most common primary tumors were lung cancer (33% [116 of 356]) and breast cancer (16% [58 of 356]). Fifty-five percent (195 of 356) of patients presented with a complete pathologic fracture. Intramedullary nailing was the most commonly performed type of surgery (59% [210 of 356]), followed by plate screw fixation (23% [81 of 356]) and endoprosthetic reconstruction (18% [65 of 356]). Six patients were lost to follow-up within 90 days; 30 were lost to follow-up within 1 year. Eighty-five percent (301 of 356) of patients were followed until death or for at least 2 years. Survival was 82% (287 of 350) at 90 days and 49% (159 of 326) at 1 year. The model's performance metrics included discrimination (concordance index [c-index]), calibration (intercept and slope), and Brier score. In general, a c-index of 0.5 indicates random guessing and a c-index of 0.8 denotes excellent discrimination. Calibration refers to the agreement between the predicted outcomes and the actual outcomes, with a perfect calibration having an intercept of 0 and a slope of 1. The Brier score of a prediction model must be compared with and ideally should be smaller than the score of the null model. A decision curve analysis was then performed for the 90-day and 1-year prediction models to evaluate their net benefit across a range of different threshold probabilities. A multivariate logistic regression analysis was used to evaluate whether the ECOG score was an independent prognosticator while controlling for the SORG-MLA's predictions. We did not perform retraining/recalibration because we were not trying to update the SORG-MLA algorithm in this study.
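One common way to obtain a calibration intercept and slope is to regress the observed outcome on the logit of the predicted probability with logistic regression. The sketch below does this with a small Newton-Raphson fit. It is an illustrative assumption, not the procedure reported in the study: the function names and toy data are hypothetical, and some analyses instead estimate the intercept with the slope fixed at 1.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def calibration_intercept_slope(predicted, observed, iterations=50):
    """Fit observed ~ a + b * logit(predicted) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, observed):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g_a += yi - mu          # gradient of the log-likelihood
            g_b += (yi - mu) * xi
            h_aa += w               # entries of the information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0:
            raise ValueError("predictions need at least two distinct values")
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < 1e-12:
            break
    return a, b


# Hypothetical toy data: predictions of 0.1 and 0.9, observed rates of 0.2 and 0.8.
predicted = [0.1] * 10 + [0.9] * 10
observed = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]

intercept, slope = calibration_intercept_slope(predicted, observed)
print(f"intercept {intercept:.3f}, slope {slope:.3f}")
```

In this toy example the predictions are more extreme than the observed rates, so the fitted slope falls below 1 while the intercept stays near 0.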
The SORG-MLA had good discriminatory ability at both timepoints, with a c-index of 0.80 (95% confidence interval [CI] 0.74 to 0.86) for 90-day survival prediction and a c-index of 0.84 (95% CI 0.80 to 0.89) for 1-year survival prediction. However, the calibration analysis showed that the SORG-MLAs tended to underestimate Taiwanese patients' survival (90-day survival prediction: calibration intercept 0.78 [95% CI 0.46 to 1.10], calibration slope 0.74 [95% CI 0.53 to 0.96]; 1-year survival prediction: calibration intercept 0.75 [95% CI 0.49 to 1.00], calibration slope 1.22 [95% CI 0.95 to 1.49]). The Brier scores of the 90-day and 1-year SORG-MLA prediction models were lower than those of their respective null models (0.12 versus 0.16 for 90-day prediction; 0.16 versus 0.25 for 1-year prediction), indicating good overall performance of SORG-MLAs at these two timepoints. Decision curve analysis showed SORG-MLAs provided net benefits when threshold probabilities ranged from 0.40 to 0.95 for 90-day survival prediction and from 0.15 to 1.0 for 1-year prediction. The ECOG score was an independent factor associated with 90-day mortality (odds ratio [OR] 1.94 [95% CI 1.01 to 3.73]) but not 1-year mortality (OR 1.07 [95% CI 0.53 to 2.17]) after controlling for SORG-MLA predictions for 90-day and 1-year survival, respectively.
SORG-MLAs retained good discriminatory ability in Taiwanese patients with long-bone metastases, although their actual survival time was slightly underestimated. More international validation and incremental value studies that address factors such as the ECOG score are warranted to refine the algorithms, which can be freely accessed online at https://sorg-apps.shinyapps.io/extremitymetssurvival/.
Level III, therapeutic study.
Tseng TE, Lee CC, Yen HK, Groot OQ, Hou CH, Lin SY, Bongers MER, Hu MH, Karhade AV, Ko JC, Lai YH, Yang JJ, Verlaan JJ, Yang RS, Schwab JH, Lin WH
... -
《-》
-
Does the SORG Machine-learning Algorithm for Extremity Metastases Generalize to a Contemporary Cohort of Patients? Temporal Validation From 2016 to 2020.
The ability to predict survival accurately in patients with osseous metastatic disease of the extremities is vital for patient counseling and guiding surgical intervention. We, the Skeletal Oncology Research Group (SORG), previously developed a machine-learning algorithm (MLA) based on data from 1999 to 2016 to predict 90-day and 1-year survival of surgically treated patients with extremity bone metastasis. As treatment regimens for oncology patients continue to evolve, this SORG MLA-driven probability calculator requires temporal reassessment of its accuracy.
Does the SORG-MLA accurately predict 90-day and 1-year survival in patients who receive surgical treatment for a metastatic long-bone lesion in a more recent cohort of patients treated between 2016 and 2020?
Between 2017 and 2021, we identified 674 patients 18 years and older through the ICD codes for secondary malignant neoplasm of bone and bone marrow and CPT codes for completed pathologic fractures or prophylactic treatment of an impending fracture. We excluded 40% (268 of 674) of patients, including 18% (118) who did not receive surgery; 11% (72) who had metastases in places other than the long bones of the extremities; 3% (23) who received treatment other than intramedullary nailing, endoprosthetic reconstruction, or dynamic hip screw; 3% (23) who underwent revision surgery; 3% (17) in whom there was no tumor; and 2% (15) who were lost to follow-up within 1 year. Temporal validation was performed using data on 406 patients treated surgically for bony metastatic disease of the extremities from 2016 to 2020 at the same two institutions where the MLA was developed. Variables used to predict survival in the SORG algorithm included perioperative laboratory values, tumor characteristics, and general demographics. To assess the models' discrimination, we computed the c-statistic, commonly referred to as the area under the receiver operating characteristic curve (AUC) for binary classification. This value ranges from 0.5 (representing chance-level performance) to 1.0 (indicating excellent discrimination). Generally, an AUC of 0.75 is considered high enough for use in clinical practice. To evaluate the agreement between predicted and observed outcomes, a calibration plot was used, and the calibration slope and intercept were calculated. Perfect calibration would result in a slope of 1 and intercept of 0. For overall performance, the Brier score and null-model Brier score were determined. The Brier score can range from 0 (representing perfect prediction) to 1 (indicating the poorest prediction).
Proper interpretation of the Brier score necessitates a comparison with the null-model Brier score, which represents the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for each patient. Finally, a decision curve analysis was conducted to compare the potential net benefit of the algorithm with other decision-support methods, such as treating all or none of the patients. Overall, 90-day and 1-year mortality were lower in the temporal validation cohort than in the development cohort (90-day: 23% versus 28%; p < 0.001, and 1-year: 51% versus 59%; p < 0.001).
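For a binary outcome, the c-statistic described above equals the proportion of patient pairs (one with the event, one without) in which the patient with the event received the higher predicted probability, with ties counted as half. A minimal sketch follows; the function name and toy data are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def c_statistic(predicted, observed):
    """AUC for a binary outcome via pairwise comparison of predictions."""
    with_event = [p for p, y in zip(predicted, observed) if y == 1]
    without_event = [p for p, y in zip(predicted, observed) if y == 0]
    if not with_event or not without_event:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for p_event in with_event:
        for p_none in without_event:
            if p_event > p_none:
                concordant += 1.0   # correctly ordered pair
            elif p_event == p_none:
                concordant += 0.5   # tie counts as half
    return concordant / (len(with_event) * len(without_event))


# Hypothetical toy data: 1 = event occurred, 0 = it did not.
observed = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.7, 0.8, 0.6, 0.3, 0.4]

auc = c_statistic(predicted, observed)
print(f"c-statistic = {auc:.3f}")
```

Here 7 of the 9 possible pairs are ordered correctly, so the statistic sits between the chance level of 0.5 and a perfect 1.0.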
Mortality was lower in the validation cohort than in the cohort on which the model was trained: 23% versus 28% at the 90-day timepoint and 51% versus 59% at the 1-year timepoint. The AUC was 0.78 (95% CI 0.72 to 0.82) for 90-day survival and 0.75 (95% CI 0.70 to 0.79) for 1-year survival, indicating the model could distinguish the two outcomes reasonably well. For the 90-day model, the calibration slope was 0.71 (95% CI 0.53 to 0.89), and the intercept was -0.66 (95% CI -0.94 to -0.39), suggesting the predicted risks were overly extreme, and that in general, the risk of the observed outcome was overestimated. For the 1-year model, the calibration slope was 0.73 (95% CI 0.56 to 0.91) and the intercept was -0.67 (95% CI -0.90 to -0.43). With respect to overall performance, the model's Brier scores for the 90-day and 1-year models were 0.16 and 0.22. These scores were higher than the Brier scores from internal validation in the development study (0.13 and 0.14), indicating the models' performance has declined over time.
The SORG MLA to predict survival after surgical treatment of extremity metastatic disease showed decreased performance on temporal validation. Moreover, in patients undergoing innovative immunotherapy, mortality risk was overestimated to varying degrees. Clinicians should be aware of this overestimation and discount the prediction of the SORG MLA according to their own experience with this patient population. Generally, these results show that temporal reassessment of these MLA-driven probability calculators is of paramount importance because the predictive performance may decline over time as treatment regimens evolve. The SORG-MLA is available as a freely accessible internet application at https://sorg-apps.shinyapps.io/extremitymetssurvival/.
Level III, prognostic study.
de Groot TM, Ramsey D, Groot OQ, Fourman M, Karhade AV, Twining PK, Berner EA, Fenn BP, Collins AK, Raskin K, Lozano S, Newman E, Ferrone M, Doornberg JN, Schwab JH
... -
《-》