-
Development and validation of a machine learning model to predict the risk of lymph node metastasis in renal carcinoma.
Studies have shown that about 30% of kidney cancer patients will have metastasis, and lymph node metastasis (LNM) may be related to a poor prognosis. Our retrospective study aims to provide a reliable machine learning-based model to predict the occurrence of LNM in kidney cancer. Using a training group (n=39016) drawn from the SEER database and a validation group (n=771) from the medical center, we identified pathological grade, liver metastasis, M stage, primary site, T stage, and tumor size as independent predictors of LNM in kidney cancer patients. Of the prediction models built with six different algorithms, the XGB model performed significantly better than the other machine learning models in both the training group and the validation group. The results show that prediction tools based on machine learning can accurately predict the probability of LNM in patients with kidney cancer and have satisfactory clinical application prospects.
Lymph node metastasis (LNM) is associated with the prognosis of patients with kidney cancer. This study aimed to provide reliable machine learning-based (ML-based) models to predict the probability of LNM in kidney cancer.
Data on patients diagnosed with kidney cancer from 2010 to 2017 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database, and variables were filtered by least absolute shrinkage and selection operator (LASSO), univariate, and multivariate logistic regression analyses. Statistically significant risk factors were used to build predictive models. We used 10-fold cross-validation to validate the models. The area under the receiver operating characteristic curve (AUC) was used to assess model performance. Correlation heat maps were used to investigate the correlation of features, and permutation analysis was used to assess the importance of predictors. Probability density functions (PDFs) and clinical utility curves (CUCs) were used to determine clinical utility thresholds.
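As an illustration of the AUC metric named above: the AUC can be read as the probability that a randomly chosen LNM-positive patient receives a higher predicted score than a randomly chosen LNM-negative patient. The sketch below computes it by direct pair counting. The function name `roc_auc` and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = event, e.g. LNM present)
    scores: predicted probabilities or scores, same length as labels
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, for illustration only)
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

Pair counting is quadratic in the number of patients, so for a cohort the size of the SEER training set a rank-based implementation from a statistics library would be the practical choice; the version here is only meant to show what the number measures.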
The training cohort of this study included 39,016 patients, and the validation cohort included 771 patients. In the two cohorts, 2544 (6.5%) and 66 (8.1%) patients had LNM, respectively. Pathological grade, liver metastasis, M stage, primary site, T stage, and tumor size were independent predictive factors of LNM. In both validations, the XGB model significantly outperformed the other machine learning models, with an AUC value of 0.916. A web calculator (https://share.streamlit.io/liuwencai4/renal_lnm/main/renal_lnm.py) was built based on the XGB model. Based on the PDF and CUC, we suggested 54.6% as a threshold probability for guiding the diagnosis of LNM, which could distinguish about 89% of LNM patients.
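To make the role of a threshold probability concrete, the following sketch shows how a cutoff such as 54.6% turns predicted probabilities into a positive or negative call, and how the share of true LNM patients captured (sensitivity) follows from it. It assumes the "about 89%" figure can be read as a sensitivity, which the abstract does not state explicitly. The function name and toy data are hypothetical.

```python
def sensitivity_specificity(labels, probabilities, threshold):
    """Classify as positive when probability >= threshold, then
    return (sensitivity, specificity) against the true 0/1 labels."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probabilities):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical); 0.546 mirrors the threshold quoted in the abstract
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.91, 0.60, 0.40, 0.10, 0.58, 0.30, 0.05, 0.75]
sens, spec = sensitivity_specificity(toy_labels, toy_probs, 0.546)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off a clinical utility curve is meant to visualize.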
The predictive tool based on machine learning can precisely indicate the probability of LNM in kidney cancer patients and has a satisfying application prospect in clinical practice.
Feng X, Hong T, Liu W, Xu C, Li W, Yang B, Song Y, Li T, Li W, Zhou H, Yin C ... -
《Frontiers in Endocrinology》
-
Interpretable machine learning-based clinical prediction model for predicting lymph node metastasis in patients with intrahepatic cholangiocarcinoma.
Prediction of lymph node metastasis (LNM) for intrahepatic cholangiocarcinoma (ICC) is critical for the treatment regimen and prognosis. We aim to develop and validate machine learning (ML)-based predictive models for LNM in patients with ICC.
A total of 345 patients with clinicopathologically confirmed ICC from Jan 2007 to Jan 2019 were enrolled. The predictors of LNM were identified by the least absolute shrinkage and selection operator (LASSO) and logistic analysis. The selected variables were used to develop prediction models for LNM with six ML algorithms: Logistic regression (LR), Gradient boosting machine (GBM), Extreme gradient boosting (XGB), Random Forest (RF), Decision tree (DT), and Multilayer perceptron (MLP). We applied 10-fold cross-validation as internal validation and calculated the average of the areas under the receiver operating characteristic (ROC) curve to measure the performance of all models. A feature selection approach was applied to identify the importance of predictors in each model. A heat map was used to investigate the correlation of features. Finally, we established a web calculator using the best-performing model.
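The 10-fold cross-validation described above can be sketched as follows: the cohort is shuffled, split into ten folds, and each fold serves once as the held-out set while a per-fold score (such as AUC) is averaged. This is a generic sketch, not the authors' pipeline; `k_fold_indices`, `mean_cv_score`, and the `score_fold` callback are hypothetical names, and the splitting is unstratified for simplicity.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples.

    Every sample appears in exactly one test fold. Fold sizes differ by
    at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def mean_cv_score(n_samples, score_fold, k=10, seed=0):
    """Average a per-fold score; score_fold(train, test) is supplied by the
    caller and would typically fit a model on train and score it on test."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# 345 matches the cohort size quoted in the abstract; the callback here is a
# placeholder that just reports the held-out fold size.
splits = k_fold_indices(345)
mean_fold_size = mean_cv_score(345, lambda train, test: len(test))
```

With a rare outcome, a stratified split that keeps the event rate similar across folds is often preferred; the abstract does not say whether stratification was used.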
In multivariate logistic regression analysis, alcoholic liver disease (ALD), smoking, boundary, diameter, and white blood cell (WBC) count were identified as independent predictors of LNM in patients with ICC. In internal validation, the average AUC values of the six models ranged from 0.820 to 0.908. The XGB model was identified as the best model, with an average AUC of 0.908. Finally, we established a web calculator based on the XGB model, which was useful for clinicians to calculate the likelihood of LNM.
The proposed ML-based predictive models performed well in predicting LNM in patients with ICC. XGB performed best. A web calculator based on the ML algorithm showed promise in assisting clinicians to predict LNM and develop individualized medical plans.
Xie H, Hong T, Liu W, Jia X, Wang L, Zhang H, Xu C, Zhang X, Li WL, Wang Q, Yin C, Lv X ... -
《-》
-
A clinical prediction model for predicting the risk of liver metastasis from renal cell carcinoma based on machine learning.
Renal cell carcinoma (RCC) is a highly metastatic urological cancer. RCC with liver metastasis (LM) carries a dismal prognosis. The objective of this study is to develop a machine learning (ML) model that predicts the risk of RCC with LM, which is used to assist clinical treatment.
The retrospective study data of 42,547 patients with RCC were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. ML includes algorithmic methods and is a fast-rising field that has been widely used in the biomedical field. Logistic regression (LR), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), random forest (RF), decision tree (DT), and naive Bayesian model [Naive Bayes Classifier (NBC)] were applied to develop prediction models to predict the risk of RCC with LM. The six models were 10-fold cross-validated, and the best-performing model was selected based on the area under the curve (AUC) value. A web online calculator was constructed based on the best ML model.
Bone metastasis, lung metastasis, grade, T stage, N stage, and tumor size were independent risk factors for the development of RCC with LM by multivariate regression analysis. In addition, the correlation of the relative proportions of the six clinical variables was shown by a heat map. In the prediction models of RCC with LM, the mean AUC of the XGB model among the six ML algorithms was 0.947. Based on the XGB model, the web calculator (https://share.streamlit.io/liuwencai4/renal_liver/main/renal_liver.py) was developed to evaluate the risk of RCC with LM.
This XGB model has the best predictive effect on RCC with LM. The web calculator constructed based on the XGB model has great potential for clinicians to make clinical decisions and improve the prognosis of RCC patients with LM.
Wang Z, Xu C, Liu W, Zhang M, Zou J, Shao M, Feng X, Yang Q, Li W, Shi X, Zang G, Yin C ... -
《Frontiers in Endocrinology》
-
A Machine Learning-Based Predictive Model for Predicting Lymph Node Metastasis in Patients With Ewing's Sarcoma.
To provide a reference for clinicians and bring convenience to clinical work, we sought to develop and validate a risk prediction model for lymph node metastasis (LNM) of Ewing's sarcoma (ES) based on machine learning (ML) algorithms.
Clinicopathological data of 923 ES patients from the Surveillance, Epidemiology, and End Results (SEER) database and 51 ES patients from multi-center external validation set were retrospectively collected. We applied ML algorithms to establish a risk prediction model. Model performance was checked using 10-fold cross-validation in the training set and receiver operating characteristic (ROC) curve analysis in external validation set. After determining the best model, a web-based calculator was made to promote the clinical application.
LNM was confirmed or could not be evaluated in 13.86% (135 of 974) of ES patients. In multivariate logistic regression, race, T stage, M stage, and lung metastases were independent predictors of LNM in ES. Six prediction models were established using random forest (RF), naive Bayes classifier (NBC), decision tree (DT), xgboost (XGB), gradient boosting machine (GBM), and logistic regression (LR). In 10-fold cross-validation, the average area under the curve (AUC) ranged from 0.705 to 0.764. In ROC curve analysis, AUC ranged from 0.612 to 0.727. The RF model performed best. Accordingly, a web-based calculator was developed (https://share.streamlit.io/liuwencai2/es_lnm/main/es_lnm.py).
With the help of clinicopathological data, clinicians can better identify LNM in ES patients. Risk prediction models established in this study performed well, especially the RF model.
Li W, Zhou Q, Liu W, Xu C, Tang ZR, Dong S, Wang H, Li W, Zhang K, Li R, Zhang W, Hu Z, Shibin S, Liu Q, Kuang S, Yin C ... -
《Frontiers in Medicine》
-
Early distinction of lymph node metastasis in patients with soft tissue sarcoma and individualized survival prediction using the online available nomograms: A population-based analysis.
The presence of metastatic tumor cells in regional lymph nodes is considered a significant indicator of inferior prognosis. This study aimed to construct predictive models to quantify the probability of lymph node metastasis (LNM) in patients with soft tissue sarcoma (STS) and the survival rate of patients with STS with LNM.
Research data were extracted from the Surveillance, Epidemiology, and End Results (SEER) database between 2004 and 2017, and data of patients with STS from our medical institution were collected to form an external testing set. Univariate and multivariate logistic regression analyses were used to determine the independent risk factors for developing LNM. On the basis of the identified variables, we developed a diagnostic nomogram to predict the risk of LNM in patients with STS. Those patients with STS presenting with LNM were retrieved to build a cohort for identifying the independent prognostic factors through univariate and multivariate Cox regression analysis. Then, two nomograms incorporating the independent prognostic predictors were developed to predict the overall survival (OS) and cancer-specific survival (CSS) for patients with STS with LNM. Kaplan-Meier (K-M) survival analysis was conducted to study the survival difference. Moreover, validations of these nomograms were performed by the receiver operating characteristic curves, the area under the curve, calibration curves, and the decision curve analysis (DCA).
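The decision curve analysis (DCA) mentioned above rests on the net benefit at a threshold probability p_t, commonly defined as TP/n − (FP/n) · p_t/(1 − p_t). The sketch below computes that quantity for a model and for the "treat all" reference strategy. It uses this standard definition rather than anything stated in the abstract, and the function names and toy data are hypothetical.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on patients whose predicted probability is
    at or above the threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as positive."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


# Toy data (hypothetical, for illustration only)
toy_labels = [1, 0, 1, 0, 0]
toy_probs = [0.8, 0.3, 0.6, 0.7, 0.1]
nb_model = net_benefit(toy_labels, toy_probs, 0.5)
nb_all = net_benefit_treat_all(toy_labels, 0.5)
```

A decision curve plots these values over a range of thresholds; a model is considered useful at a given threshold when its net benefit exceeds both the "treat all" line and zero (the "treat none" strategy).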
A total of 16,601 patients with STS from the SEER database were enrolled in our study, of which 659 (3.97%) had LNM at the initial diagnosis. K-M survival analysis indicated that patients with LNM had a poorer survival rate. Sex, histology, primary site, grade, M stage, and T stage were found to be independently associated with the development of LNM in patients with STS. Age, grade, histology, M stage, T stage, chemotherapy, radiotherapy, and surgery were identified as the independent prognostic factors for OS of patients with STS with LNM, and age, grade, M stage, T stage, radiotherapy, and surgery were determined as the independent prognostic factors for CSS. Subsequently, we constructed three nomograms, and their online versions are as follows: https://tyxupup.shinyapps.io/probabilityofLNMforSTSpatients/, https://tyxupup.shinyapps.io/OSofSTSpatientswithLNM/, and https://tyxupup.shinyapps.io/CSSofSTSpatientswithLNM/. The areas under the curve (AUCs) of the diagnostic nomogram were 0.839 in the training set, 0.811 in the testing set, and 0.852 in the external testing set. For the prognostic nomograms, the AUCs of 24-, 36-, and 48-month OS were 0.820, 0.794, and 0.792 in the training set and 0.759, 0.728, and 0.775 in the testing set, respectively; the AUCs of 24-, 36-, and 48-month CSS were 0.793, 0.777, and 0.775 in the training set and 0.775, 0.744, and 0.738 in the testing set, respectively. Furthermore, calibration curves suggested that the predicted values were consistent with the actual values. For the DCA, our nomograms showed a superior net benefit across a wider scale of threshold probabilities for the prediction of risk and survival rate for patients with STS with LNM.
These newly proposed nomograms promise to be useful tools in predicting the risk of LNM for patients with STS and individualized survival prediction for patients with STS with LNM, which may help to guide clinical practice.
Tong Y, Pi Y, Cui Y, Jiang L, Gong Y, Zhao D ... -
《-》