-
Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.
There has been burgeoning interest in applying machine learning methods for predicting radiotherapy outcomes. However, the imbalanced ratio of a large number of variables to a limited sample size in radiation oncology constitutes a major challenge. Therefore, dimensionality reduction methods can be a key to success. The study investigates and contrasts the application of traditional machine learning methods and deep learning approaches for outcome modeling in radiotherapy. In particular, new joint architectures based on variational autoencoder (VAE) for dimensionality reduction are presented and their application is demonstrated for the prediction of lung radiation pneumonitis (RP) from a large-scale heterogeneous dataset.
A large-scale heterogeneous dataset containing a pool of 230 variables, including clinical factors (e.g., dose, KPS, stage) and biomarkers (e.g., single nucleotide polymorphisms (SNPs), cytokines, and micro-RNAs), from a population of 106 non-small cell lung cancer (NSCLC) patients who received radiotherapy was used for modeling RP. Twenty-two patients had grade 2 or higher RP. Four methods were investigated: feature selection (case A) and feature extraction (case B) with traditional machine learning methods, a VAE-MLP joint architecture (case C) with deep learning, and the combination of feature selection and the joint architecture (case D). For feature selection, random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP) were implemented to select relevant features. Specifically, each method was run multiple times to rank features within several cross-validated (CV) resampled sets. The resulting ranking lists were then aggregated by top 5% and Kemeny graph methods to identify the final ranking for prediction. A synthetic minority oversampling technique was applied to correct for class imbalance during this process. For deep learning, a VAE-MLP joint architecture was developed in which a VAE performed dimensionality reduction and an MLP performed classification. In this architecture, reconstruction loss and prediction loss were combined into a single loss function to enable simultaneous training, and weights were assigned to different classes to mitigate class imbalance. To evaluate prediction performance and conduct comparisons, the areas under the receiver operating characteristic curves (AUCs) were computed with nested CV for both the handcrafted feature selections and the deep learning approach. The significance of differences in AUCs was assessed using the DeLong test of U-statistics.
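The joint training objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact loss, so the mean-squared-error reconstruction term, the standard VAE KL term, the weighting factor `alpha`, and the inverse-frequency class weights are all assumptions.

```python
import math

def joint_loss(x, x_hat, mu, logvar, y, p, class_weights, alpha=1.0):
    """Combined VAE + classifier loss for one sample (illustrative only)."""
    # Reconstruction term: mean squared error between input and decoder output.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # KL divergence between N(mu, exp(logvar)) and the standard normal prior.
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))
    # Class-weighted binary cross-entropy for the MLP output p = P(y = 1).
    eps = 1e-12
    bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    # alpha (hypothetical) balances prediction against reconstruction.
    return recon + kl + alpha * class_weights[y] * bce

# Assumed "balanced" weights for 22 RP cases among 106 patients.
class_weights = {0: 106 / (2 * 84), 1: 106 / (2 * 22)}
loss = joint_loss([0.2, 0.4], [0.1, 0.5], [0.0, 0.0], [0.0, 0.0],
                  y=1, p=0.7, class_weights=class_weights)
```

Because both terms sit in one scalar, a single backward pass would update the encoder, decoder, and classifier together, which is what "simultaneous training" refers to.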
An MLP-based method using weight pruning (WP) feature selection yielded the best performance among the different handcrafted feature selection methods (case A), reaching an AUC of 0.804 (95% CI: 0.761-0.823) with 29 top features. A VAE-MLP joint architecture (case C) achieved a comparable but slightly lower AUC of 0.781 (95% CI: 0.737-0.808) with a latent dimension of size 2. The combination of handcrafted features and latent representation (case D), with an MLP classifier, achieved a significantly higher AUC of 0.831 (95% CI: 0.805-0.863) with 22 features (P-value = 0.000642 compared with handcrafted features only (case A) and P-value = 0.000453 compared with VAE alone (case C)).
The potential for combination of traditional machine learning methods and deep learning VAE techniques has been demonstrated for dealing with limited datasets in modeling radiotherapy toxicities. Specifically, latent variables from a VAE-MLP joint architecture are able to complement handcrafted features for the prediction of RP and improve prediction over either method alone.
Cui S, Luo Y, Tseng HH, Ten Haken RK, El Naqa I
... -
-
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Machine learning classification algorithms (classifiers) for prediction of treatment response are becoming more popular in the radiotherapy literature. The general machine learning literature provides evidence in favor of some classifier families (random forest, support vector machine, gradient boosting) in terms of classification performance. The purpose of this study is to compare such classifiers specifically for (chemo)radiotherapy datasets and to estimate their average discriminative performance for radiation treatment outcome prediction.
We collected 12 datasets (3496 patients) from prior studies on post-(chemo)radiotherapy toxicity, survival, or tumor control with clinical, dosimetric, or blood biomarker features from multiple institutions and for different tumor sites, that is, (non-)small-cell lung cancer, head and neck cancer, and meningioma. Six common classification algorithms with built-in feature selection (decision tree, random forest, neural network, support vector machine, elastic net logistic regression, LogitBoost) were applied to each dataset using the popular open-source R package caret. The R code and documentation for the analysis are available online (https://github.com/timodeist/classifier_selection_code). All classifiers were run on each dataset in a 100-repeated nested fivefold cross-validation with hyperparameter tuning. Performance metrics (AUC, calibration slope and intercept, accuracy, Cohen's kappa, and Brier score) were computed. We ranked classifiers by AUC to determine which classifier is likely to also perform well in future studies. We simulated the benefit for potential investigators of selecting a classifier for a new dataset based on our study (preselection based on other datasets) or of estimating the best classifier for a dataset (set-specific selection based on information from the new dataset), compared with uninformed classifier selection (random selection).
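The nested cross-validation scheme mentioned above can be illustrated with a short index-splitting sketch. It is written in Python for illustration only (the study itself used the R package caret), and the function names, the unstratified shuffling, and the seed handling are assumptions rather than details from the study.

```python
import random

def k_fold(indices, k, rng):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for one nested CV repeat."""
    rng = random.Random(seed)
    outer_folds = k_fold(list(range(n_samples)), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the held-out outer fold is available for tuning.
        outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [j for f, fold in enumerate(inner_folds) if f != m for j in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# One repeat; a 100-repeated scheme would loop over 100 different seeds.
splits = list(nested_cv_splits(50, seed=1))
```

Hyperparameters would be tuned only on the inner splits, so the outer test fold never influences model selection and the outer AUC stays an honest estimate.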
Random forest (best in 6/12 datasets) and elastic net logistic regression (best in 4/12 datasets) showed the overall best discrimination, but there was no single best classifier across datasets. Both classifiers had a median AUC rank of 2. Preselection and set-specific selection yielded a significant average AUC improvement of 0.02 and 0.02 over random selection with an average AUC rank improvement of 0.42 and 0.66, respectively.
Random forest and elastic net logistic regression yield higher discriminative performance in (chemo)radiotherapy outcome and toxicity prediction than other studied classifiers. Thus, one of these two classifiers should be the first choice for investigators when building classification models or to benchmark one's own modeling results against. Our results also show that an informed preselection of classifiers based on existing datasets can improve discrimination over random selection.
Deist TM, Dankers FJWM, Valdes G, Wijsman R, Hsu IC, Oberije C, Lustberg T, van Soest J, Hoebers F, Jochems A, El Naqa I, Wee L, Morin O, Raleigh DR, Bots W, Kaanders JH, Belderbos J, Kwint M, Solberg T, Monshouwer R, Bussink J, Dekker A, Lambin P
... -
-
Multi-institutional dose-segmented dosiomic analysis for predicting radiation pneumonitis after lung stereotactic body radiation therapy.
To predict radiation pneumonitis (RP) grade 2 or worse after lung stereotactic body radiation therapy (SBRT) using dose-based radiomic (dosiomic) features.
This multi-institutional study included 247 early-stage non-small cell lung cancer patients who underwent SBRT with a prescribed dose of 48-70 Gy at an isocenter between June 2009 and March 2016. Ten dose-volume indices (DVIs) were used, including the mean lung dose, internal target volume size, and percentage of entire lung excluding the internal target volume receiving greater than x Gy (x = 5, 10, 15, 20, 25, 30, 35, and 40). A total of 6,808 dose-segmented dosiomic features, such as shape, first-order, and texture features, were extracted from the dose distribution. Patients were randomly partitioned into two groups, model training (70%) and test (30%) datasets, 100 times. Dosiomic features were converted to z-scores (standardized values) with a mean of zero and a standard deviation (SD) of one to put different variables on the same scale. The feature dimension was reduced using the following methods: interfeature correlation based on Spearman's correlation coefficients and feature importance based on a light gradient boosting machine (LightGBM) feature selection function. Three different models were developed using LightGBM as follows: (a) a model with ten DVIs (DVI model), (b) a model with the selected dosiomic features (dosiomic model), and (c) a model with ten DVIs and selected dosiomic features (hybrid model). Suitable hyperparameters were determined by searching for the largest average area under the curve (AUC) value of the receiver operating characteristic curve (ROC-AUC) via stratified fivefold cross-validation. For each of the three model types, the final model whose ROC-AUC value was closest to the average ROC-AUC value was applied to the test datasets. The classification performance was evaluated by calculating the ROC-AUC, AUC of the precision-recall curve (PR-AUC), accuracy, precision, recall, and F1-score. The entire process was repeated 100 times with randomization, and 100 individual models were developed for each of the three models. Then the mean value and SD for the 100 random iterations were calculated for each performance metric.
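The z-score standardization and Spearman-based redundancy filter described above can be sketched in plain Python. The 0.9 correlation threshold, the greedy keep-first rule, and the toy feature names are hypothetical; the abstract does not state the cutoff or tie-breaking rule the authors used, and the LightGBM importance step is not shown.

```python
import math

def zscore(values):
    """Standardize to mean 0 and SD 1 (population SD; assumes a non-constant column)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    ra, rb = zscore(ranks(a)), zscore(ranks(b))
    return sum(x * y for x, y in zip(ra, rb)) / len(ra)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |rho| with every already-kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy columns with made-up names; "texture_b" is monotone in "texture_a".
toy = {"texture_a": [1, 2, 3, 4, 5], "texture_b": [2, 4, 6, 8, 11], "shape_c": [5, 1, 4, 2, 3]}
selected = drop_correlated(toy)
```

Rank correlation is a reasonable choice here because it removes features that are monotonically related even when the relationship is not linear.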
Thirty-seven (15.0%) patients developed RP after SBRT. The ROC-AUC and PR-AUC values in the DVI, dosiomic, and hybrid models were 0.660 ± 0.054 and 0.272 ± 0.052, 0.837 ± 0.054 and 0.510 ± 0.115, and 0.846 ± 0.049 and 0.531 ± 0.116, respectively. For each performance metric, the dosiomic and hybrid models outperformed the DVI model (P < 0.05). Texture-based dosiomic features were confirmed as effective indicators for predicting RP.
Our dose-segmented dosiomic approach improved the prediction of the incidence of RP after SBRT.
Adachi T, Nakamura M, Shintani T, Mitsuyoshi T, Kakino R, Ogata T, Ono T, Tanabe H, Kokubo M, Sakamoto T, Matsuo Y, Mizowaki T
... -
-
Predicting radiation pneumonitis in locally advanced stage II-III non-small cell lung cancer using machine learning.
Radiation pneumonitis (RP) is a radiotherapy dose-limiting toxicity for locally advanced non-small cell lung cancer (LA-NSCLC). Prior studies have proposed relevant dosimetric constraints to limit this toxicity. Using machine learning algorithms, we performed analyses of contributing factors in the development of RP to uncover previously unidentified criteria and elucidate the relative importance of individual factors.
We evaluated 32 clinical features per patient in a cohort of 203 stage II-III LA-NSCLC patients treated with definitive chemoradiation to a median dose of 66.6 Gy in 1.8 Gy daily fractions at our institution from 2008 to 2016. Of this cohort, 17.7% of patients developed grade ≥2 RP. Univariate analysis was performed using trained decision stumps to individually analyze statistically significant predictors of RP and perform feature selection. Applying Random Forest, we performed multivariate analysis to assess the combined performance of important predictors of RP.
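A decision stump is a one-split decision tree on a single feature. The sketch below shows one plausible way to fit such a stump by maximizing the sum of sensitivity and specificity (Youden's J). That criterion, the "greater than threshold means RP" direction, and the toy values are assumptions; the abstract does not specify how the stumps were trained.

```python
def best_stump(values, labels):
    """Find the single-feature threshold that maximizes sensitivity + specificity.

    Predicts positive when value > threshold and returns
    (threshold, sensitivity, specificity). Assumes both classes are present
    and the feature takes at least two distinct values.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    candidates = sorted(set(values))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v > t)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v <= t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best

# Made-up lung V20 values (%) and RP labels, for illustration only.
v20 = [10, 15, 20, 25, 30, 35]
rp = [0, 0, 0, 1, 0, 1]
threshold, sensitivity, specificity = best_stump(v20, rp)
```

Running one stump per feature gives a simple univariate screen, and the resulting thresholds read directly as candidate dosimetric constraints.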
On univariate analysis, lung V20, lung mean, lung V10 and lung V5 were found to be significant RP predictors with the greatest balance of specificity and sensitivity. On multivariate analysis, Random Forest (AUC = 0.66, p = 0.0005) identified esophagus max (20.5%), lung V20 (16.4%), lung mean (15.7%) and pack-year (14.9%) as the most common primary differentiators of RP.
We highlight Random Forest as an accurate machine learning method to identify known and new predictors of symptomatic RP. Furthermore, this analysis confirms the importance of lung V20, lung mean and pack-year as predictors of RP while also introducing esophagus max as an important RP predictor.
Luna JM, Chao HH, Diffenderfer ES, Valdes G, Chinniah C, Ma G, Cengel KA, Solberg TD, Berman AT, Simone CB 2nd
... -
-
Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.
Prediction of the stage of cancer plays an important role in planning the course of treatment and has been largely reliant on imaging tools, which do not capture the molecular events that cause cancer progression. Gene-expression data-based analyses are able to identify these events, allowing RNA-sequence and microarray cancer data to be used for cancer analyses. Breast cancer is the most common cancer worldwide, and is classified into four stages - stages 1, 2, 3, and 4 [2]. While machine learning models have previously been explored to perform stage classification with limited success, multi-class stage classification has not seen significant progress. There is a need for improved multi-class classification models, for example by investigating deep learning models. Gene-expression-based cancer data is characterised by the small size of available datasets, class imbalance, and high dimensionality. Class balancing methods must therefore be applied to the dataset. Since not all genes are necessary for stage prediction, retaining only the necessary genes can improve classification accuracy. The breast cancer samples are to be classified into 4 classes, stages 1 to 4. Invasive ductal carcinoma breast cancer samples are obtained from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets and combined. Two class balancing techniques are explored: synthetic minority oversampling technique (SMOTE) and SMOTE followed by random undersampling. A hybrid feature selection pipeline is proposed, with three pipelines explored involving combinations of filter and embedded feature selection methods: Pipeline 1 - minimum-redundancy maximum-relevancy (mRMR) and correlation feature selection (CFS), Pipeline 2 - mRMR, mutual information (MI) and CFS, and Pipeline 3 - mRMR and support vector machine-recursive feature elimination (SVM-RFE).
The classification is done using deep learning models, namely a deep neural network, a convolutional neural network, a recurrent neural network, a modified deep neural network, and an AutoKeras-generated model. Classification performance after class balancing and the various feature selection techniques shows marked improvement over classification prior to feature selection. The best multiclass classification was achieved by a deep neural network after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, giving a Cohen-Kappa score of 0.303 and a classification accuracy of 53.1%. For binary classification into early- and late-stage cancer, the best performance is obtained by a modified deep neural network (DNN) after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, giving an accuracy of 81.0% and a Cohen-Kappa score (CKS) of 0.280. This pipeline also showed improved multiclass classification performance on neuroblastoma cancer data, with a best area under the receiver operating characteristic (auROC) curve score of 0.872, as compared to 0.71 obtained in previous work, an improvement of 22.81%. The results and analysis reveal that feature selection techniques play a vital role in gene-expression data-based classification, and that the proposed hybrid feature selection pipeline improves classification performance. Multi-class classification is possible using deep learning models, though further improvement, particularly in late-stage classification, is necessary and should be explored.
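The Cohen-Kappa score reported alongside accuracy corrects observed agreement for the agreement expected by chance, which matters on imbalanced data. The sketch below uses the standard definition of the statistic; the toy labels are made up and do not come from the study.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# A classifier that always predicts the majority class: 80% accuracy, kappa of 0.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
```

This is why a fairly high accuracy can coexist with a modest kappa when one class dominates.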
Kishore A, Venkataramana L, Prasad DVV, Mohan A, Jha B
... -