-
Deep learning for patient-specific quality assurance: Identifying errors in radiotherapy delivery by radiomic analysis of gamma images with convolutional neural networks.
Patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) is a ubiquitous clinical procedure, but conventional methods have often been criticized as being insensitive to errors or less effective than other common physics checks. Recently, there has been interest in the application of radiomics, quantitative extraction of image features, to radiotherapy QA. In this work, we investigate a deep learning approach to classify the presence or absence of introduced radiotherapy treatment delivery errors from patient-specific QA.
Planar dose maps from 186 IMRT beams from 23 IMRT plans were evaluated. Each plan was transferred to a cylindrical phantom CT geometry. Three sets of planar doses were exported from each plan corresponding to (a) the error-free case, (b) a random multileaf collimator (MLC) error case, and (c) a systematic MLC error case. Each plan was delivered to the electronic portal imaging device (EPID), and planned and measured doses were used to calculate gamma images in an EPID dosimetry software package (for a total of 558 gamma images). Two radiomic approaches were used. In the first, a convolutional neural network with triplet learning was used to extract image features from the gamma images. In the second, a handcrafted approach using texture features was used. The resulting metrics from both approaches were input into four machine learning classifiers (support vector machines, multilayer perceptrons, decision trees, and k-nearest-neighbors) in order to determine whether images contained the introduced errors. Two experiments were considered: the two-class experiment classified images as error-free or containing any MLC error, and the three-class experiment classified images as error-free, containing a random MLC error, or containing a systematic MLC error. Additionally, threshold-based passing criteria were calculated for comparison.
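The gamma images mentioned above combine a dose-difference criterion and a distance-to-agreement criterion at every pixel. The following is a minimal brute-force sketch of a global 2D gamma calculation, not the EPID dosimetry software used in the study. The function name, the 3%/3 mm defaults, the limited search radius, and the choice of which image serves as the reference are all assumptions made for illustration.

```python
import numpy as np

def gamma_image(measured, planned, spacing_mm=1.0, dose_frac=0.03,
                dta_mm=3.0, search_mm=None):
    """Approximate global 2D gamma map (hypothetical helper).

    For each measured pixel, search the planned dose within `search_mm`
    and keep the smallest combined dose/distance discrepancy.
    Assumes planned.max() > 0.
    """
    measured = np.asarray(measured, dtype=float)
    planned = np.asarray(planned, dtype=float)
    if search_mm is None:
        search_mm = 2.0 * dta_mm  # assumption: limit the search to 2x DTA
    dose_tol = dose_frac * planned.max()  # global normalisation
    r = int(np.ceil(search_mm / spacing_mm))
    # Pad with inf so out-of-field neighbours never win the minimum.
    padded = np.pad(planned, r, mode="constant", constant_values=np.inf)
    rows, cols = measured.shape
    gamma_sq = np.full(measured.shape, np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dist_sq = (dy * dy + dx * dx) * spacing_mm ** 2
            shifted = padded[r + dy:r + dy + rows, r + dx:r + dx + cols]
            term = dist_sq / dta_mm ** 2 + ((shifted - measured) / dose_tol) ** 2
            gamma_sq = np.minimum(gamma_sq, term)
    return np.sqrt(gamma_sq)

def passing_rate(gamma):
    """Fraction of pixels with gamma <= 1 (threshold-based criterion)."""
    return float(np.mean(gamma <= 1.0))
```

A threshold-based passing criterion reduces the whole gamma image to the single number returned by `passing_rate`, whereas the radiomic approaches described here use the spatial pattern of the gamma image itself.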
In total, 303 gamma images were used for model training and 255 images were used for model testing. The highest classification accuracy was achieved with the deep learning approach, with a maximum accuracy of 77.3% in the two-class experiment and 64.3% in the three-class experiment. The performance of the handcrafted approach with texture features was lower, with a maximum accuracy of 66.3% in the two-class experiment and 53.7% in the three-class experiment. Variability between the results of the four machine learning classifiers was lower for the deep learning approach vs the texture feature approach. Both radiomic approaches were superior to threshold-based passing criteria.
Deep learning with convolutional neural networks can be used to classify the presence or absence of introduced radiotherapy treatment delivery errors from patient-specific gamma images. The performance of the deep learning network was superior to a handcrafted approach with texture features, and both radiomic approaches were better than threshold-based passing criteria. The results suggest that radiomic QA is a promising direction for clinical radiotherapy.
Nyflot MJ, Thammasorn P, Wootton LS, Ford EC, Chaovalitwongse WA
... -
《-》
-
Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy.
We sought to develop machine learning models to detect multileaf collimator (MLC) modeling errors using radiomic features of fluence maps measured in patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) with an electronic portal imaging device (EPID).
Fluence maps measured with EPID for 38 beams from 19 clinical IMRT plans were assessed. Plans with various degrees of error in MLC modeling parameters [i.e., MLC transmission factor (TF) and dosimetric leaf gap (DLG)] and plans with an MLC positional error for comparison were created. For a total of 152 error plans for each type of error, we calculated fluence difference maps for each beam by subtracting the calculated maps from the measured maps. A total of 837 radiomic features were extracted from each fluence difference map, and we determined the number of features used for the training dataset in the machine learning models by using random forest regression. For each type of error, machine learning models using five typical algorithms [decision tree, k-nearest neighbor (kNN), support vector machine (SVM), logistic regression, and random forest] were developed for binary classification between the error-free plan and the plan with the corresponding error. We used part of the total dataset to perform fourfold cross-validation to tune the models, and we used the remaining test dataset to evaluate the performance of the developed models. A gamma analysis was also performed between the measured and calculated fluence maps with the criteria of 3%/2 mm and 2%/2 mm for all of the types of error.
The selected radiomic features and their optimal number were similar for the TF and DLG error detection models, but differed from those for the MLC positional error model. The highest sensitivity was obtained as 0.913 for the TF error with SVM and logistic regression, 0.978 for the DLG error with kNN and SVM, and 1.000 for the MLC positional error with kNN, SVM, and random forest. The highest specificity was obtained as 1.000 for the TF error with a decision tree, SVM, and logistic regression, 1.000 for the DLG error with a decision tree, logistic regression, and random forest, and 0.909 for the MLC positional error with a decision tree and logistic regression. The gamma analysis showed the poorest performance, with sensitivities of 0.737 for the TF error and the DLG error and 0.882 for the MLC positional error at 3%/2 mm. The addition of another type of error to fluence maps significantly reduced the sensitivity for the TF and the DLG error, whereas no effect was observed for the MLC positional error detection.
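For reference, sensitivity and specificity as reported above follow directly from the confusion counts of a binary classifier. This is a small sketch with made-up labels, not the study's data; the function name and the convention that 1 means "plan contains the error" are assumptions.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = error present, 0 = error-free.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of error plans that were flagged
    specificity = tn / (tn + fp)  # fraction of error-free plans that passed
    return sensitivity, specificity

# Illustrative labels only (not taken from the study).
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```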
Compared to the conventional gamma analysis, the radiomics-based machine learning models showed higher sensitivity and specificity in detecting a single type of the MLC modeling error and the MLC positional error. Although the developed models need further improvement for detecting multiple types of error, radiomics-based IMRT QA was shown to be a promising approach for detecting the MLC modeling error.
Sakai M, Nakano H, Kawahara D, Tanabe S, Takizawa T, Narita A, Yamada T, Sakai H, Ueda M, Sasamoto R, Kaidu M, Aoyama H, Ishikawa H, Utsunomiya S
... -
《-》
-
The structural similarity index for IMRT quality assurance: radiomics-based error classification.
The application of radiomics and machine learning (ML) techniques to the analysis of two-dimensional gamma maps has been demonstrated to be superior to conventional gamma analysis for error identification in intensity-modulated radiotherapy (IMRT) quality assurance (QA). Recently, the Structural SIMilarity (SSIM) sub-index maps were shown to be able to reveal the error types of the dose distributions. In this study, we aimed to apply radiomics analysis to SSIM sub-index maps and develop ML models to classify delivery errors in patient-specific dynamic IMRT QA.
Twenty-one sliding-window IMRT plans comprising 180 beams for three treatment sites were included in this study. Four types of machine-related errors of various magnitudes were simulated for each beam at each control point, including monitor unit (MU) variations, same-directional and opposite-directional shifts of the multileaf collimators (MLCs), and random mispositioning of the MLCs. In the QA process, a total of 1620 portal dose (PD) images were acquired for the beams with and without errors. The predicted PD images of the original beams were set as references. To quantify the agreement between a measured PD image and the corresponding predicted PD image, four difference maps, including three SSIM sub-index maps and one dose difference-derived map, were calculated. Then, radiomic features were extracted from the four difference maps of each measured PD image. We tested four typical classifiers, including a linear discriminant classifier (LDC), two support vector machine (SVM) classifiers, and a random forest (RF), for this multiclass classification task. A nested cross-validation scheme was used for model evaluation, where the SVM recursive feature elimination method was applied for feature selection. Finally, the performance of the ML model in distinguishing error-free from erroneous cases was compared to that of the conventional gamma analysis.
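The standard SSIM index factors into luminance, contrast, and structure terms, and each term can be evaluated locally to give a map. The sketch below computes those three maps with a uniform sliding window in NumPy. It is an illustration under stated assumptions, not the study's implementation: the window size, the uniform (rather than Gaussian) weighting, the constants k1 = 0.01 and k2 = 0.03, and the function name are all choices made here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim_subindex_maps(x, y, window=7, data_range=None, k1=0.01, k2=0.03):
    """Local luminance, contrast and structure maps (hypothetical helper).

    Uses a uniform window; output maps are smaller than the inputs by
    window - 1 pixels per axis. Assumes a non-zero data range.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    wx = sliding_window_view(x, (window, window))
    wy = sliding_window_view(y, (window, window))
    axes = (-2, -1)
    mu_x, mu_y = wx.mean(axis=axes), wy.mean(axis=axes)
    # Clamp tiny negative variances caused by floating-point round-off.
    var_x = np.maximum((wx ** 2).mean(axis=axes) - mu_x ** 2, 0.0)
    var_y = np.maximum((wy ** 2).mean(axis=axes) - mu_y ** 2, 0.0)
    cov = (wx * wy).mean(axis=axes) - mu_x * mu_y
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov + c3) / (sd_x * sd_y + c3)
    return luminance, contrast, structure
```

With these definitions, a pure scaling of one image (for example, an MU-like change) lowers the luminance and contrast maps but leaves the structure map at 1, which gives some intuition for why the sub-index maps can separate error types.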
The statistics of the selected features showed that all of the difference maps and the feature categories made balanced contributions to solving this classification task. The best performance was achieved by the Linear-SVM model, with an average overall classification accuracy of 0.86. Specifically, the average classification accuracies of the shift, opening, and the random errors were around 0.9. Moreover, ~80% of error-free and MU errors were correctly classified. Using gamma analysis, the 3 mm/3% criterion was found to be insensitive to errors (sensitivity was only 0.33). Although the sensitivity to errors with the 2 mm/2% criterion increased to 0.79, it was still 8% worse than that of the ML model.
We proposed an ML-based method for machine-related error identification in patient-specific dynamic IMRT QA, where radiomic analysis of SSIM sub-index maps was used for feature extraction. With extensive validation to select the best features and classifiers, high accuracies in error classification were achieved. Compared with the conventional gamma threshold method, this approach has great potential in error identification for the patient-specific IMRT QA process.
Ma C, Wang R, Zhou S, Wang M, Yue H, Zhang Y, Wu H
... -
《-》
-
Error detection model developed using a multi-task convolutional neural network in patient-specific quality assurance for volumetric-modulated arc therapy.
In patient-specific quality assurance (QA) for static beam intensity-modulated radiation therapy (IMRT), machine-learning-based dose analysis methods have been developed to identify the cause of an error as an alternative to gamma analysis. Although these new methods have revealed that the cause of the error can be identified by analyzing the dose distribution obtained from the two-dimensional detector, they have not been extended to the analysis of volumetric-modulated arc therapy (VMAT) QA. In this study, we propose a deep learning approach to detect various types of errors in patient-specific VMAT QA.
A total of 161 beams from 104 prostate VMAT plans were analyzed. All beams were measured using a cylindrical detector (Delta4; ScandiDos, Uppsala, Sweden), and predicted dose distributions in a cylindrical phantom were calculated using a treatment planning system (TPS). In addition to the error-free plan, we simulated 12 types of errors: two types of multileaf collimator positional errors (systematic or random leaf offset of 2 mm), two types of monitor unit (MU) scaling errors (±3%), two types of gantry rotation errors (±2° in clockwise and counterclockwise direction), and six types of phantom setup errors (±1 mm in lateral, longitudinal, and vertical directions). The error-introduced predicted dose distributions were created by editing the calculated dose distributions using a TPS with in-house software. Those 13 types of dose difference maps, consisting of an error-free map and 12 error maps, were created from the measured and predicted dose distributions and were used to train the convolutional neural network (CNN) model. Our model was a multi-task model that individually detected each of the 12 types of errors. Two datasets, Test sets 1 and 2, were prepared to evaluate the performance of the model. Test set 1 consisted of the 13 types of dose maps used for training, whereas Test set 2 included dose maps with 25 types of errors in addition to the error-free dose map. Each of these 25 error types was generated by combining two of the 12 types of simulated errors. For comparison with the performance of our model, gamma analysis was performed for Test sets 1 and 2 with the criteria set to 3%/2 mm and 2%/1 mm (dose difference/distance to agreement).
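A multi-task detector of this kind is naturally paired with a 12-element multi-hot target, one entry per simulated error type, so that an error-free map is all zeros and a compound error sets two entries. The sketch below shows only that label encoding; the identifier strings and helper names are hypothetical, and nothing here reflects the authors' actual network or code.

```python
# Hypothetical identifiers for the 12 simulated error types listed above.
ERROR_TYPES = (
    "mlc_systematic_2mm", "mlc_random_2mm",
    "mu_plus_3pct", "mu_minus_3pct",
    "gantry_cw_2deg", "gantry_ccw_2deg",
    "setup_lat_plus_1mm", "setup_lat_minus_1mm",
    "setup_long_plus_1mm", "setup_long_minus_1mm",
    "setup_vert_plus_1mm", "setup_vert_minus_1mm",
)

def encode_errors(present):
    """Multi-hot target: one 0/1 entry per error type, all zeros if error-free."""
    present = set(present)
    unknown = present - set(ERROR_TYPES)
    if unknown:
        raise ValueError(f"unknown error types: {sorted(unknown)}")
    return [1 if name in present else 0 for name in ERROR_TYPES]

def decode_predictions(probabilities, threshold=0.5):
    """Names of the error types whose per-task probability meets the threshold."""
    if len(probabilities) != len(ERROR_TYPES):
        raise ValueError("expected one probability per error type")
    return [n for n, p in zip(ERROR_TYPES, probabilities) if p >= threshold]

error_free = encode_errors([])
compound = encode_errors(["mu_plus_3pct", "gantry_cw_2deg"])
```

Because each task is thresholded independently, this encoding can in principle represent compound errors even when only single-error maps were seen in training, although the reported results indicate that performance on such combinations was considerably lower.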
For Test set 1, the overall accuracy of our CNN model, gamma analysis with the criteria set to 3%/2 mm, and gamma analysis with the criteria set to 2%/1 mm was 0.92, 0.19, and 0.81, respectively. Similarly, for Test set 2, the overall accuracy was 0.44, 0.42, and 0.95, respectively. Our model outperformed gamma analysis in the classification of dose maps containing a single type of error, but its performance was inferior in the classification of dose maps containing compound errors.
A multi-task CNN model for detecting errors in patient-specific VMAT QA using a cylindrical measuring device was constructed, and its performance was evaluated. Our results demonstrate that our model was effective in identifying the error type in the dose map for VMAT QA.
Kimura Y, Kadoya N, Oku Y, Kajikawa T, Tomori S, Jingu K
... -
《-》
-
Error Detection in Intensity-Modulated Radiation Therapy Quality Assurance Using Radiomic Analysis of Gamma Distributions.
To improve the detection of errors in intensity-modulated radiation therapy (IMRT) with a novel method that uses quantitative image features from radiomics to analyze gamma distributions generated during patient-specific quality assurance (QA).
One hundred eighty-six IMRT beams from 23 patient treatments were delivered to a phantom and measured with electronic portal imaging device dosimetry. The treatments spanned a range of anatomic sites; half were head and neck treatments, and the other half were drawn from treatments for lung and rectal cancers, sarcoma, and glioblastoma. Planar gamma distributions, or gamma images, were calculated for each beam using the measured dose and calculated doses from the 3-dimensional treatment planning system under various scenarios: a plan without errors and plans with either simulated random or systematic multileaf collimator mispositioning errors. The gamma images were randomly divided into 2 sets: a training set for model development and a testing set for validation. Radiomic features were calculated for each gamma image. Error detection models were developed by training logistic regression models on these radiomic features. The models were applied to the testing set to quantify their predictive utility, determined by calculating the area under the curve (AUC) of the receiver operating characteristic curve, and were compared with traditional threshold-based gamma analysis.
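The AUC used above has a simple interpretation: it is the probability that a randomly chosen error case receives a higher model score than a randomly chosen error-free case, with ties counted as one half. The following sketch computes it by direct pairwise comparison. The function name and the example scores are illustrative assumptions, not values from the study.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1/True for error cases, 0/False for error-free cases.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / diff.size)

# Illustrative scores only: three of the four positive/negative pairs
# are ranked correctly.
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC near 0.5, as reported when a model is applied to the error type it was not trained on, means the scores rank error and error-free cases no better than chance.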
The AUC of the random multileaf collimator mispositioning model on the testing set was 0.761 compared with 0.512 for threshold-based gamma analysis. The AUC for the systematic mispositioning model was 0.717 versus 0.660 for threshold-based gamma analysis. Furthermore, the models could discriminate between the 2 types of errors simulated here, exhibiting AUCs of approximately 0.5 (equivalent to random guessing) when applied to the error they were not designed to detect.
The feasibility of error detection in patient-specific IMRT QA using radiomic analysis of QA images has been demonstrated. This methodology represents a substantial step forward for IMRT QA with improved sensitivity and specificity over current QA methods and the potential to distinguish between different types of errors.
Wootton LS, Nyflot MJ, Chaovalitwongse WA, Ford E
... -
《-》