-
The structural similarity index for IMRT quality assurance: radiomics-based error classification.
Applying radiomics and machine learning (ML) techniques to two-dimensional gamma maps has been shown to be superior to conventional gamma analysis for error identification in intensity-modulated radiotherapy (IMRT) quality assurance (QA). Recently, Structural SIMilarity (SSIM) sub-index maps were shown to reveal the error types in dose distributions. In this study, we aimed to apply radiomics analysis to SSIM sub-index maps and develop ML models to classify delivery errors in patient-specific dynamic IMRT QA.
Twenty-one sliding-window IMRT plans comprising 180 beams for three treatment sites were included in this study. Four types of machine-related errors of various magnitudes were simulated for each beam at each control point: monitor unit (MU) variations, same-directional and opposite-directional shifts of the multileaf collimators (MLCs), and random mispositioning of the MLCs. In the QA process, a total of 1620 portal dose (PD) images were acquired for the beams with and without errors. The predicted PD images of the original beams were set as references. To quantify the agreement between a measured PD image and the corresponding predicted PD image, four difference maps were calculated: three SSIM sub-index maps and one dose difference-derived map. Radiomic features were then extracted from the four difference maps of each measured PD image. We tested four typical classifiers for this multiclass classification task: a linear discriminant classifier (LDC), two support vector machine (SVM) classifiers, and a random forest (RF). A nested cross-validation scheme was used for model evaluation, with the SVM recursive feature elimination method applied for feature selection. Finally, the performance of the ML model in distinguishing error-free from erroneous cases was compared with that of conventional gamma analysis.
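The abstract does not give the formulas for the three SSIM sub-indices, so the following is only a minimal sketch that assumes the standard SSIM decomposition into luminance, contrast and structure terms. The function names, the 3 x 3 window, the constants (k1 = 0.01, k2 = 0.03, c3 = c2 / 2) and the toy images are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt
from statistics import fmean


def ssim_sub_indices(ref, meas, dynamic_range, k1=0.01, k2=0.03):
    """Return (luminance, contrast, structure) for two equally sized patches.

    ref and meas are flat lists of pixel values taken from the same window.
    k1, k2 and c3 = c2 / 2 are the conventional SSIM defaults (assumed here).
    """
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2

    mu_r, mu_m = fmean(ref), fmean(meas)
    var_r = fmean([(x - mu_r) ** 2 for x in ref])
    var_m = fmean([(y - mu_m) ** 2 for y in meas])
    cov = fmean([(x - mu_r) * (y - mu_m) for x, y in zip(ref, meas)])
    sd_r, sd_m = sqrt(var_r), sqrt(var_m)

    luminance = (2 * mu_r * mu_m + c1) / (mu_r ** 2 + mu_m ** 2 + c1)
    contrast = (2 * sd_r * sd_m + c2) / (var_r + var_m + c2)
    structure = (cov + c3) / (sd_r * sd_m + c3)
    return luminance, contrast, structure


def sub_index_maps(ref_img, meas_img, win=3, dynamic_range=1.0):
    """Slide a win x win window over two 2D lists and return three maps."""
    rows, cols = len(ref_img), len(ref_img[0])
    lum_map, con_map, str_map = [], [], []
    for i in range(rows - win + 1):
        lum_row, con_row, str_row = [], [], []
        for j in range(cols - win + 1):
            r = [ref_img[i + a][j + b] for a in range(win) for b in range(win)]
            m = [meas_img[i + a][j + b] for a in range(win) for b in range(win)]
            lum, con, struc = ssim_sub_indices(r, m, dynamic_range)
            lum_row.append(lum)
            con_row.append(con)
            str_row.append(struc)
        lum_map.append(lum_row)
        con_map.append(con_row)
        str_map.append(str_row)
    return lum_map, con_map, str_map


# Toy reference image and a copy scaled by 5% (a stand-in for an MU-like error).
reference = [
    [0.2, 0.4, 0.6, 0.8],
    [0.3, 0.5, 0.7, 0.9],
    [0.2, 0.4, 0.6, 0.8],
    [0.3, 0.5, 0.7, 0.9],
]
scaled = [[1.05 * v for v in row] for row in reference]
lum_map, con_map, str_map = sub_index_maps(reference, scaled)
```

In this toy case a uniform scaling lowers the luminance term while leaving the structure term at essentially 1, which illustrates why separate sub-index maps could carry information about the error type.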
The statistics of the selected features showed that all of the difference maps and feature categories made balanced contributions to this classification task. The best performance was achieved by the Linear-SVM model, with an average overall classification accuracy of 0.86. Specifically, the average classification accuracies for the shift, opening, and random errors were around 0.9. Moreover, ~80% of the error-free and MU error cases were correctly classified. Using gamma analysis, the 3 mm/3% criterion was found to be insensitive to errors (sensitivity was only 0.33). Although the sensitivity to errors increased to 0.79 with the 2 mm/2% criterion, it was still 8% worse than that of the ML model.
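The gamma criteria quoted above pair a distance-to-agreement tolerance with a dose-difference tolerance. As a rough illustration only, the sketch below computes a global gamma index on a 1D dose profile. The function names, the brute-force search, the normalisation to the maximum reference dose and the toy profile are assumptions made for this example, not the analysis used in the study.

```python
from math import sqrt


def gamma_1d(ref, meas, spacing_mm, dta_mm, dose_frac):
    """Global gamma index at each reference point of a 1D dose profile.

    dose_frac is the dose-difference criterion as a fraction (0.03 for 3%),
    applied to the maximum reference dose (global normalisation, assumed).
    """
    dose_tol = dose_frac * max(ref)
    gammas = []
    for i, d_ref in enumerate(ref):
        # Search every measured point for the smallest combined deviation.
        best = min(
            sqrt(((j - i) * spacing_mm / dta_mm) ** 2
                 + ((d_meas - d_ref) / dose_tol) ** 2)
            for j, d_meas in enumerate(meas)
        )
        gammas.append(best)
    return gammas


def pass_rate(gammas):
    """Fraction of points with gamma <= 1."""
    return sum(g <= 1.0 for g in gammas) / len(gammas)


# Flat toy profile with a uniform 2.5% overdose in the measurement.
reference = [100.0] * 10
measured = [102.5] * 10

loose = pass_rate(gamma_1d(reference, measured, 1.0, dta_mm=3.0, dose_frac=0.03))
tight = pass_rate(gamma_1d(reference, measured, 1.0, dta_mm=2.0, dose_frac=0.02))
```

Here `loose` evaluates to 1.0 and `tight` to 0.0: the same 2.5% error passes every point at 3 mm/3% and fails every point at 2 mm/2%, which mirrors the qualitative finding that the looser criterion is less sensitive.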
We proposed an ML-based method for machine-related error identification in patient-specific dynamic IMRT QA, in which radiomic analysis of SSIM sub-index maps was used for feature extraction. With extensive validation to select the best features and classifiers, high accuracies in error classification were achieved. Compared with the conventional gamma threshold method, this approach has great potential for error identification in the patient-specific IMRT QA process.
Ma C, Wang R, Zhou S, Wang M, Yue H, Zhang Y, Wu H
... -
《-》
-
Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy.
We sought to develop machine learning models to detect multileaf collimator (MLC) modeling errors with the use of radiomic features of fluence maps measured in patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) with an electronic portal imaging device (EPID).
Fluence maps measured with EPID for 38 beams from 19 clinical IMRT plans were assessed. Plans with various degrees of error in MLC modeling parameters [i.e., MLC transmission factor (TF) and dosimetric leaf gap (DLG)] and plans with an MLC positional error for comparison were created. For a total of 152 error plans for each type of error, we calculated fluence difference maps for each beam by subtracting the calculated maps from the measured maps. A total of 837 radiomic features were extracted from each fluence difference map, and we determined the number of features used for the training dataset in the machine learning models by using random forest regression. Machine learning models using five typical algorithms [decision tree, k-nearest neighbor (kNN), support vector machine (SVM), logistic regression, and random forest] were developed for binary classification between the error-free plan and the plan with the corresponding error for each type of error. We used part of the total dataset to perform fourfold cross-validation to tune the models, and we used the remaining test dataset to evaluate the performance of the developed models. A gamma analysis was also performed between the measured and calculated fluence maps with the criteria of 3%/2 mm and 2%/2 mm for all of the types of error.
The selected radiomic features and their optimal number were similar for the TF and DLG error detection models but differed from those for the MLC positional error model. The highest sensitivity was 0.913 for the TF error with SVM and logistic regression, 0.978 for the DLG error with kNN and SVM, and 1.000 for the MLC positional error with kNN, SVM, and random forest. The highest specificity was 1.000 for the TF error with a decision tree, SVM, and logistic regression, 1.000 for the DLG error with a decision tree, logistic regression, and random forest, and 0.909 for the MLC positional error with a decision tree and logistic regression. The gamma analysis showed the poorest performance, with sensitivities of 0.737 for the TF error and the DLG error and 0.882 for the MLC positional error at 3%/2 mm. The addition of another type of error to the fluence maps significantly reduced the sensitivity for the TF and the DLG errors, whereas no effect was observed for MLC positional error detection.
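The sensitivity and specificity values above come from binary classification between error-free and erroneous plans. As a reminder of how these two figures are obtained from predictions, here is a minimal sketch; the function names and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels: 1 = plan with an error, 0 = error-free plan.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```

With these toy labels one erroneous plan is missed and one error-free plan is flagged, so both `sens` and `spec` equal 0.75.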
Compared with conventional gamma analysis, the radiomics-based machine learning models showed higher sensitivity and specificity in detecting a single type of MLC modeling error and the MLC positional error. Although the developed models need further improvement for detecting multiple types of error, radiomics-based IMRT QA was shown to be a promising approach for detecting MLC modeling errors.
Sakai M, Nakano H, Kawahara D, Tanabe S, Takizawa T, Narita A, Yamada T, Sakai H, Ueda M, Sasamoto R, Kaidu M, Aoyama H, Ishikawa H, Utsunomiya S
... -
《-》
-
Deep learning for patient-specific quality assurance: Identifying errors in radiotherapy delivery by radiomic analysis of gamma images with convolutional neural networks.
Patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) is a ubiquitous clinical procedure, but conventional methods have often been criticized as being insensitive to errors or less effective than other common physics checks. Recently, there has been interest in the application of radiomics, quantitative extraction of image features, to radiotherapy QA. In this work, we investigate a deep learning approach to classify the presence or absence of introduced radiotherapy treatment delivery errors from patient-specific QA.
Planar dose maps from 186 IMRT beams from 23 IMRT plans were evaluated. Each plan was transferred to a cylindrical phantom CT geometry. Three sets of planar doses were exported from each plan corresponding to (a) the error-free case, (b) a random multileaf collimator (MLC) error case, and (c) a systematic MLC error case. Each plan was delivered to the electronic portal imaging device (EPID), and planned and measured doses were used to calculate gamma images in an EPID dosimetry software package (for a total of 558 gamma images). Two radiomic approaches were used. In the first, a convolutional neural network with triplet learning was used to extract image features from the gamma images. In the second, a handcrafted approach using texture features was used. The resulting metrics from both approaches were input into four machine learning classifiers (support vector machines, multilayer perceptrons, decision trees, and k-nearest-neighbors) in order to determine whether images contained the introduced errors. Two experiments were considered: the two-class experiment classified images as error-free or containing any MLC error, and the three-class experiment classified images as error-free, containing a random MLC error, or containing a systematic MLC error. Additionally, threshold-based passing criteria were calculated for comparison.
In total, 303 gamma images were used for model training and 255 images were used for model testing. The highest classification accuracy was achieved with the deep learning approach, with a maximum accuracy of 77.3% in the two-class experiment and 64.3% in the three-class experiment. The performance of the handcrafted approach with texture features was lower, with a maximum accuracy of 66.3% in the two-class experiment and 53.7% in the three-class experiment. Variability between the results of the four machine learning classifiers was lower for the deep learning approach vs the texture feature approach. Both radiomic approaches were superior to threshold-based passing criteria.
Deep learning with convolutional neural networks can be used to classify the presence or absence of introduced radiotherapy treatment delivery errors from patient-specific gamma images. The performance of the deep learning network was superior to a handcrafted approach with texture features, and both radiomic approaches were better than threshold-based passing criteria. The results suggest that radiomic QA is a promising direction for clinical radiotherapy.
Nyflot MJ, Thammasorn P, Wootton LS, Ford EC, Chaovalitwongse WA
... -
《-》
-
Error detection model developed using a multi-task convolutional neural network in patient-specific quality assurance for volumetric-modulated arc therapy.
In patient-specific quality assurance (QA) for static beam intensity-modulated radiation therapy (IMRT), machine-learning-based dose analysis methods have been developed to identify the cause of an error as an alternative to gamma analysis. Although these new methods have revealed that the cause of the error can be identified by analyzing the dose distribution obtained from the two-dimensional detector, they have not been extended to the analysis of volumetric-modulated arc therapy (VMAT) QA. In this study, we propose a deep learning approach to detect various types of errors in patient-specific VMAT QA.
A total of 161 beams from 104 prostate VMAT plans were analyzed. All beams were measured using a cylindrical detector (Delta4; ScandiDos, Uppsala, Sweden), and predicted dose distributions in a cylindrical phantom were calculated using a treatment planning system (TPS). In addition to the error-free plan, we simulated 12 types of errors: two types of multileaf collimator positional errors (systematic or random leaf offset of 2 mm), two types of monitor unit (MU) scaling errors (±3%), two types of gantry rotation errors (±2° in the clockwise and counterclockwise directions), and six types of phantom setup errors (±1 mm in the lateral, longitudinal, and vertical directions). The error-introduced predicted dose distributions were created by editing the calculated dose distributions using a TPS with in-house software. Thirteen types of dose difference maps, consisting of an error-free map and 12 error maps, were created from the measured and predicted dose distributions and were used to train the convolutional neural network (CNN) model. Our model was a multi-task model that individually detected each of the 12 types of errors. Two datasets, Test sets 1 and 2, were prepared to evaluate the performance of the model. Test set 1 consisted of the 13 types of dose maps used for training, whereas Test set 2 included dose maps with 25 types of errors in addition to the error-free dose map. Each of the 25 types of errors was generated by combining two of the 12 types of simulated errors. For comparison with the performance of our model, gamma analysis was performed for Test sets 1 and 2 with the criteria set to 3%/2 mm and 2%/1 mm (dose difference/distance to agreement).
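One way to picture a model that detects each of the 12 error types individually is as 12 independent yes/no outputs per dose map. The sketch below shows such a target encoding under that assumption; the label names, their ordering and the helper functions are hypothetical, and the abstract does not say which 25 pairs of errors were combined, so none are enumerated here.

```python
# Hypothetical short names for the 12 simulated error types listed above.
ERROR_TYPES = [
    "mlc_systematic", "mlc_random",
    "mu_plus", "mu_minus",
    "gantry_cw", "gantry_ccw",
    "setup_lat_plus", "setup_lat_minus",
    "setup_long_plus", "setup_long_minus",
    "setup_vert_plus", "setup_vert_minus",
]


def encode(errors_present):
    """Turn a collection of error names into a 12-element 0/1 target vector."""
    present = set(errors_present)
    unknown = present - set(ERROR_TYPES)
    if unknown:
        raise ValueError(f"unknown error types: {sorted(unknown)}")
    return [int(name in present) for name in ERROR_TYPES]


def decode(vector):
    """Inverse of encode: list the error names flagged in a 0/1 vector."""
    return [name for name, flag in zip(ERROR_TYPES, vector) if flag]


error_free = encode([])                          # all zeros
single = encode(["mu_plus"])                     # one task is positive
compound = encode(["mlc_random", "gantry_cw"])   # two tasks are positive
```

Under this encoding an error-free map is the all-zero vector, a single error sets one entry, and a compound error of the kind used in Test set 2 sets two entries.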
For Test set 1, the overall accuracy of our CNN model, gamma analysis with the criteria set to 3%/2 mm, and gamma analysis with the criteria set to 2%/1 mm was 0.92, 0.19, and 0.81, respectively. Similarly, for Test set 2, the overall accuracy was 0.44, 0.42, and 0.95, respectively. Our model outperformed gamma analysis in the classification of dose maps containing a single type of error, but its performance was inferior in the classification of dose maps containing compound errors.
A multi-task CNN model for detecting errors in patient-specific VMAT QA using a cylindrical measuring device was constructed, and its performance was evaluated. Our results demonstrate that our model was effective in identifying the error type in the dose map for VMAT QA.
Kimura Y, Kadoya N, Oku Y, Kajikawa T, Tomori S, Jingu K
... -
《-》
-
Error detection and classification in patient-specific IMRT QA with dual neural networks.
Despite being the standard metric in patient-specific quality assurance (QA) for intensity-modulated radiotherapy (IMRT), gamma analysis has two shortcomings: (a) it lacks sensitivity to small but clinically relevant errors, and (b) it does not provide an efficient means to classify the error sources. The purpose of this work is to propose a dual neural network method to achieve simultaneous error detection and classification in patient-specific IMRT QA.
For a pair of dose distributions, we extracted the dose difference histogram (DDH) for the low dose gradient region and two signed distance-to-agreement (sDTA) maps (one in the x-direction and one in the y-direction) for the high dose gradient region. An artificial neural network (ANN) and a convolutional neural network (CNN) were designed to analyze the DDH and the two sDTA maps, respectively. The ANN was trained to detect and classify six classes of dosimetric errors: incorrect multileaf collimator (MLC) transmission (±1%) and four types of monitor unit (MU) scaling errors (±1% and ±2%). The CNN was trained to detect and classify seven classes of spatial errors: incorrect effective source size, 1 mm MLC leaf bank overtravel or undertravel, 2 mm single MLC leaf overtravel or undertravel, and device misalignment errors (1 mm in the x- or y-direction). In-house planar dose calculation software was used to simulate measurements with errors and noise introduced. Both networks were trained and validated with 13 IMRT plans (totaling 88 fields). A fivefold cross-validation technique was used to evaluate their accuracy.
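The abstract does not specify how the DDH is binned or how the low dose gradient region is selected. The following is a minimal sketch of one plausible construction: mask the pixels whose reference gradient magnitude is below a threshold, then histogram the dose differences there. The gradient threshold, bin layout, absolute (rather than relative) dose differences and all function names are assumptions for illustration.

```python
from math import floor, hypot


def gradient_magnitude(dose, spacing_mm=1.0):
    """Gradient magnitude of a 2D list (at least 2 x 2) by finite differences."""
    rows, cols = len(dose), len(dose[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Central differences inside the grid, one-sided at the edges.
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            gy = (dose[i1][j] - dose[i0][j]) / ((i1 - i0) * spacing_mm)
            gx = (dose[i][j1] - dose[i][j0]) / ((j1 - j0) * spacing_mm)
            grad[i][j] = hypot(gx, gy)
    return grad


def dose_difference_histogram(ref, meas, grad_limit, bin_width, n_bins):
    """Histogram of (meas - ref) over pixels where the reference gradient
    magnitude is at most grad_limit. Bins span a range centred on zero;
    differences outside that range are ignored."""
    grad = gradient_magnitude(ref)
    half_range = n_bins * bin_width / 2
    counts = [0] * n_bins
    for i, row in enumerate(ref):
        for j, d_ref in enumerate(row):
            if grad[i][j] > grad_limit:
                continue  # high-gradient pixel: left to the sDTA analysis
            k = floor((meas[i][j] - d_ref + half_range) / bin_width)
            if 0 <= k < n_bins:
                counts[k] += 1
    return counts


# Flat toy dose plane and a measurement that reads 1 unit high everywhere.
reference = [[100.0] * 3 for _ in range(3)]
measured = [[101.0] * 3 for _ in range(3)]
ddh_same = dose_difference_histogram(reference, reference, 1.0, 0.5, 8)
ddh_high = dose_difference_histogram(reference, measured, 1.0, 0.5, 8)
```

In this toy case the histogram peak moves from the bin starting at zero to a bin on the positive side when the measurement reads high, which is the kind of shift a dosimetric error such as MU scaling would be expected to produce.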
Distinct features were found in the DDH and the sDTA maps. The ANN perfectly identified all four types of MU scaling errors, and the specific accuracies for the classes of no error, MLC transmission increase, and MLC transmission decrease were 98.9%, 96.6%, and 94.3%, respectively. For the CNN, the largest confusion occurred between the 1-mm-MLC bank overtravel class and the 1-mm-device alignment error in x-direction class, which brought the specific accuracies down to 90.9% and 92.0%, respectively. The specific accuracy for the 2-mm-single MLC leaf undertravel class was 93.2%, as it misclassified 5.7% of the class as being error free (false negative). Otherwise, the specific accuracy was above 95%. The overall accuracies across the five folds were 98.3% ± 0.7% and 95.6% ± 1.5% for the ANN and the CNN, respectively.
Both the DDH and the sDTA maps are suitable features for error classification in IMRT QA. The proposed dual neural network method achieved simultaneous error detection and classification with excellent accuracy. It could be used as a complement to gamma analysis to potentially shift the IMRT QA paradigm from passive pass/fail analysis to active error detection and root cause identification.
Potter NJ, Mund K, Andreozzi JM, Li JG, Liu C, Yan G
... -
《-》