Artificial intelligence-based automated determination in breast and colon cancer and distinction between atypical and typical mitosis using a cloud-based platform.
Artificial intelligence (AI) technology in pathology has been applied in many areas and typically relies on supervised machine learning. Notably, the annotations that define the ground truth for identifying easily confused pathologic processes vary from study to study. In this study, we present our findings on the detection of invasive breast cancer for an IHC/ISH assessment system, along with the automated analysis of each tissue layer, cancer type, and related features in colorectal specimens. Additionally, models for the detection of atypical and typical mitosis in several organs were developed using existing whole-slide image (WSI) sets from other AI projects. All H&E slides were scanned on different scanners at a resolution of 0.12-0.50 μm/pixel and then uploaded to a cloud-based AI platform. Convolutional neural network (CNN) training sets consisted of invasive carcinoma, atypical and typical mitosis, and colonic tissue elements (mucosa-epithelium, lamina propria, muscularis mucosa, submucosa, muscularis propria, subserosa, vessels, and lymph nodes). In total, 59 WSIs from 59 breast cases, 217 WSIs from 54 colon cases, and 28 WSIs from 23 tumor cases of different types with relatively high mitotic counts were annotated for training. Model performance was scored as F1, the harmonic mean of precision and sensitivity. The final AI models of the breast project showed an F1 score of 94.49% for invasive carcinoma. The mitosis project showed F1 scores of 80.18%, 97.40%, and 97.68% for the mitosis, atypical mitosis, and typical mitosis layers, respectively. Overall F1 scores for the current results of the colon project were 90.02% for invasive carcinoma, 94.81% for the submucosa layer, and 98.02% for vessels and lymph nodes. After training, optimization, and validation of each AI model, external validators evaluated the model results in blind-reader tasks. The AI models developed in this study were able to identify tumor foci, distinguish in situ areas, delineate colonic layers, detect vessels and lymph nodes, and differentiate atypical from typical mitosis. All results were exported for integration into our in-house applications for breast cancer and for AI model development for both whole-block and whole-slide image-based 3D imaging assessment.
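For reference, the F1 metric used throughout these results is the harmonic mean of precision and sensitivity (recall). A minimal sketch of how a per-layer score could be computed from detection counts is shown below; the function and the example counts are illustrative and not taken from the study.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = harmonic mean of precision and sensitivity (recall)."""
    precision = true_positives / (true_positives + false_positives)
    sensitivity = true_positives / (true_positives + false_negatives)
    return 2 * precision * sensitivity / (precision + sensitivity)

# Hypothetical example: 945 of 1000 annotated foci detected with 55 false positives
# gives an F1 of 94.50%, comparable in scale to the scores reported above.
print(f"{f1_score(945, 55, 55):.2%}")
```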
Bakoglu N, Cesmecioglu E, Sakamoto H, Yoshida M, Ohnishi T, Lee SY, Smith L, Yagi Y, ...
《-》
Utilization of an artificial intelligence-enhanced, web-based application to review bile duct brushing cytologic specimens: A pilot study.
The authors previously developed an artificial intelligence (AI) tool to assist cytologists in the evaluation of digital whole-slide images (WSIs) generated from bile duct brushing specimens. The aim of this trial was to assess the efficiency and accuracy of cytologists using a novel application incorporating this AI tool.
Consecutive bile duct brushing WSIs from indeterminate strictures were obtained. A multidisciplinary panel reviewed all relevant information and provided a central interpretation for each WSI as "positive," "negative," or "indeterminate." The WSIs were then uploaded to the AI application. The AI scored each WSI as positive or negative for malignancy (i.e., computer-aided diagnosis [CADx]). For each WSI, the AI prioritized cytologic tiles by the likelihood that malignant material was present in the tile. Using the AI application, blinded cytologists reviewed all WSIs and provided interpretations (i.e., computer-aided detection [CADe]). The diagnostic accuracies of WSI evaluation via CADx, CADe, and the original clinical cytologic interpretation (official cytologic interpretation [OCI]) were compared.
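The slide-level scoring (CADx) and tile prioritization (the CADe review queue) described above can be pictured with a minimal sketch; the function, threshold, and tile names below are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

def prioritize_tiles(tile_scores: List[Tuple[str, float]], threshold: float = 0.5):
    """Rank tiles by model-predicted likelihood of malignant material (CADe queue)
    and derive a slide-level positive/negative call (CADx) from the top-scoring tile."""
    ranked = sorted(tile_scores, key=lambda item: item[1], reverse=True)
    slide_positive = bool(ranked) and ranked[0][1] >= threshold
    return ranked, slide_positive

# Hypothetical tiles with model scores: the cytologist would review tile_002 first.
queue, cadx_positive = prioritize_tiles([("tile_001", 0.12), ("tile_002", 0.91), ("tile_003", 0.47)])
print(queue[0], cadx_positive)
```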
Of the 84 WSIs, 15 were positive, 42 were negative, and 27 were indeterminate after central review. Each WSI generated an average of 141,950 tiles. Cytologists using the AI evaluated an average of 10.5 tiles per WSI before making an interpretation and required an average of 84.1 seconds of total evaluation time per WSI. WSI interpretation accuracies for CADx (0.754; 95% CI, 0.622-0.859), CADe (0.807; 95% CI, 0.750-0.856), and OCI (0.807; 95% CI, 0.671-0.900) were similar.
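As an aside on how point estimates like these can be reported with 95% confidence intervals, a Wilson score interval for a proportion of correct interpretations might be computed as below; this is a generic illustration, and the study's intervals may have been derived by a different method.

```python
from math import sqrt

def wilson_ci(correct: int, total: int, z: float = 1.96):
    """Accuracy point estimate with an approximate 95% Wilson score interval."""
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, center - half, center + half

# Hypothetical check: 68 of 84 slides interpreted correctly -> roughly (0.81, 0.71, 0.88).
print(wilson_ci(68, 84))
```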
This trial demonstrates that an AI application allows cytologists to perform a triaged review of WSIs while maintaining accuracy.
Marya NB, Powers PD, Bois MC, Hartley C, Kerr SE, Thangaiah JJ, Norton D, Abu Dayyeh BK, Cantley R, Chandrasekhara V, Gores G, Gleeson FC, Law RJ, Maleki Z, Martin JA, Pantanowitz L, Petersen B, Storm AC, Levy MJ, Graham RP, ...
《-》
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Survival estimation for patients with symptomatic skeletal metastases ideally should be made before the type of local treatment has been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided.
(1) Which of these two survival prediction models performed better in a mixed cohort made up of both patients who received local treatment with surgery followed by radiotherapy and patients who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of palliative radiotherapy alone? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?
Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient's survivorship status was ascertained either from their medical records or from the national death registry of the Taiwanese National Health Insurance Administration. After applying a priori designated exclusion criteria, 91% (1920) of patients were analyzed here. Among them, 48% (920) were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of IV or less who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, and any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG-MLA with METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone. We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals underestimation (the observed survival is greater than the predicted survival), and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction).
The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the "null model" assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score lower than the null-model Brier score to be considered useful. The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model's predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting the model is superior to adopting either default strategy. Finally, we examined whether laboratory data, which were not included in the METSSS model, were independently associated with survival after controlling for the METSSS model's predictions, using multivariable logistic and Cox proportional hazards regression analyses.
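To make the metrics above concrete, a minimal sketch of the Brier score, the log(O:E) calibration summary, and the decision-curve net benefit is given below, assuming binary observed outcomes and predicted probabilities stored as NumPy arrays; the function names and conventions are illustrative rather than those used in the study.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and observed outcomes (0 = best)."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    return float(np.mean((y_prob - y_true) ** 2))

def log_oe(survived, predicted_survival_prob):
    """log(O:E) of observed to expected survivors; > 0 means survival was underestimated."""
    observed = np.sum(np.asarray(survived, float))
    expected = np.sum(np.asarray(predicted_survival_prob, float))
    return float(np.log(observed / expected))

def net_benefit(died, predicted_risk, threshold):
    """Decision-curve net benefit of treating patients whose predicted risk exceeds the threshold."""
    died = np.asarray(died, bool)
    treat = np.asarray(predicted_risk, float) >= threshold
    n = treat.size
    true_pos = np.sum(treat & died) / n
    false_pos = np.sum(treat & ~died) / n
    return float(true_pos - false_pos * threshold / (1 - threshold))
```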
Between the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient's individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS.
Based on these findings, clinicians might choose to consult SORG-MLA rather than METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient's individualized risk threshold for opting to undergo a particular treatment strategy is between 0.2 and 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model, because these data demonstrated prognostic value independent of the predictions of the METSSS model; future work might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments.
Level III, diagnostic study.
Lee CC, Chen CW, Yen HK, Lin YP, Lai CY, Wang JL, Groot OQ, Janssen SJ, Schwab JH, Hsu FM, Lin WH, ...
《-》
On the objectivity, reliability, and validity of deep learning enabled bioimage analyses.
Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow time-consuming manual image analysis processes to be automated on the basis of annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or may yield biased models, and such models may in turn be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.
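The exact estimation procedure used in the paper is not reproduced here; as a rough illustration of the two ideas, pixel-wise majority voting over several annotators' binary masks and averaging of an ensemble's probability maps could look like the following sketch (NumPy-based, with illustrative names and thresholds).

```python
import numpy as np

def estimate_ground_truth(annotations: np.ndarray, min_agreement: float = 0.5) -> np.ndarray:
    """Pixel-wise majority vote over binary masks shaped (annotators, H, W)."""
    return (annotations.mean(axis=0) >= min_agreement).astype(np.uint8)

def ensemble_prediction(probability_maps: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Average probability maps shaped (models, H, W) from independently trained models, then threshold."""
    return (probability_maps.mean(axis=0) >= threshold).astype(np.uint8)

# Hypothetical shapes: 5 annotators and 4 models labeling a 256x256 image.
rng = np.random.default_rng(0)
consensus = estimate_ground_truth(rng.integers(0, 2, size=(5, 256, 256)).astype(float))
segmentation = ensemble_prediction(rng.random(size=(4, 256, 256)))
print(consensus.shape, segmentation.shape)
```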
Segebarth D, Griebel M, Stein N, von Collenberg CR, Martin C, Fiedler D, Comeras LB, Sah A, Schoeffler V, Lüffe T, Dürr A, Gupta R, Sasi M, Lillesaar C, Lange MD, Tasan RO, Singewald N, Pape HC, Flath CM, Blum R, ...
《eLife》