Deep learning models for thyroid nodules diagnosis of fine-needle aspiration biopsy: a retrospective, prospective, multicentre study in China.
Accurately distinguishing between malignant and benign thyroid nodules through fine-needle aspiration cytopathology is crucial for appropriate therapeutic intervention. However, cytopathologic diagnosis is time-consuming and hindered by the shortage of experienced cytopathologists. Reliable assistive tools could improve the efficiency and accuracy of cytopathologic diagnosis. We aimed to develop and test an artificial intelligence (AI)-assistive system for thyroid cytopathologic diagnosis according to the Thyroid Bethesda Reporting System.
11 254 whole-slide images (WSIs) from 4037 patients were used to train deep learning models. The selected WSIs were manually annotated at the cell level by cytopathologists according to the second edition (2017 version) of The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) guidelines. A retrospective dataset of 5638 WSIs of 2914 patients from four medical centres was used for validation. For the prospective study of AI model performance, 469 patients were recruited and their 537 thyroid nodule samples were used. Cohorts for training and validation were enrolled between Jan 1, 2016, and Aug 1, 2022, and the prospective dataset was recruited between Aug 1, 2022, and Jan 1, 2023. The performance of our AI models was estimated as the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. The primary outcomes were the prediction sensitivity and specificity of the model to assist cyto-diagnosis of thyroid nodules.
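The threshold-dependent metrics listed above (sensitivity, specificity, accuracy, positive predictive value, and negative predictive value) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the `ConfusionCounts` container, the `diagnostic_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary benign-vs-malignant call (hypothetical container)."""
    tp: int  # malignant samples called malignant
    fp: int  # benign samples called malignant
    tn: int  # benign samples called benign
    fn: int  # malignant samples called benign


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the five threshold-dependent metrics named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Illustrative counts only; they are NOT results from the study.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = diagnostic_metrics(example)
```

AUROC is not included here because it is computed from the continuous model scores rather than from a single confusion matrix.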
The AUROC of TBSRTC III+ (which distinguishes benign from TBSRTC classes III, IV, V, and VI) was 0·930 (95% CI 0·921-0·939) for Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSMH) internal validation and 0·944 (0·929-0·959), 0·939 (0·924-0·955), and 0·971 (0·938-1·000) for The First People's Hospital of Foshan (FPHF), Sichuan Cancer Hospital & Institute (SCHI), and The Third Affiliated Hospital of Guangzhou Medical University (TAHGMU) medical centres, respectively. The AUROC of TBSRTC V+ (which distinguishes benign from TBSRTC classes V and VI) was 0·990 (95% CI 0·986-0·995) for SYSMH internal validation and 0·988 (0·980-0·995), 0·965 (0·953-0·977), and 0·991 (0·972-1·000) for FPHF, SCHI, and TAHGMU medical centres, respectively. For the prospective study at SYSMH, the AUROC of TBSRTC III+ and TBSRTC V+ was 0·977 and 0·981, respectively. With the assistance of AI, the specificity of junior cytopathologists was boosted from 0·887 (95% CI 0·844-0·922) to 0·993 (0·974-0·999) and the accuracy was improved from 0·877 (0·846-0·904) to 0·948 (0·926-0·965). 186 atypia of undetermined significance samples from 186 patients with BRAF mutation information were collected; 43 of them harboured the BRAFV600E mutation. 91% (39/43) of BRAFV600E-positive atypia of undetermined significance samples were identified as malignant by the AI models.
In this study, we developed an AI-assisted model named the Thyroid Patch-Oriented WSI Ensemble Recognition (ThyroPower) system, which facilitates rapid and robust cyto-diagnosis of thyroid nodules, potentially enhancing the diagnostic capabilities of cytopathologists. Moreover, it serves as a potential solution to mitigate the scarcity of cytopathologists.
Funding: Guangdong Science and Technology Department.
For the Chinese translation of the abstract see Supplementary Materials section.
Wang J, Zheng N, Wan H, Yao Q, Jia S, Zhang X, Fu S, Ruan J, He G, Chen X, Li S, Chen R, Lai B, Wang J, Jiang Q, Ouyang N, Zhang Y, ...
《The Lancet Digital Health》
US of thyroid nodules: can AI-assisted diagnostic system compete with fine needle aspiration?
Artificial intelligence (AI) systems can diagnose thyroid nodules with similar or better performance than radiologists. Little is known about how this performance compares with that achieved through fine needle aspiration (FNA). This study aims to compare the diagnostic yield of an AI diagnostic system with that of FNA cytopathology, both alone and combined with BRAFV600E mutation analysis.
The ultrasound images of 637 thyroid nodules were collected in three hospitals. The diagnostic efficacies of an AI diagnostic system, FNA-based cytopathology, and BRAFV600E mutation analysis were evaluated in terms of sensitivity, specificity, accuracy, and the κ coefficient with respect to the gold standard, defined by postsurgical pathology and consistent benign outcomes from two combined FNA and mutation analysis examinations performed with a half-year interval.
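The κ coefficient mentioned above measures agreement with the gold standard beyond what chance alone would produce. The following is a minimal sketch of the standard Cohen's κ formula, κ = (p_o − p_e) / (1 − p_e), and not the authors' analysis code; the `cohens_kappa` function and the toy labels (1 = malignant, 0 = benign) are hypothetical.

```python
def cohens_kappa(truth: list[int], predicted: list[int]) -> float:
    """Cohen's kappa between gold-standard labels and a method's calls."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    # Observed agreement: fraction of nodules where both labels match.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    labels = set(truth) | set(predicted)
    expected = sum((truth.count(k) / n) * (predicted.count(k) / n) for k in labels)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only; they are NOT data from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(gold, calls)
```

In this toy case the two label sets agree on 6 of 8 nodules (p_o = 0.75) against a chance agreement of p_e = 0.5, giving κ = 0.5.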
The malignancy threshold for the AI system was selected according to the Youden index from a retrospective cohort of 346 nodules and then applied to a prospective cohort of 291 nodules. The combination of FNA cytopathology according to the Bethesda criteria and BRAFV600E mutation analysis showed no significant difference from the AI system in terms of accuracy for either cohort in our multicenter study. In addition, for 45 included indeterminate Bethesda category III and IV nodules, the accuracy, sensitivity, and specificity of the AI system were 84.44%, 95.45%, and 73.91%, respectively.
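Selecting a threshold by the Youden index means choosing the cut-off that maximises J = sensitivity + specificity − 1 on one cohort and then applying it unchanged to another. The following is a minimal sketch of that procedure under stated assumptions, not the authors' implementation: the `youden_threshold` function is hypothetical, scores at or above the threshold are assumed to be called malignant, and the toy scores and labels are not study data.

```python
def youden_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A score >= threshold is treated as a malignant call (an assumption).
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both malignant and benign cases are required")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy "retrospective" scores (1 = malignant); NOT data from the study.
retro_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
retro_labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j_stat = youden_threshold(retro_scores, retro_labels)

# The frozen threshold is then applied to unseen "prospective" scores.
prospective_scores = [0.25, 0.45, 0.95]
prospective_calls = [int(s >= threshold) for s in prospective_scores]
```

Fixing the threshold on the retrospective cohort before scoring the prospective cohort keeps the prospective estimate free of threshold tuning on the same data.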
The AI diagnostic system showed similar diagnostic performance to FNA cytopathology combined with BRAFV600E mutation analysis. Given its advantages in terms of operability, time efficiency, non-invasiveness, and the wide availability of ultrasonography, it provides a new alternative for thyroid nodule diagnosis.
Thyroid ultrasonic artificial intelligence shows statistically equivalent performance for thyroid nodule diagnosis to FNA cytopathology combined with BRAFV600E mutation analysis. It can be widely applied in hospitals and clinics to assist radiologists in thyroid nodule screening and is expected to reduce the need for relatively invasive FNA biopsies.
• In a retrospective cohort of 346 nodules, the evaluated artificial intelligence (AI) system did not significantly differ from fine needle aspiration (FNA) cytopathology alone and combined with gene mutation analysis in accuracy.
• In a prospective multicenter cohort of 291 nodules, the accuracy of the AI diagnostic system was not significantly different from that of FNA cytopathology either alone or combined with gene mutation analysis.
• For 45 indeterminate Bethesda category III and IV nodules, the AI system did not perform significantly differently from BRAFV600E mutation analysis.
Zhou T, Xu L, Shi J, Zhang Y, Lin X, Wang Y, Hu T, Xu R, Xie L, Sun L, Li D, Zhang W, Chen C, Wang W, Xu C, Kong F, Xun Y, Yu L, Zhang S, Ding J, Wu F, Tang T, Zhan S, Zhang J, Wu G, Zheng H, Kong D, Luo D, ...
Digital image-assisted quantitative nuclear analysis improves diagnostic accuracy of thyroid fine-needle aspiration cytology.
Thyroid fine-needle aspiration (FNA) plays a key role in triaging thyroid nodules. Yet many cases are assigned to indeterminate categories. The new category "noninvasive follicular thyroid neoplasm with papillary-like features" (NIFTP) complicates thyroid cytology. Digital image-derived nuclear measurements might objectively distinguish papillary thyroid carcinoma (PTC) from benign nodules and NIFTP.
All thyroid FNAs from 2012 to 2016 of atypia of undetermined significance (A; n = 8) and suspicious for malignancy (S; n = 2) with sufficient cellularity and surgical follow-up, all FNAs preceding NIFTP (n = 6), and a random sample of PTC (n = 9) and benign (n = 10) cytology were studied. A modified Giemsa-stained slide from each case was scanned using the Aperio imaging system, and long-axis (d_l) and short-axis (d_s) diameters were measured for 125 nuclei per case. Nuclear area and elongation were calculated.
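The abstract does not state how nuclear area and elongation were derived from the two measured diameters. One plausible reading, shown here as a hedged sketch only, is to treat each nucleus as an ellipse, so that area = π · d_l · d_s / 4 and elongation = d_l / d_s. Both formulas, the function names, and the example diameters are assumptions and are not confirmed by the source.

```python
import math


def nuclear_area(d_long_um: float, d_short_um: float) -> float:
    """Area in square micrometres, ASSUMING an elliptical nucleus."""
    return math.pi * d_long_um * d_short_um / 4.0


def nuclear_elongation(d_long_um: float, d_short_um: float) -> float:
    """Long-to-short axis ratio (assumed definition; 1.0 means circular)."""
    return d_long_um / d_short_um


def case_mean_area(diameters_um: list[tuple[float, float]]) -> float:
    """Mean area over the nuclei measured for one case (125 per case in the study)."""
    areas = [nuclear_area(d_l, d_s) for d_l, d_s in diameters_um]
    return sum(areas) / len(areas)


# Hypothetical diameters in micrometres; NOT measurements from the study.
area = nuclear_area(10.0, 8.0)
elongation = nuclear_elongation(10.0, 8.0)
mean_area = case_mean_area([(10.0, 8.0), (10.0, 8.0)])
```

Under the elliptical assumption, a nucleus measuring 10 μm by 8 μm has an area of 20π ≈ 62.8 μm² and an elongation of 1.25.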
Nuclear area was larger in PTC (mean, 77.2 μm² [range, 70.6-86.0 μm²]) than benign (mean, 43.3 μm² [range, 38.2-52.2 μm²]) (P < .001). Nuclear areas from indeterminate FNAs segregated according to final histology (A/S PTC mean 72.7 μm², A/S benign mean 53.7 μm²; P = 0.004), and were not significantly different from definitive FNAs of the same diagnosis. NIFTP nuclear area was smaller than PTC (mean, 54.8 μm² [range, 46.7-66.1 μm²]; P < .001). Nuclear elongation showed similar results, but with greater group overlap.
Nuclear area and elongation can be calculated using a commercial digital imager; both correlate with the final surgical pathology diagnosis of PTC versus benign, including NIFTP. Area provides greater resolution than elongation. This technique could be used to resolve indeterminate cytology in which PTC is considered.
Chain K, Legesse T, Heath JE, Staats PN, ...