Testing the performance, adequacy, and applicability of an artificial intelligence model for pediatric pneumonia diagnosis.

Source: PubMed

Authors:

Domínguez-Rodríguez S, Liz-López H, Panizo-LLedot A, Ballesteros Á, Dagan R, Greenberg D, Gutiérrez L, Rojo P, Otheo E, Galán JC, Villanueva S, García S, Mosquera P, Tagarro A, Moraleda C, Camacho D

Abstract:

Community-acquired pneumonia (CAP) is a common childhood infectious disease. Deep learning models show promise in X-ray interpretation and diagnosis, but the current validation workflow has limitations and should be extended. To extend the standard validation workflow, we propose a pilot test with the following characteristics. First, the assumption of a perfect ground truth (100% sensitive and specific) is unrealistic, as high intra- and inter-observer variability has been reported. To address this, we propose using Bayesian latent class models (BLCA) to estimate accuracy during the pilot. Second, assessing only the performance of a model, without considering its applicability and acceptance by physicians, is insufficient if we hope to integrate AI systems into day-to-day clinical practice. We therefore propose employing explainable artificial intelligence (XAI) methods during the pilot test to involve physicians, to evaluate how well a deep learning model is accepted and how helpful it is for routine decisions, and to analyze its limitations by assessing the etiology.

This study applies the proposed pilot to test a deep convolutional neural network (CNN)-based model for identifying consolidation in pediatric chest X-ray (CXR) images, a model already validated using the standard workflow. For the standard validation workflow, a total of 5856 public CXRs and 950 private CXRs were used to train the CNN model and validate its performance, which was estimated assuming a perfect ground truth. For the pilot test proposed in this article, a total of 190 pediatric CXR images were used to test the CNN model as a support decision tool (SDT). The performance of the model in the pilot test was estimated using extensions of the two-test BLCA. The sensitivity, specificity, and accuracy of the model were also assessed. The clinical characteristics of the patients were compared according to the model performance. The adequacy and applicability of the SDT were tested using XAI techniques. Adequacy was assessed by asking two senior physicians to rate their agreement with the SDT. Applicability was tested by asking three medical residents for their assessments before and after using the SDT, and the agreement between experts was calculated using the kappa index.

The CXRs of the pilot test were labeled by the panel of experts as consolidation (124/176, 70.4%) or no consolidation/other infiltrates (52/176, 29.5%). A total of 31/176 (17.6%) discrepancies were found between the model and the panel of experts, with a kappa index of 0.6. The sensitivity and specificity reached medians of 90.9% (95% credible interval [CrI], 81.2-99.9) and 77.7% (95% CrI, 63.3-98.1), respectively. The senior physicians reported a high agreement rate (70%) with the system in identifying logical consolidation patterns. The three medical residents reached higher agreement with the experts when using the SDT than when working alone (0.75±0.2 vs. 0.66±0.1).

Through the pilot test, we verified that the performance of the deep learning model was underestimated when a perfect ground truth was assumed. Furthermore, the adequacy and applicability tests indicate that the model identifies logical patterns within the CXRs and that augmenting clinicians with automated preliminary read assistants could accelerate their workflows and enhance accuracy in identifying consolidation in pediatric CXR images.
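Two quantities in the abstract lend themselves to a small numerical illustration: the kappa index used to measure agreement, and the way an imperfect reference standard can make a model look worse than it is. The Python sketch below is not the authors' Bayesian latent class analysis. It assumes the model and the reference err independently given the true disease status, and the function names and all input numbers are hypothetical, chosen only to show the mechanics.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] counts cases that rater A put in class i and rater B in class j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the two raters' marginal frequencies.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def apparent_accuracy(se_model, sp_model, se_ref, sp_ref, prevalence):
    """Sensitivity and specificity of a model as they would appear when
    scored against an imperfect reference.

    Assumption (not from the paper): model and reference errors are
    conditionally independent given the true disease status.
    """
    pi = prevalence
    # Probability that the reference calls a case positive / negative.
    ref_pos = pi * se_ref + (1 - pi) * (1 - sp_ref)
    ref_neg = 1 - ref_pos
    # Probability that model and reference are both positive / both negative.
    both_pos = pi * se_model * se_ref + (1 - pi) * (1 - sp_model) * (1 - sp_ref)
    both_neg = pi * (1 - se_model) * (1 - se_ref) + (1 - pi) * sp_model * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


if __name__ == "__main__":
    # Hypothetical 2x2 agreement table, not data from the study.
    print("kappa:", round(cohen_kappa([[20, 5], [10, 15]]), 3))

    # Hypothetical true accuracies and a hypothetical imperfect reference.
    se, sp = apparent_accuracy(0.9, 0.8, se_ref=0.9, sp_ref=0.9, prevalence=0.7)
    print("apparent sensitivity:", round(se, 3))
    print("apparent specificity:", round(sp, 3))
```

In this hypothetical setting both apparent values fall below the model's true sensitivity and specificity, which is the kind of underestimation the pilot test is designed to detect. With a perfect reference (`se_ref = sp_ref = 1.0`) the function returns the true values unchanged.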

DOI: 10.1016/j.cmpb.2023.107765

Similar literature (242)

Validating the accuracy of deep learning for the diagnosis of pneumonia on chest x-ray against a robust multimodal reference diagnosis: a post hoc analysis of two prospective studies.

Artificial intelligence (AI) seems promising in diagnosing pneumonia on chest x-rays (CXR), but deep learning (DL) algorithms have primarily been compared with radiologists, whose diagnoses may not be completely accurate. Therefore, we evaluated the accuracy of DL in diagnosing pneumonia on CXR using a more robust reference diagnosis. We trained a DL convolutional neural network model to diagnose pneumonia and evaluated its accuracy in two prospective pneumonia cohorts including 430 patients, for whom the reference diagnosis was determined a posteriori by a multidisciplinary expert panel using multimodal data. The performance of the DL model was compared with that of senior radiologists and emergency physicians reviewing CXRs and that of radiologists reviewing computed tomography (CT) performed concomitantly. Radiologists and DL showed a similar accuracy on CXR for both cohorts (p ≥ 0.269): cohort 1, radiologist 1 75.5% (95% confidence interval 69.1-80.9), radiologist 2 71.0% (64.4-76.8), DL 71.0% (64.4-76.8); cohort 2, radiologist 70.9% (64.7-76.4), DL 72.6% (66.5-78.0). The accuracy of radiologists and DL was significantly higher (p ≤ 0.022) than that of emergency physicians (cohort 1 64.0% [57.1-70.3], cohort 2 63.0% [55.6-69.0]). Accuracy was significantly higher for CT (cohort 1 79.0% [72.8-84.1], cohort 2 89.6% [84.9-92.9]) than for CXR readers including radiologists, clinicians, and DL (all p-values < 0.001). When compared with a robust reference diagnosis, the performance of AI models in identifying pneumonia on CXRs was inferior to that previously reported but similar to that of radiologists and better than that of emergency physicians. The clinical relevance of AI models for pneumonia diagnosis may have been overestimated. AI models should be benchmarked against a robust multimodal reference diagnosis to avoid overestimating their performance. Trial registration: NCT02467192 and NCT01574066.

• We evaluated an open-access convolutional neural network (CNN) model to diagnose pneumonia on CXRs.
• The CNN was validated against a strong multimodal reference diagnosis.
• In our study, the CNN performance (area under the receiver operating characteristic curve 0.74) was lower than that previously reported when validated against radiologists' diagnosis (0.99 in a recent meta-analysis).
• The CNN performance was significantly higher than that of emergency physicians (p ≤ 0.022) and comparable to that of board-certified radiologists (p ≥ 0.269).

Hofmeister J, Garin N, Montet X, Scheffler M, Platon A, Poletti PA, Stirnemann J, Debray MP, Claessens YE, Duval X, Prendki V ...

Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents.

Wu JT, Wong KCL, Gur Y, Ansari N, Karargyris A, Sharma A, Morris M, Saboury B, Ahmad H, Boyko O, Syed A, Jadhav A, Wang H, Pillai A, Kashyap S, Moradi M, Syeda-Mahmood T ... - JAMA Network Open

Cited by: 45

Multi-View Ensemble Convolutional Neural Network to Improve Classification of Pneumonia in Low Contrast Chest X-Ray Images.

Ferreira JR, Armando Cardona Cardenas D, Moreno RA, de Fatima de Sa Rebelo M, Krieger JE, Antonio Gutierrez M ...

Published: 2020

Deep Learning Models to Predict Fatal Pneumonia Using Chest X-Ray Images.

Chest X-ray (CXR) is indispensable to the assessment of severity, diagnosis, and management of pneumonia. Deep learning is an artificial intelligence (AI) technology that has been applied to the interpretation of medical images. This study investigated the feasibility of classifying fatal pneumonia based on CXR images using deep learning models on publicly available platforms. CXR images of patients with pneumonia at diagnosis were labeled as fatal or nonfatal based on medical records. We used CXR images from 1031 patients with nonfatal pneumonia and 243 patients with fatal pneumonia for training and self-evaluation of the deep learning models. All labeled CXR images were randomly allocated to the training, validation, and test datasets of the deep learning models. Data augmentation techniques were not used in this study. We created two deep learning models using two publicly available platforms. The first model showed an area under the precision-recall curve of 0.929, with a sensitivity of 50.0% and a specificity of 92.4% for classifying fatal pneumonia. We evaluated the performance of our deep learning models using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Using the external validation test dataset of 100 CXR images, the sensitivity, specificity, accuracy, and F1 score were 68.0%, 86.0%, 77.0%, and 74.7%, respectively. In the original dataset, the second model showed a sensitivity, specificity, and accuracy of 39.6%, 92.8%, and 82.7%, respectively, while external validation showed values of 38.0%, 92.0%, and 65.0%, respectively. The F1 score was 52.1%. These results were comparable to those obtained by respiratory physicians and residents. The deep learning models yielded good accuracy in classifying fatal pneumonia. With further improvements in performance, AI could assist physicians in the severity assessment of patients with pneumonia.

Anai S, Hisasue J, Takaki Y, Hara N ...
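
The metrics named in the abstract above (sensitivity, specificity, PPV, NPV, accuracy, and F1 score) all derive from the four cells of a 2x2 confusion matrix. A minimal Python sketch follows. The function name `binary_metrics` and the counts are hypothetical: the counts were chosen to be consistent with the external-validation percentages reported for the first model under an assumed even split of 50 fatal and 50 nonfatal cases, which the abstract does not state.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),       # recall on positive cases
        "specificity": tn / (tn + fp),       # recall on negative cases
        "ppv": tp / (tp + fp),               # positive predictive value (precision)
        "npv": tn / (tn + fn),               # negative predictive value
        "accuracy": (tp + tn) / total,
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of PPV and sensitivity
    }


if __name__ == "__main__":
    # Hypothetical counts assuming 50 fatal and 50 nonfatal cases (an assumption).
    for name, value in binary_metrics(tp=34, fp=7, fn=16, tn=43).items():
        print(f"{name}: {value:.3f}")
```

A different class split would require different counts to reproduce the same percentages, so the example shows only how the metrics relate to one another, not the study's actual confusion matrix.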
