Detection of Pneumothorax with Deep Learning Models: Learning From Radiologist Labels vs Natural Language Processing Model Generated Labels.

Source: PubMed

Authors:

Hallinan JTPD, Feng M, Ng D, Sia SY, Tiong VTY, Jagmohan P, Makmur A, Thian YL

Abstract:

This study compared the performance of pneumothorax deep learning detection models trained with radiologist versus natural language processing (NLP) labels on the NIH ChestX-ray14 dataset. The ChestX-ray14 dataset consisted of 112,120 frontal chest radiographs with 5,302 positive and 106,818 negative labels for pneumothorax using NLP (dataset A). All 112,120 radiographs were also inspected by 4 radiologists, leaving a visually confirmed set of 5,138 positive and 104,751 negative for pneumothorax (dataset B). Datasets A and B were used independently to train 3 convolutional neural network (CNN) architectures (ResNet-50, DenseNet-121, and EfficientNetB3). Each model's area under the receiver operating characteristic curve (AUC) was evaluated on the official NIH test set and on an external test set of 525 chest radiographs from our emergency department. On the NIH internal test set, AUCs were significantly higher for CNN models trained with radiologist labels than for those trained with NLP labels across all architectures. AUCs for the NLP/radiologist-label models were 0.838 (95% CI: 0.830, 0.846)/0.881 (95% CI: 0.873, 0.887) for ResNet-50 (p = 0.034), 0.839 (95% CI: 0.831, 0.847)/0.880 (95% CI: 0.873, 0.887) for DenseNet-121, and 0.869 (95% CI: 0.863, 0.876)/0.943 (95% CI: 0.939, 0.946) for EfficientNetB3 (p ≤ 0.001). Evaluation on the external test set also showed higher AUCs (p < 0.001) for the CNN models trained with radiologist versus NLP labels across all architectures. The AUCs for the NLP/radiologist-label models were 0.686 (95% CI: 0.632, 0.740)/0.806 (95% CI: 0.758, 0.854) for ResNet-50, 0.736 (95% CI: 0.686, 0.787)/0.871 (95% CI: 0.830, 0.912) for DenseNet-121, and 0.822 (95% CI: 0.775, 0.868)/0.915 (95% CI: 0.882, 0.948) for EfficientNetB3. We demonstrated improved performance and generalizability of pneumothorax detection deep learning models trained with radiologist labels compared to models trained with NLP labels.

DOI:

10.1016/j.acra.2021.09.013
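The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were computed. As a minimal sketch of one common approach, the following computes an AUC from binary labels and model scores and attaches a percentile bootstrap interval. The bootstrap method, the function names `auc` and `bootstrap_ci`, and the toy data are all assumptions made for illustration; none of them come from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data for illustration only (not from the study).
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_scores = [0.05, 0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85, 0.95]
print("AUC:", auc(toy_labels, toy_scores))
print("95% CI:", bootstrap_ci(toy_labels, toy_scores, n_boot=500))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch but slow on a test set the size of the NIH one; a rank-based implementation would be the usual choice there.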

Cited by: 2

Similar articles:

Comparison of radiologist versus natural language processing-based image annotations for deep learning system for tuberculosis screening on chest radiographs.

Although natural language processing (NLP) can rapidly extract disease labels from radiology reports to create datasets for deep learning models, this may be less accurate than having radiologists manually review the images. In this study, we compared agreement between NLP and radiologist-curated labels for possible tuberculosis (TB) on chest radiographs (CXR) and evaluated the performance of deep convolutional neural networks (DCNN) trained to identify TB using these two sets of labels. We collected 10,951 CXRs from the NIH ChestX-ray14 dataset and labeled them as positive or negative for possible TB based on two methods: 1) NLP-derived disease labels and 2) radiologist review of images. These images were used to train DCNNs for possible TB on varying dataset sizes, and the DCNNs were tested on an external dataset of 800 CXRs. Area under the ROC curve (AUC) was used to evaluate DCNNs. There was poor agreement between NLP and radiologist-curated labels for potential TB (Kappa coefficient 0.34). DCNNs trained using radiologist-curated labels had higher performance than those trained using the NLP labels, regardless of the number of images used for training. The best-performing DCNN, trained on 10,951 images using the radiologist-annotated set, had an AUC of 0.88. DCNNs trained on CXRs labeled by a radiologist consistently outperformed those trained on the same CXRs labeled by NLP, highlighting the benefit of radiologists determining the ground truth for machine learning dataset curation.

Yi PH, Kim TK, Lin CT

Cited by: 2

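The agreement statistic quoted in the abstract above is a Kappa coefficient between NLP and radiologist labels. As a minimal sketch of how Cohen's kappa is computed for two sets of labels on the same images, the following uses a hypothetical helper `cohen_kappa` and made-up toy labels. It assumes the unweighted two-rater form of the statistic; the abstract does not specify which variant was used.

```python
def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two label lists over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each labeler's marginal rate per category.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only (1 = positive, 0 = negative).
nlp_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
radiologist_labels = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
print("kappa:", round(cohen_kappa(nlp_labels, radiologist_labels), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class dominates: two labelers who both call most images negative will agree often even if they disagree on nearly every positive case.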
Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation.

Majkowska A, Mittal S, Steiner DF, Reicher JJ, McKinney SM, Duggan GE, Eswaran K, Cameron Chen PH, Liu Y, Kalidindi SR, Ding A, Corrado GS, Tse D, Shetty S ...

Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: A retrospective study.

Taylor AG, Mielke C, Mongan J

Effect of Training Data Volume on Performance of Convolutional Neural Network Pneumothorax Classifiers.

The large datasets with high-quality labels required to train deep neural networks are challenging to obtain in the radiology domain. This work investigates the effect of training dataset size on the performance of deep learning classifiers, focusing on chest radiograph pneumothorax detection as a proxy visual task in the radiology domain. Two open-source datasets (ChestX-ray14 and CheXpert) comprising 291,454 images were merged, and convolutional neural networks were trained with stepwise increases in training dataset size. Model iterations at each dataset volume were evaluated on an external test set of 525 emergency department chest radiographs. Learning curve analysis was performed to fit the observed AUCs for all models generated. For all three network architectures tested, model AUCs and accuracy increased rapidly from 2 × 10³ to 20 × 10³ training samples, with a more gradual increase until the maximum training dataset size of 291 × 10³ images. AUCs for models trained with the maximum tested dataset size of 291 × 10³ images were significantly higher than those for models trained with 20 × 10³ images: ResNet-50: AUC(20k) = 0.86, AUC(291k) = 0.95, p < 0.001; DenseNet-121: AUC(20k) = 0.85, AUC(291k) = 0.93, p < 0.001; EfficientNet: AUC(20k) = 0.92, AUC(291k) = 0.98, p < 0.001. Our study established learning curves describing the relationship between training dataset size and model performance of deep learning convolutional neural networks applied to a typical radiology binary classification task. These curves suggest a point of diminishing performance returns for increasing training data volumes, which algorithm developers should consider given the high costs of obtaining and labelling radiology data.

Thian YL, Ng DW, Hallinan JTPD, Jagmohan P, Sia SY, Mohamed JSA, Quek ST, Feng M ...

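The abstract above mentions fitting learning curves to observed AUCs but does not give the functional form. As a rough sketch, the following assumes an inverse power law, AUC(n) ≈ a - b · n^(-c), which is a common choice for learning curves and is not confirmed by the source. The helper `fit_power_law` is hypothetical, and the data points are synthetic values generated from a known curve, not the study's results.

```python
import math


def fit_power_law(sizes, aucs, ceilings):
    """Fit auc ~ a - b * n**(-c).

    For each candidate ceiling a, regress log(a - auc) on log(n) by
    ordinary least squares, then keep the candidate whose fitted curve
    has the smallest squared error in AUC units. Returns (a, b, c).
    """
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    best = None
    for a in ceilings:
        if a <= max(aucs):
            continue  # the ceiling must lie above every observed AUC
        ys = [math.log(a - v) for v in aucs]
        mean_y = sum(ys) / len(ys)
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
        intercept = mean_y - slope * mean_x
        b, c = math.exp(intercept), -slope
        sse = sum((a - b * n ** (-c) - v) ** 2 for n, v in zip(sizes, aucs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no candidate ceiling exceeds the observed AUCs")
    return best[1], best[2], best[3]


# Synthetic points generated from a known curve (a=0.96, b=2.0, c=0.4).
sizes = [2_000, 5_000, 20_000, 50_000, 100_000, 291_000]
aucs = [0.96 - 2.0 * n ** (-0.4) for n in sizes]
ceilings = [round(0.90 + 0.005 * i, 3) for i in range(21)]  # 0.900 to 1.000
a, b, c = fit_power_law(sizes, aucs, ceilings)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

Under this assumed form, the fitted ceiling `a` is the AUC the model approaches with unlimited data, and the gap `b * n**(-c)` shows how quickly extra training images stop paying off, which is the diminishing-returns point the abstract describes.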