Semi-supervised training of deep convolutional neural networks with heterogeneous data and few local annotations: An experiment on prostate histopathology image classification.

Source: PubMed

Authors:

Marini N, Otálora S, Müller H, Atzori M

Abstract:

Convolutional neural networks (CNNs) are state-of-the-art computer vision techniques for various tasks, particularly for image classification. However, there are domains, such as histopathology image analysis, where training classification models that generalize across several datasets is still an open challenge because the data are highly heterogeneous and large datasets with local annotations of the regions of interest are lacking. Histopathology concerns the microscopic analysis of tissue specimens processed on glass slides to identify diseases such as cancer. Digital pathology concerns the acquisition, management and automatic analysis of digitized histopathology images, which are large, on the order of 100,000² pixels per image. Digital histopathology images are highly heterogeneous due to the variability of the image acquisition procedures. Creating locally labeled regions (required for the training) is time-consuming and often expensive in the medical field, as physicians usually have to annotate the data. Despite the advances in deep learning, leveraging strongly and weakly annotated datasets to train classification models is still an unsolved problem, mainly when data are very heterogeneous. Large amounts of data are needed to create models that generalize well. This paper presents a novel approach to train CNNs that generalize to heterogeneous datasets originating from various sources and without local annotations. The data analysis pipeline targets Gleason grading on prostate images and includes two models in sequence, following a teacher/student training paradigm. The teacher model (a high-capacity neural network) automatically annotates a set of pseudo-labeled patches used to train the student model (a smaller network). The two models are trained with two different teacher/student approaches: semi-supervised learning and semi-weakly supervised learning. For each of the two approaches, three student training variants are presented. The baseline is provided by training the student model only with the strongly annotated data. Classification performance is evaluated on the student model at the patch level (using the local annotations of the Tissue Micro-Arrays Zurich dataset) and at the global level (using the whole slide image Gleason score of TCGA-PRAD, The Cancer Genome Atlas-PRostate ADenocarcinoma). The teacher/student paradigm allows the models to better generalize on both datasets, despite the inter-dataset heterogeneity and the small number of local annotations used. The classification performance improves at the patch level (up to κ=0.6127±0.0133 from κ=0.5667±0.0285), at the TMA core level (Gleason score) (up to κ=0.7645±0.0231 from κ=0.7186±0.0306) and at the WSI level (Gleason score) (up to κ=0.4529±0.0512 from κ=0.2293±0.1350). The results show that with the teacher/student paradigm, it is possible to train models that generalize on datasets from entirely different sources, despite the inter-dataset heterogeneity and the lack of large datasets with local annotations.
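
The pseudo-labeling step described in the abstract (a teacher model annotating unlabeled patches that are then used to train the student) can be illustrated with a minimal sketch. The abstract does not state how pseudo-labeled patches are selected, so the rule below (keep the `top_k_per_class` most confident teacher predictions for each class) is an assumption, and the names `select_pseudo_labels`, `teacher_probs` and `top_k_per_class` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (patch index, pseudo-label, teacher confidence)
PseudoLabel = Tuple[int, int, float]


def select_pseudo_labels(
    teacher_probs: Sequence[Sequence[float]],
    top_k_per_class: int,
) -> List[PseudoLabel]:
    """Pick the most confident teacher predictions for each class.

    teacher_probs[i] holds the teacher's class probabilities for patch i.
    ASSUMPTION: top-k-per-class selection is an illustrative rule only;
    the abstract does not specify the selection criterion.
    """
    by_class: Dict[int, List[PseudoLabel]] = {}
    for idx, probs in enumerate(teacher_probs):
        # The pseudo-label is the class with the highest teacher probability.
        label = max(range(len(probs)), key=lambda c: probs[c])
        by_class.setdefault(label, []).append((idx, label, probs[label]))

    selected: List[PseudoLabel] = []
    for label in sorted(by_class):
        # Rank the patches of this class by confidence, highest first.
        ranked = sorted(by_class[label], key=lambda item: item[2], reverse=True)
        selected.extend(ranked[:top_k_per_class])
    return selected
```

In a pipeline of this kind, the selected patches and their pseudo-labels would be used to train the student, which could then be fine-tuned on the strongly annotated patches; the exact ordering differs between training variants and is not fixed by this sketch.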


DOI:

10.1016/j.media.2021.102165

Year:

2021

Similar articles (2766)

Semi-supervised training of deep convolutional neural networks with heterogeneous data and few local annotations: An experiment on prostate histopathology image classification.

Marini N, Otálora S, Müller H, Atzori M ...

Cited by: 10 · Published: 2021

Combining weakly and strongly supervised learning improves strong supervision in Gleason pattern classification.

Otálora S, Marini N, Müller H, Atzori M ... - BMC Medical Imaging

Cited by: 6

Automatic diagnosis and grading of Prostate Cancer with weakly supervised learning on whole slide images.

The workflow of prostate cancer diagnosis and grading is cumbersome and the results suffer from substantial inter-observer variability. Recent trials have shown potential in using machine learning to develop automated systems to address this challenge. Most automated deep learning systems for prostate cancer Gleason grading have focused on supervised learning, which requires demanding fine-grained pixel-level annotations. A weakly supervised deep learning model with slide-level labels is presented in this study for the diagnosis and grading of prostate cancer from whole slide images (WSIs). WSIs are first cropped into small patches and then processed with a deep learning model to extract patch-level features. A graph convolution network (GCN) is used to aggregate the features for classification. Throughout the training process, the noisy labels are progressively filtered out to reduce inter-observer variations in clinical reports. Finally, multi-center independent test cohorts with 6,174 slides are collected to evaluate the prostate cancer diagnosis and grading performance of our model. The cancer diagnosis (2-level classification) results on two external test sets (n = 4,675 and n = 844) show an area under the receiver operating characteristic curve (AUC) of 0.985 and 0.986. The Gleason grading (6-level classification) results reach 0.931 quadratic weighted kappa on the internal test set (n = 531). It generalizes well on the external test dataset (n = 844) with 0.801 quadratic weighted kappa with the reference standard set independently. The model enables pathologically meaningful interpretability by visualizing the most attended lesions, which are highly consistent with expert annotations. The proposed model incorporates a graph network in weakly supervised learning with only slide-level reports. A robust learning strategy is also employed to correct the label noise. It is highly accurate (>0.985 AUC for diagnosis) and also interpretable with intuitive heatmap visualization. It can be unified with a digital pathology pipeline to deliver prostate cancer metrics for a pathology report.
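
The quadratic weighted kappa reported above measures agreement between predicted and reference grades while penalizing disagreements by the squared distance between the two grades. As a rough illustration, here is a minimal pure-Python sketch of the standard formula; the function name, the integer encoding of grades as `0 .. n_classes - 1`, and the choice to return 1.0 in the degenerate single-class case are assumptions of this sketch, not details taken from the abstract.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    # Observed confusion matrix: rows are reference labels, columns predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Expected count if reference and prediction were independent.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # ASSUMPTION: every label falls in one class on both sides; this
        # sketch treats that as perfect agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.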

Xiang J, Wang X, Wang X, Zhang J, Yang S, Yang W, Han X, Liu Y ...

Cited by: 7

Self-Learning for Weakly Supervised Gleason Grading of Local Patterns.

Silva-Rodriguez J, Colomer A, Dolz J, Naranjo V ...

Cited by: 4

Self-supervised driven consistency training for annotation efficient histopathology image analysis.

Training a neural network with a large labeled dataset is still a dominant paradigm in computational histopathology. However, obtaining such exhaustive manual annotations is often expensive, laborious, and prone to inter- and intra-observer variability. While recent self-supervised and semi-supervised methods can alleviate this need by learning unsupervised feature representations, they still struggle to generalize well to downstream tasks when the number of labeled instances is small. In this work, we overcome this challenge by leveraging both task-agnostic and task-specific unlabeled data based on two novel strategies: (i) a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning; (ii) a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data. We carry out extensive validation experiments on three histopathology benchmark datasets across two classification tasks and one regression task, i.e., tumor metastasis detection, tissue type classification, and tumor cellularity quantification. With limited labeled data, the proposed method yields tangible improvements, performing close to or even better than other state-of-the-art self-supervised and supervised baselines. Furthermore, we empirically show that the idea of bootstrapping the self-supervised pretrained features is an effective way to improve the task-specific semi-supervised learning on standard benchmarks. Code and pretrained models are made available at: https://github.com/srinidhiPY/SSL_CR_Histo.

Srinidhi CL, Kim SW, Chen FD, Martel AL ...

Cited by: 21
