Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports.

来自 PUBMED

作者:

Yoon HJKlasky HBGounley JPAlawad MGao SDurbin EBWu XCStroup ADoherty JCoyle LPenberthy LBlair Christian JTourassi GD

展开

摘要:

In machine learning, it is evident that the classification of the task performance increases if bootstrap aggregation (bagging) is applied. However, the bagging of deep neural networks takes tremendous amounts of computational resources and training time. The research question that we aimed to answer in this research is whether we could achieve higher task performance scores and accelerate the training by dividing a problem into sub-problems. The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem-thus, generating up to 40,000 models. We performed the training of many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies; and we demonstrated that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, which had more than 500 class labels in the task; these results show that data partition may alleviate the complexity of the task. On the contrary, the methods did not achieve superior scores for the tasks of site and subsite classification. Intrinsically, since data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement. Results in this research demonstrate that 1. The data partitioning and bagging strategy achieved higher performance scores. 2. We achieved faster training leveraged by the high-performance Summit supercomputer at ORNL.

收起

展开

DOI:

10.1016/j.jbi.2020.103564

被引量:

5

年份:

1970

SCI-Hub (全网免费下载) 发表链接

通过 文献互助 平台发起求助,成功后即可免费获取论文全文。

查看求助

求助方法1:

知识发现用户

每天可免费求助50篇

求助

求助方法1:

关注微信公众号

每天可免费求助2篇

求助方法2:

求助需要支付5个财富值

您现在财富值不足

您可以通过 应助全文 获取财富值

求助方法2:

完成求助需要支付5财富值

您目前有 1000 财富值

求助

我们已与文献出版商建立了直接购买合作。

你可以通过身份认证进行实名认证,认证成功后本次下载的费用将由您所在的图书馆支付

您可以直接购买此文献,1~5分钟即可下载全文,部分资源由于网络原因可能需要更长时间,请您耐心等待哦~

身份认证 全文购买

相似文献(109)

参考文献(13)

引证文献(5)

来源期刊

-

影响因子:暂无数据

JCR分区: 暂无

中科院分区:暂无

研究点推荐

关于我们

zlive学术集成海量学术资源,融合人工智能、深度学习、大数据分析等技术,为科研工作者提供全面快捷的学术服务。在这里我们不忘初心,砥砺前行。

友情链接

联系我们

合作与服务

©2024 zlive学术声明使用前必读