TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes.
Abstract:
de Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomics settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). In this article, we present TwoPaCo, a simple and scalable low-memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and for eight real primate genomes in under 2 h, on a typical shared-memory machine. We believe that this progress will enable novel biological analyses of hundreds of mammalian-sized genomes. Our code and data are available for download from github.com/medvedevgroup/TwoPaCo. Contact: ium125@psu.edu. Supplementary data are available at Bioinformatics online.
DOI:
10.1093/bioinformatics/btw609
Year:
2017

