AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.
摘要:
Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the -long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph which runs in (|| log(||)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ((|| + ||) log(||) + ()) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where = max{|{∣() - () > 0}|, |{∣() - () < 0}|} and = max{∣() - ()}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of nodes is much less than the nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of /|| lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A Θ((|| + ||)) time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134. We also present a Θ((|| + ||) log()) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.
收起
展开
DOI:
10.1007/978-3-642-17458-2_16
被引量:
年份:
2010


通过 文献互助 平台发起求助,成功后即可免费获取论文全文。
求助方法1:
知识发现用户
每天可免费求助50篇
求助方法1:
关注微信公众号
每天可免费求助2篇
求助方法2:
完成求助需要支付5财富值
您目前有 1000 财富值
相似文献(246)
参考文献(5)
引证文献(4)
来源期刊
影响因子:暂无数据
JCR分区: 暂无
中科院分区:暂无