Evaluation of nine popular de novo assemblers in microbial genome assembly.-Z研学术

Evaluation of nine popular de novo assemblers in microbial genome assembly.

来自 PUBMED

作者：

Forouzan E ， Maleki MSM ， Karkhane AA ， Yakhchali B

展开 

摘要：

Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate.

收起

展开 

DOI：

10.1016/j.mimet.2017.09.008

被引量：

年份：

1970

全部来源

SCI-Hub (全网免费下载)

发表链接

ResearchGate (全网免费下载)

钛学术 (全网免费下载)

通过文献互助平台发起求助，成功后即可免费获取论文全文。

查看求助

求助方法1：

知识发现用户

每天可免费求助50篇

求助

求助方法1：

关注微信公众号

每天可免费求助2篇

求助方法2：

求助需要支付5个财富值

您现在财富值不足

您可以通过应助全文获取财富值

求助方法2：

完成求助需要支付5财富值

您目前有 1000 财富值

求助

我们已与文献出版商建立了直接购买合作。

你可以通过身份认证进行实名认证，认证成功后本次下载的费用将由您所在的图书馆支付

您可以直接购买此文献，1~5分钟即可下载全文，部分资源由于网络原因可能需要更长时间，请您耐心等待哦~

身份认证全文购买

相似文献(212)

参考文献(0)

引证文献(4)

Evaluation of nine popular de novo assemblers in microbial genome assembly.

Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate.

Forouzan E ，Maleki MSM ，Karkhane AA ，Yakhchali B ... - 《-》

被引量: 4 发表:1970年
Practical evaluation of 11 de novo assemblers in metagenome assembly.

Next Generation Sequencing (NGS) technologies are revolutionizing the field of biology and metagenomic-based research. Since the volume of metagenomic data is typically very large, De novo metagenomic assembly can be effectively used to reduce the total amount of data and enhance quality of downstream analysis, such as annotation and binning. Although, there are many freely available assemblers, but selecting one suitable for a specific goal can be highly challenging. In this study, the performance of 11 well-known assemblers was evaluated in the assembly of three different metagenomes. The results obtained show that metaSPAdes is the best assembler and Megahit is a good choice for conservative assembly strategy. In addition, this research provides useful information regarding the pros and cons of each assembler and the effect of read length on assembly, thereby helping scholars to select the optimal assembler based on their objectives.

Forouzan E ，Shariati P ，Mousavi Maleki MS ，Karkhane AA ，Yakhchali B ... - 《-》

被引量: 19 发表:1970年
Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2.

Viral genomics has become crucial in clinical diagnostics and ecology, not to mention to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data from viral origin. In this study, we performed a comparative evaluation and benchmarking of eight assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings. Taken together, the benchmarking of eight genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.

Gupta AK ，Kumar M 《-》

被引量: 2 发表:1970年
Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

Desai A ，Marwah VS ，Yadav A ，Jha V ，Dhaygude K ，Bangar U ，Kulkarni V ，Jere A ... - 《PLoS One》

被引量: 51 发表:1970年
Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.

Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the disadvantages of short read lengths specific to next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, thereby generating a new set of challenges for assembly algorithms and pipelines. However, the introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising solution for more accurate assembly outcomes. Since the introduction of third-generation sequencing technologies, many tools have been developed that aim to take advantage of the longer reads, and researchers need to choose the correct assembler for their projects. We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing to evaluate the assemblers. We include 5 commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven, and wtdbg2 for ONT and PacBio CLR reads. For PacBio HiFi reads , we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. Evaluation categories address the following metrics: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on the quality of the assemblies and report that read length can, but does not always, positively impact assembly quality. Our benchmark concludes that there is no assembler that performs the best in all the evaluation categories. However, our results show that overall Flye is the best-performing assembler for PacBio CLR and ONT reads, both on real and simulated data. Meanwhile, best-performing PacBio HiFi assemblers are Hifiasm and LJA. Next, the benchmarking using longer reads shows that the increased read length improves assembly quality, but the extent to which that can be achieved depends on the size and complexity of the reference genome.

Cosma BM ，Shirali Hossein Zade R ，Jordan EN ，van Lent P ，Peng C ，Pillay S ，Abeel T ... - 《GigaScience》

被引量: 2 发表:1970年

加载更多

来源期刊

影响因子：暂无数据

JCR分区：暂无

中科院分区：暂无