-
Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish.
Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms.
Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30-45× sequence coverage, and the Illumina platform was used to generate 50-160× sequence coverage. Illumina-only assemblies were fragmented with high numbers of contigs, while ONT-only assemblies were error prone with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, were from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database.
High-quality genomes can be obtained from a combination of using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
Johnson LK
,Sahasrabudhe R
,Gill JA
,Roach JL
,Froenicke L
,Brown CT
,Whitehead A
... -
《GigaScience》
-
Polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads to improve genomic analyses.
Oxford Nanopore sequencing has been widely used to achieve complete genomes of bacterial pathogens. However, the error rates of Oxford Nanopore long reads are high. Various polishing algorithms using Illumina short reads to correct the errors in Oxford Nanopore long-read assemblies have been developed. The impact of polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads on improving genomic analyses was evaluated using both simulated and real reads. Ten species (10 strains) were selected for simulated reads, while real reads were tested on 11 species (11 strains). Oxford Nanopore long reads were assembled with Unicycler to produce a draft assembly, followed by three rounds of polishing with Illumina short reads using two polishing tools, Pilon and NextPolish. One round of NextPolish polishing generated genome completeness and accuracy parameters similar to the reference genomes, whereas two or three rounds of Pilon polishing were needed, though contiguity remained unchanged after polishing. The polished assemblies of Escherichia coli O157:H7, Salmonella Typhimurium, and Cronobacter sakazakii with simulated reads did not provide accurate plasmid identifications. One round of NextPolish polishing was needed for accurately identifying plasmids in Staphylococcus aureus and E. coli O26:H11 with real reads, whereas one and two rounds of Pilon polishing were necessary for these two strains, respectively. Polishing failed to provide an accurate antimicrobial resistance (AMR) genotype for S. aureus with real reads. One round of polishing recovered an accurate AMR genotype for Klebsiella pneumoniae with real reads. The reference genome and draft assembly of Citrobacter braakii with real reads differed, which carried blaCMY-83 and fosA6, respectively, while both genes were present after one round of polishing. However, polishing did not improve the assembly of E. coli O26:H11 with real reads to achieve numbers of virulence genes similar to the reference genome. The draft and polished assemblies showed a phylogenetic tree topology comparable with the reference genomes. For multilocus sequence typing and pan-genome analyses, one round of NextPolish polishing was sufficient to obtain accurate results, while two or three rounds of Pilon polishing were needed. Overall, NextPolish outperformed Pilon for polishing the Oxford Nanopore long-read assemblies of bacterial pathogens, though both polishing strategies improved genomic analyses compared to the draft assemblies.
Chen Z
,Erickson DL
,Meng J
《-》
-
Benchmarking Long-Read Assemblers for Genomic Analyses of Bacterial Pathogens Using Oxford Nanopore Sequencing.
Oxford Nanopore sequencing can be used to achieve complete bacterial genomes. However, the error rates of Oxford Nanopore long reads are greater compared to Illumina short reads. Long-read assemblers using a variety of assembly algorithms have been developed to overcome this deficiency, which have not been benchmarked for genomic analyses of bacterial pathogens using Oxford Nanopore long reads. In this study, long-read assemblers, namely Canu, Flye, Miniasm/Racon, Raven, Redbean, and Shasta, were thus benchmarked using Oxford Nanopore long reads of bacterial pathogens. Ten species were tested for mediocre- and low-quality simulated reads, and 10 species were tested for real reads. Raven was the most robust assembler, obtaining complete and accurate genomes. All Miniasm/Racon and Raven assemblies of mediocre-quality reads provided accurate antimicrobial resistance (AMR) profiles, while the Raven assembly of with low-quality reads was the only assembly with an accurate AMR profile among all assemblers and species. All assemblers functioned well for predicting virulence genes using mediocre-quality and real reads, whereas only the Raven assemblies of low-quality reads had accurate numbers of virulence genes. Regarding multilocus sequence typing (MLST), Miniasm/Racon was the most effective assembler for mediocre-quality reads, while only the Raven assemblies of O157:H7 and with low-quality reads showed positive MLST results. Miniasm/Racon and Raven were the best performers for MLST using real reads. The Miniasm/Racon and Raven assemblies showed accurate phylogenetic inference. For the pan-genome analyses, Raven was the strongest assembler for simulated reads, whereas Miniasm/Racon and Raven performed the best for real reads. Overall, the most robust and accurate assembler was Raven, closely followed by Miniasm/Racon.
Chen Z
,Erickson DL
,Meng J
《INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES》
-
De novo assembly of the Indian blue peacock (Pavo cristatus) genome using Oxford Nanopore technology and Illumina sequencing.
The Indian peafowl (Pavo cristanus) is native to South Asia and is the national bird of India. Here we present a draft genome sequence of the male blue peacock using Illumina and Oxford Nanopore technology (ONT).
ONT sequencing gave ∼2.3-fold sequencing coverage, whereas Illumina generated 150-base pair paired-end sequence data at 284.6-fold coverage from 5 libraries. Subsequently, we generated a 0.915-gigabase pair de novo assembly of the peacock genome with a scaffold N50 of 0.23 megabase pairs (Mb). We predict that the peacock genome contains 23,153 protein-coding genes and 75.3 Mb (7.33%) of repetitive sequences.
We report a high-quality assembly of the peacock genome using a hybrid approach of sequences generated by both Illumina and ONT. The long-read chemistry generated by ONT was useful for addressing challenges related to de novo assembly, particularly at regions containing repetitive sequences spanning longer than the read length, and which could not be resolved with only short-read-based assembly. Contig assembly of Illumina short reads gave an N50 of 1,639 bases, whereas with ONT, the N50 increased by >9-fold to 14,749 bases. The initial contig assembly based on Illumina sequencing reads alone gave 685,241 contigs. Further scaffolding on assembled contigs using both Illumina and ONT sequencing reads resulted in a final assembly of 15,025 super-scaffolds, with an N50 of ∼0.23 Mb. Ninety-five percent of proteins predicted by homology matched with those in a public repository, verifying the completeness of our assembly. Like other phylogenetic studies of avian conserved genes, we found P. cristatus to be most closely related to Gallus gallus, followed by Meleagris gallopavo and Anas platyrhynchos. Compared with the recently published peacock genome assembly, the current, superior, hybrid assembly has greater sequencing depth, fewer non-ATGC sequences, and fewer scaffolds.
Dhar R
,Seethy A
,Pethusamy K
,Singh S
,Rohil V
,Purkayastha K
,Mukherjee I
,Goswami S
,Singh R
,Raj A
,Srivastava T
,Acharya S
,Rajashekhar B
,Karmakar S
... -
《GigaScience》
-
Are we there yet? Benchmarking low-coverage nanopore long-read sequencing for the assembling of mitochondrial genomes using the vulnerable silky shark Carcharhinus falciformis.
Whole mitochondrial genomes are quickly becoming markers of choice for the exploration of within-species genealogical and among-species phylogenetic relationships. Most often, 'primer walking' or 'long PCR' strategies plus Sanger sequencing or low-pass whole genome sequencing using Illumina short reads are used for the assembling of mitochondrial chromosomes. In this study, we first confirmed that mitochondrial genomes can be sequenced from long reads using nanopore sequencing data exclusively. Next, we examined the accuracy of the long-reads assembled mitochondrial chromosomes when comparing them to a 'gold' standard reference mitochondrial chromosome assembled using Illumina short-reads sequencing.
Using a specialized bioinformatics tool, we first produced a short-reads mitochondrial genome assembly for the silky shark C. falciformis with an average base coverage of 9.8x. The complete mitochondrial genome of C. falciformis was 16,705 bp in length and 934 bp shorter than a previously assembled genome (17,639 bp in length) that used bioinformatics tools not specialized for the assembly of mitochondrial chromosomes. Next, low-pass whole genome sequencing using a MinION ONT pocket-sized platform plus customized de-novo and reference-based workflows assembled and circularized a highly accurate mitochondrial genome in the silky shark Carcharhinus falciformis. Indels at the flanks of homopolymer regions explained most of the dissimilarities observed between the 'gold' standard reference mitochondrial genome (assembled using Illumina short reads) and each of the long-reads mitochondrial genome assemblies. Although not completely accurate, mitophylogenomics and barcoding analyses (using entire mitogenomes and the D-Loop/Control Region, respectively) suggest that long-reads assembled mitochondrial genomes are reliable for identifying a sequenced individual, such as C. falciformis, and separating the same individual from others belonging to closely related congeneric species.
This study confirms that mitochondrial genomes can be sequenced from long-reads nanopore sequencing data exclusively. With further development, nanopore technology can be used to quickly test in situ mislabeling in the shark fin fishing industry and thus, improve surveillance protocols, law enforcement, and the regulation of this fishery. This study will also assist with the transferring of high-throughput sequencing technology to middle- and low-income countries so that international scientists can explore population genomics in sharks using inclusive research strategies. Lastly, we recommend assembling mitochondrial genomes using specialized assemblers instead of other assemblers developed for bacterial and/or nuclear genomes.
Baeza JA
,García-De León FJ
《BMC GENOMICS》