Keywords: DNA mock; metagenome assembly; long-read sequencing; sequencer benchmark; software benchmark

MeSH: Metagenome; Benchmarking; High-Throughput Nucleotide Sequencing / methods; Metagenomics / methods; Bacteria / genetics, classification, isolation & purification; Sequence Analysis, DNA / methods; Genome, Bacterial / genetics; Microbiota / genetics

Source: DOI:10.1099/mic.0.001469

Abstract:
Metagenome community analyses, driven by the continued development of sequencing technology, are rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.