DNA mock metagenome

  • Article type: Journal Article
    Metagenome community analyses, driven by the continued development of sequencing technology, are rapidly providing insights into many aspects of microbiology and are becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but the reads are too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short reads, long reads or both on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.

  • Article type: Journal Article
    BACKGROUND: Although the total number of microbial taxa on Earth is under debate, it is clear that only a small fraction of these has been cultivated and validly named. Evidently, the inability to culture most bacteria outside of very specific conditions severely limits their characterization and further studies. In the last decade, a major part of the solution to this problem has been the use of metagenome sequencing, whereby the DNA of an entire microbial community is sequenced, followed by the in silico reconstruction of genomes of its novel component species. The large discrepancy between the number of sequenced type strain genomes (around 12,000) and total microbial diversity (10^6 to 10^12 species) directs these efforts to de novo assembly and binning. Unfortunately, these steps are error-prone and as such, the results have to be intensely scrutinized to avoid publishing incomplete and low-quality genomes.
    RESULTS: We developed MAGISTA (metagenome-assembled genome intra-bin statistics assessment), a novel approach to assess metagenome-assembled genome quality that tackles some of the often-neglected drawbacks of current reference gene-based methods. MAGISTA is based on alignment-free distance distributions between contig fragments within metagenomic bins, rather than a set of reference genes. For proper training, a highly complex genomic DNA mock community was needed and constructed by pooling genomic DNA of 227 bacterial strains, specifically selected to obtain a wide variety representing the major phylogenetic lineages of cultivable bacteria.
    CONCLUSIONS: MAGISTA achieved a 20% reduction in root-mean-square error in comparison to the marker gene approach when tested on publicly available mock metagenomes. Furthermore, our highly complex genomic DNA mock community is a very valuable tool for benchmarking (new) metagenome analysis methods.
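
The abstract above describes MAGISTA as relying on alignment-free distance distributions between contig fragments within a bin. As a rough illustration of that general idea only, the sketch below cuts the contigs of one bin into fixed-size fragments, builds a tetranucleotide frequency profile for each fragment, and collects all pairwise Euclidean distances. The choice of tetranucleotide profiles, Euclidean distance, the 1000-base fragment size, and every function name here are assumptions made for illustration; the abstract does not specify which distance measures or parameters MAGISTA actually uses.

```python
from collections import Counter
from itertools import combinations, product
import math

K = 4  # assumed k-mer length (tetranucleotides); not taken from the abstract
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # fixed k-mer order


def fragments(contig, size):
    """Split a contig into non-overlapping fragments of `size` bases.

    A trailing piece shorter than `size` is dropped.
    """
    return [contig[i:i + size] for i in range(0, len(contig) - size + 1, size)]


def kmer_profile(seq):
    """Return normalized frequencies of all ACGT k-mers in `seq`.

    K-mers containing other characters (for example N) are ignored.
    """
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def euclidean(p, q):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def intra_bin_distances(bin_contigs, fragment_size=1000):
    """Pairwise profile distances between all fragments of one bin."""
    frags = [f for c in bin_contigs for f in fragments(c.upper(), fragment_size)]
    profiles = [kmer_profile(f) for f in frags]
    return [euclidean(p, q) for p, q in combinations(profiles, 2)]


if __name__ == "__main__":
    # Toy bins: one with uniform composition, one mixing two compositions.
    uniform_bin = ["ACGT" * 500]
    mixed_bin = ["A" * 1000, "C" * 1000]
    print(intra_bin_distances(uniform_bin))
    print(intra_bin_distances(mixed_bin))
```

In this toy setup, a bin whose fragments share one sequence composition yields distances near zero, while a bin that mixes fragments of different composition yields larger distances. A quality estimator in this spirit would then summarize the distance distribution per bin; how MAGISTA does so is not stated in the abstract.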