MeSH terms: Metagenome; Metagenomics / methods; Haplotypes; Software; Humans; Genome, Bacterial; Microbiota / genetics; Bacteria / genetics, classification; High-Throughput Nucleotide Sequencing / methods; Sequence Analysis, DNA / methods

Source: DOI:10.1093/bioinformatics/btae252

Abstract:
MOTIVATION: Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for recovering bacterial strain genomes from microbiomes remain a key challenge.
RESULTS: We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short- and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, and as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is >3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry), and is over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min per sample on average and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses.
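As a rough illustration of the minimum error correction (MEC) objective named above, the sketch below scores candidate haplotypes against reads encoded as allele observations at variant sites: each read is charged the number of allele flips needed to match its closest haplotype. The read encoding, all function names, and the alternating assign/majority-vote refinement loop (a generic k-means-style local search) are assumptions chosen for brevity. This is not Floria's actual algorithm, and it omits the strain-preserving network flow step entirely.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: a read maps variant-site index -> observed allele
# (0 = reference, 1 = alternate); sites the read does not cover are absent.
Read = Dict[int, int]
Haplotype = Sequence[int]  # one allele per variant site


def mismatches(read: Read, hap: Haplotype) -> int:
    """Covered sites where the read's allele disagrees with the haplotype."""
    return sum(1 for site, allele in read.items() if hap[site] != allele)


def mec_score(reads: List[Read], haps: List[Haplotype]) -> int:
    """MEC objective: allele flips needed to make every read agree with
    its closest haplotype."""
    return sum(min(mismatches(r, h) for h in haps) for r in reads)


def assign_reads(reads: List[Read], haps: List[Haplotype]) -> List[int]:
    """Index of the closest haplotype per read (ties go to the lowest index)."""
    return [min(range(len(haps)), key=lambda k: mismatches(r, haps[k])) for r in reads]


def consensus(reads: List[Read], labels: List[int], k: int, n_sites: int) -> List[List[int]]:
    """Majority-vote allele per site within each read cluster (ties -> 0)."""
    haps = []
    for c in range(k):
        votes = [[0, 0] for _ in range(n_sites)]  # [ref count, alt count]
        for read, label in zip(reads, labels):
            if label == c:
                for site, allele in read.items():
                    votes[site][allele] += 1
        haps.append([1 if alt > ref else 0 for ref, alt in votes])
    return haps


def cluster_reads(
    reads: List[Read], init_haps: List[Haplotype], n_iter: int = 10
) -> Tuple[List[List[int]], List[int]]:
    """Alternate read assignment and consensus; no step can raise the MEC score."""
    haps = [list(h) for h in init_haps]
    for _ in range(n_iter):
        labels = assign_reads(reads, haps)
        new_haps = consensus(reads, labels, len(haps), len(haps[0]))
        if new_haps == haps:
            break
        haps = new_haps
    return haps, assign_reads(reads, haps)


# Toy example: 4 variant sites, 2 strains; the third read carries one allele error.
reads = [{0: 0, 1: 0}, {1: 1, 2: 1, 3: 1}, {0: 1, 1: 0, 2: 1}, {2: 0, 3: 0}, {1: 1, 3: 1}]
init = [[0, 0, 0, 0], [1, 0, 1, 1]]
print(mec_score(reads, init))  # → 2
haps, labels = cluster_reads(reads, init)
print(haps, labels, mec_score(reads, haps))  # → [[0, 0, 0, 0], [1, 1, 1, 1]] [0, 1, 1, 0, 1] 1
```

Starting from a noisy second haplotype, the loop recovers two clean haplotypes and lowers the MEC score from 2 to 1; the remaining error is the single miscalled allele in the third read. Like k-means, this converges only to a local optimum, and it takes the number of strains as given through `init_haps`, which a real tool would have to infer.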
AVAILABILITY AND IMPLEMENTATION: Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline, along with code for reproducing the benchmarks, is available at https://github.com/jsgounot/Floria_analysis_workflow.