MeSH: Metagenomics / methods; Benchmarking; Bacteria / genetics, classification; Nanopore Sequencing; Nanopores; Microbiota

Source: DOI:10.1038/s41597-024-03672-8

Abstract:
Taxonomic classification is crucial for identifying organisms within diverse microbial communities when using shotgun metagenomic sequencing. While second-generation Illumina sequencing still dominates, third-generation nanopore sequencing promises improved classification through longer reads. However, extensive benchmarking studies on nanopore data are lacking. We systematically evaluated the performance of bacterial taxonomic classification on metagenomic nanopore sequencing data for several commonly used classifiers, using standardized reference sequence databases, on the largest collection of publicly available data for defined mock communities thus far (nine samples), representing different research domains and application scopes. Our results sort classifiers into three categories: low precision/high recall, medium precision/medium recall, and high precision/medium recall. Most fall into the first category, although suitable abundance filtering can improve precision without excessively penalizing recall. No definitive 'best' classifier emerges, and classifier selection depends on application scope and practical requirements. Although few classifiers designed for long reads exist, they generally exhibit better performance. Our comprehensive benchmarking provides concrete recommendations, supported by publicly available code for reassessment and fine-tuning by other scientists.
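The trade-off between precision and recall under abundance filtering can be illustrated with a minimal Python sketch. It is not the authors' benchmarking code: the function names, the mock community, the read counts, and the thresholds below are all hypothetical and chosen only to show the arithmetic. Precision is the fraction of reported taxa that are truly in the mock community, and recall is the fraction of the mock community's taxa that are reported.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted taxa against the known mock-community taxa."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def filter_by_abundance(read_counts, min_fraction):
    """Keep taxa whose share of all classified reads is at least min_fraction."""
    total = sum(read_counts.values())
    if total == 0:
        return set()
    return {taxon for taxon, count in read_counts.items() if count / total >= min_fraction}


# Hypothetical mock community (ground truth); not data from the study.
truth = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Listeria monocytogenes",
    "Pseudomonas aeruginosa",
}

# Hypothetical classifier output: reads assigned per species (10,000 in total).
read_counts = {
    "Escherichia coli": 4000,
    "Bacillus subtilis": 3000,
    "Listeria monocytogenes": 2500,
    "Pseudomonas aeruginosa": 440,   # true but low-abundance member
    "Shigella flexneri": 30,         # false positive
    "Bacillus cereus": 20,           # false positive
    "Salmonella enterica": 10,       # false positive
}

# Hypothetical thresholds: no filter, 0.5 % of reads, 5 % of reads.
for min_fraction in (0.0, 0.005, 0.05):
    kept = filter_by_abundance(read_counts, min_fraction)
    precision, recall = precision_recall(kept, truth)
    print(f"threshold={min_fraction:.3f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, a moderate threshold removes the three low-count false positives while keeping every true species, whereas an overly strict threshold also discards the low-abundance true member and lowers recall. A suitable threshold for real data would depend on the classifier and the community studied.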