Keywords: Amplicon sequence variants; Gap analysis; Interreg Eco-AlpsWater; Metabarcoding; Taxonomic coverage; Taxonomic databases

MeSH: Base Sequence; Cyanobacteria / genetics; Eukaryota; European Alpine Region; Genetic Markers; Microalgae / genetics; Phylogeny; RNA, Ribosomal, 16S / genetics; RNA, Ribosomal, 18S

Source: DOI:10.1016/j.scitotenv.2022.155175

Abstract:
The taxonomic identification of organisms based on the amplification of specific genetic markers (metabarcoding) implicitly requires that taxonomic databases hold environmental DNA sequences with adequate discriminatory information and taxonomic coverage. We examined these requirements quantitatively by comparing the identifications of cyanobacteria and microalgae obtained by metabarcoding with those obtained by light microscopy. We used planktic and biofilm samples collected in 37 lakes and 22 rivers across the Alpine region. We focused on two of the most widely used and best-represented genetic markers in the reference databases, namely the 16S rRNA and 18S rRNA genes. A sequence gap analysis using blastn showed that, in the identity range of 99-100%, approximately 30% (plankton) and 60% (biofilm) of the sequences had no close counterpart in the reference databases (NCBI GenBank). Similarly, a taxonomic gap analysis showed that approximately 50% of the cyanobacterial and eukaryotic microalgal species identified by light microscopy were not represented in the reference databases. In both cases, the magnitude of the gaps differed between the major taxonomic groups. Even among the species determined under the microscope and represented in the reference databases, 22% and 26% were still not included in the results obtained by blastn at identity levels of ≥95% and ≥97%, respectively. The main causes were the absence of matching sequences due to amplification and/or sequencing failure and potential misidentification in the microscopy step. Our results quantitatively demonstrated that, in metabarcoding, the main obstacles to classifying 16S rRNA and 18S rRNA sequences and interpreting high-throughput sequencing biomonitoring data were important gaps in the taxonomic completeness of the reference databases and the short length of reads.
The study focused on the Alpine region, but the extent of the gaps could be much greater in other less investigated geographic areas.
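The sequence gap analysis described in the abstract can be illustrated with a minimal sketch. It assumes blastn was run with the default tabular output (`-outfmt 6`), where the first column is the query ID and the third is the percent identity; the abstract does not describe the authors' actual pipeline, so the function names (`best_identity_per_query`, `gap_fraction`) and the sample data are hypothetical and for illustration only.

```python
from typing import Dict, Iterable


def best_identity_per_query(blast_lines: Iterable[str]) -> Dict[str, float]:
    """Map each query ID to its highest percent identity.

    Assumes blastn tabular output (-outfmt 6): column 1 = query ID,
    column 3 = percent identity. Comment and blank lines are skipped.
    """
    best: Dict[str, float] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, pident = fields[0], float(fields[2])
        if pident > best.get(query_id, 0.0):
            best[query_id] = pident
    return best


def gap_fraction(query_ids: Iterable[str],
                 blast_lines: Iterable[str],
                 min_identity: float = 99.0) -> float:
    """Fraction of queries with no hit at or above min_identity.

    Queries without any hit do not appear in tabular output, so the
    full list of query IDs has to be supplied separately.
    """
    ids = list(query_ids)
    if not ids:
        raise ValueError("query_ids must not be empty")
    best = best_identity_per_query(blast_lines)
    unmatched = [q for q in ids if best.get(q, 0.0) < min_identity]
    return len(unmatched) / len(ids)


# Made-up example data (not from the study): four sequence variants,
# one of which (asv4) has no BLAST hit at all.
example_ids = ["asv1", "asv2", "asv3", "asv4"]
example_hits = [
    "asv1\tref1\t99.6",
    "asv1\tref2\t97.0",
    "asv2\tref3\t98.2",
    "asv3\tref4\t100.0",
]
print(gap_fraction(example_ids, example_hits, min_identity=99.0))  # 0.5
```

Lowering `min_identity` to 95 or 97 gives the analogous fraction for the looser identity thresholds mentioned in the abstract.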