Keywords: core genes, database, identification, microbiome, species

MeSH: RNA, Ribosomal, 16S / genetics; Bacteria / genetics, classification, isolation & purification; Microbiota; Archaea / genetics, classification; Genome, Bacterial; Phylogeny; Databases, Genetic; Genome, Archaeal; Sequence Analysis, DNA; Computational Biology / methods

Source: DOI:10.1099/ijsem.0.006421

Abstract:
With the continued evolution of DNA sequencing technologies, genome sequence data have become increasingly integral to the classification and identification of Bacteria and Archaea. Six years after introducing EzBioCloud, an integrated platform that represents the taxonomic hierarchy of Bacteria and Archaea through quality-controlled 16S rRNA gene and genome sequences, we present an updated version that further refines and expands its capabilities. This update responds to the growing need for accurate taxonomic information as species definitions increasingly rely on genome sequence comparisons. We also incorporated an advanced strategy for addressing underrepresented or less-studied lineages, improving the comprehensiveness and accuracy of the database. Our rigorous quality control protocols remain in place: whole-genome assemblies from the NCBI Assembly Database undergo stringent screening to remove low-quality sequence data. The retained assemblies are then passed through our enhanced bioinformatics identification pipeline, which first runs a 16S rRNA gene similarity search and then calculates the average nucleotide identity (ANI). For genome sequences that lack a 16S rRNA gene sequence and have no closely related genomic representative for ANI calculation, we apply a different ANI approach based on bacterial core genes (core gene ANI, cgANI) to improve taxonomic placement. Owing to the increase in genome sequences available in NCBI and the newly introduced cgANI method, EzBioCloud now encompasses a total of 109 835 species, of which 21 964 have validly published names. Another 47 896 are candidate species identified either through 16S rRNA gene sequence similarity (phylotypes) or through whole-genome ANI (genomospecies), and the remaining 39 975 were positioned in the taxonomic tree by cgANI (species clusters). Our EzBioCloud database is accessible at www.ezbiocloud.net/db.
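The abstract describes a tiered identification order: a 16S rRNA gene similarity search, then whole-genome ANI, with cgANI as a fallback for genomes that lack both a 16S sequence and a close genomic representative. The Python sketch below is a hypothetical illustration of that order, not EzBioCloud's code. The cutoffs (98.7% 16S similarity, 95% ANI) are commonly cited species-level thresholds assumed here rather than values stated in the abstract, and `core_gene_ani` is a simplified stand-in that pools identity over pre-aligned core-gene pairs; the abstract does not specify the actual cgANI gene set, alignment, or averaging. The names `Query`, `classify`, `pairwise_identity`, and `core_gene_ani` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# ASSUMED cutoffs: commonly cited species-level thresholds, not taken from the abstract.
SIM_16S_CUTOFF = 98.7  # percent 16S rRNA gene similarity
ANI_CUTOFF = 95.0      # percent whole-genome ANI


@dataclass
class Query:
    """Hypothetical summary of results from earlier pipeline steps."""
    best_16s_similarity: Optional[float]  # None if the genome has no 16S rRNA gene
    best_ani: Optional[float]             # None if no close genomic representative exists
    core_gene_pairs: Sequence[Tuple[str, str]] = ()  # pre-aligned (query, reference) core genes


def pairwise_identity(a: str, b: str) -> Tuple[int, int]:
    """Return (identical columns, compared columns), skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("core gene pairs must be pre-aligned to equal length")
    same = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        same += int(x == y)
    return same, compared


def core_gene_ani(pairs: Sequence[Tuple[str, str]]) -> float:
    """Simplified stand-in for cgANI: percent identity pooled over all core-gene alignments."""
    same = compared = 0
    for a, b in pairs:
        s, n = pairwise_identity(a, b)
        same += s
        compared += n
    if compared == 0:
        raise ValueError("no aligned core-gene positions to compare")
    return 100.0 * same / compared


def classify(q: Query) -> Tuple[str, float]:
    """Hypothetical tiered decision returning (category, deciding score)."""
    if q.best_ani is not None:
        # A close genome was found after the 16S search: whole-genome ANI decides.
        label = "existing species" if q.best_ani >= ANI_CUTOFF else "genomospecies"
        return label, q.best_ani
    if q.best_16s_similarity is not None:
        # No close genome for ANI; high 16S similarity alone cannot confirm a species,
        # so a match above the cutoff is only marked tentative.
        if q.best_16s_similarity >= SIM_16S_CUTOFF:
            return "tentative existing species", q.best_16s_similarity
        return "phylotype", q.best_16s_similarity
    # Neither a 16S rRNA gene nor a close genome: place via core-gene ANI.
    return "species cluster", core_gene_ani(q.core_gene_pairs)


print(classify(Query(best_16s_similarity=99.5, best_ani=97.2)))  # ('existing species', 97.2)
print(classify(Query(best_16s_similarity=97.0, best_ani=None)))  # ('phylotype', 97.0)
print(classify(Query(None, None, [("ACGTACGTAC", "ACGTACGTAA")])))  # ('species cluster', 90.0)
```

In this sketch `best_ani` is assumed to be computed only after the 16S search has found a close candidate genome, so checking it first simply lets the higher-resolution genome-level result take precedence when both are available.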