Keywords: genome analysis, microbiome, synteny

Source: DOI:10.1128/msystems.00497-24

Abstract:
Relationships between bacterial taxa are traditionally defined using 16S rRNA nucleotide similarity or average nucleotide identity. Improvements in sequencing technology provide additional pairwise information on genome sequences, which may offer valuable insight into genomic relationships. Mapping orthologous gene locations between genome pairs, known as synteny, is typically used in the discovery of new species and has not been systematically applied to bacterial genomes. Using a data set of 378 bacterial genomes, we developed and tested a new measure of synteny similarity between a pair of genomes, which was scaled onto 16S rRNA distance using covariance matrices. Depending on the input gene functions used (i.e., core, antibiotic resistance, and virulence), we observed varying topological arrangements of bacterial relationship networks when applying (i) complete linkage hierarchical clustering and (ii) K-nearest neighbor graph structures to synteny-scaled 16S data. Our metric improved clustering quality compared with state-of-the-art average nucleotide identity metrics while preserving clustering assignments for the highest-similarity relationships. Our findings indicate that syntenic relationships provide more granular and interpretable relationships for within-genus taxa than pairwise similarity measures do, particularly in functional contexts.
OBJECTIVE: Given the prevalence and necessity of the 16S rRNA measure in bacterial identification and analysis, this analysis adds a functional, synteny-based layer to the identification of relatives and the clustering of bacterial genomes. Modeling the bacterial genome as a graph structure is also of computational interest, as it opens new avenues of genomic analysis for bacteria and their closely related strains and species.
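The two graph-building steps named in the abstract, complete linkage hierarchical clustering and a K-nearest neighbor graph, can be illustrated on a small distance matrix. The following is a minimal sketch and not the authors' implementation: the function names `complete_linkage` and `knn_graph` and the four-genome matrix `toy_dist` are hypothetical, and the synteny-to-16S scaling step is not reproduced because the abstract does not specify it.

```python
from itertools import combinations


def complete_linkage(dist, n_clusters):
    """Agglomerative clustering where the distance between two clusters
    is the largest pairwise distance between their members."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def knn_graph(dist, k):
    """Directed K-nearest neighbor graph as {node: [k closest other nodes]}."""
    graph = {}
    for i, row in enumerate(dist):
        others = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        graph[i] = others[:k]
    return graph


# Hypothetical symmetric distance matrix for four genomes (illustrative values only).
toy_dist = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.70, 0.85],
    [0.80, 0.70, 0.00, 0.20],
    [0.90, 0.85, 0.20, 0.00],
]

clusters = complete_linkage(toy_dist, n_clusters=2)
neighbors = knn_graph(toy_dist, k=1)
print(clusters)
print(neighbors)
```

In practice a library routine (for example, hierarchical clustering in SciPy) would likely replace the quadratic search above; the plain-Python version is used here only to keep the sketch self-contained.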