Keywords: Tn-seq; cofitness network; pangenome

Source: DOI:10.1002/mlf2.12132

Abstract:
Most in silico evolutionary studies assume that core genes are essential for cellular function, whereas accessory genes are dispensable, particularly in nutrient-rich environments. However, this assumption has seldom been tested genetically within a pangenome context. In this study, we conducted a robust pangenomic Tn-seq analysis of fitness genes in a nutrient-rich medium for Sinorhizobium strains with a canonical open pangenome. To evaluate the robustness of fitness category assignment, Tn-seq data from three independent mutant libraries per strain were analyzed with three methods. The Hidden Markov Model (HMM)-based method was the most robust to variation between mutant libraries and was insensitive to data size, outperforming the Bayesian and Monte Carlo simulation-based methods; the HMM method was therefore used to classify fitness categories. Fitness genes, categorized as essential (ES), advantage (GA), and disadvantage (GD) genes for growth, are enriched among core genes, whereas nonessential (NE) genes are over-represented among accessory genes. Accessory ES/GA genes showed weaker fitness effects than core ES/GA genes. Connectivity degrees in the cofitness network decrease in the order ES, GD, GA/NE. Beyond the accessory genes, 1599 of 3284 core genes display differential essentiality across the test strains. Within the pangenome core, both shared quasi-essential (ES and GA) genes and strain-dependent fitness genes are enriched in similar functional categories. Our analysis reveals a considerable fuzzy essential zone, determined by cofitness connectivity degrees, in the Sinorhizobium pangenome and highlights the power of the cofitness network for understanding the genetic basis of ever-growing prokaryotic pangenome data.
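The abstract does not state how cofitness or connectivity degree is computed. A common convention in Tn-seq fitness studies is to take the Pearson correlation between two genes' fitness profiles across samples, link the pair when the correlation reaches a cutoff, and define a gene's connectivity degree as its number of links. The minimal sketch below assumes that convention. The 0.8 threshold, the function names (`pearson`, `cofitness_degrees`, `mean_degree_by_category`), and the toy profiles are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length fitness profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no cofitness signal
    return sxy / math.sqrt(sxx * syy)


def cofitness_degrees(profiles, threshold=0.8):
    """For each gene, count partners whose profile correlation is >= threshold.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    genes = list(profiles)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return degree


def mean_degree_by_category(degree, category):
    """Average connectivity degree per fitness category (e.g. ES, GA, GD, NE)."""
    groups = defaultdict(list)
    for gene, d in degree.items():
        groups[category[gene]].append(d)
    return {cat: sum(v) / len(v) for cat, v in groups.items()}


# Toy fitness profiles over four hypothetical samples (not data from the study,
# and not meant to reproduce the paper's ES > GD > GA/NE ordering).
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [1, 2, 3, 5],
    "geneD": [4, 1, 3, 2],
}
category = {"geneA": "ES", "geneB": "ES", "geneC": "GD", "geneD": "NE"}

degrees = cofitness_degrees(profiles)
print(degrees)  # {'geneA': 2, 'geneB': 2, 'geneC': 2, 'geneD': 0}
print(mean_degree_by_category(degrees, category))  # {'ES': 2.0, 'GD': 2.0, 'NE': 0.0}
```

The pairwise loop is quadratic in the number of genes, which keeps the sketch readable; for genome-scale data a vectorized correlation matrix (for example `numpy.corrcoef`) would be the practical choice.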