Pangenome
  • Article type: Journal Article
    Most in silico evolutionary studies assume that core genes are essential for cellular function, while accessory genes are dispensable, particularly in nutrient-rich environments. However, this assumption is seldom tested genetically within the pangenome context. In this study, we conducted a robust pangenomic Tn-seq analysis of fitness genes in a nutrient-rich medium for Sinorhizobium strains with a canonical open pangenome. To evaluate the robustness of fitness category assignment, Tn-seq data for three independent mutant libraries per strain were analyzed by three methods. The Hidden Markov Model (HMM)-based method proved most robust to variation between mutant libraries and insensitive to data size, outperforming the Bayesian and Monte Carlo simulation-based methods. Consequently, the HMM method was used to classify fitness categories. Fitness genes, categorized as essential (ES), growth-advantage (GA), and growth-disadvantage (GD) genes, are enriched in core genes, while nonessential (NE) genes are over-represented in accessory genes. Accessory ES/GA genes showed a lower fitness effect than core ES/GA genes. Connectivity degrees in the cofitness network decrease in the order ES, GD, and GA/NE. In addition to accessory genes, 1599 out of 3284 core genes display differential essentiality across test strains. Within the pangenome core, both shared quasi-essential (ES and GA) and strain-dependent fitness genes are enriched in similar functional categories. Our analysis demonstrates a considerable fuzzy essential zone determined by cofitness connectivity degrees in the Sinorhizobium pangenome and highlights the power of the cofitness network in understanding the genetic basis of ever-increasing prokaryotic pangenome data.

  • Article type: Journal Article
    Tea, one of the most widely consumed beverages globally, exhibits remarkable genomic diversity underlying its flavour and health-related compounds. In this study, we present the construction and analysis of a tea pangenome comprising 11 genomes, with a focus on three newly sequenced genomes: the purple-leaved assamica cultivar "Zijuan", the temperature-sensitive sinensis cultivar "Anjibaicha", and the wild accession "L618". Their assemblies achieved excellent quality scores because they benefited from the latest sequencing technologies. Our analysis incorporates a detailed investigation of the transposon complement across the tea pangenome, revealing shared patterns of transposon distribution among the studied genomes and improved transposon resolution with long-read technologies, as shown by long terminal repeat (LTR) Assembly Index analysis. Furthermore, our study encompasses a gene-centric exploration of the pangenome that examines the genomic landscape of the catechin pathway, providing insights into copy number alterations and gene-centric variants, especially for anthocyanidin synthases. We constructed a gene-centric pangenome by structurally and functionally annotating all available genomes using an identical pipeline, which both increased gene completeness and allowed for a high functional annotation rate. This improved and consistently annotated gene set will allow for a better comparison between tea genomes. We used this improved pangenome to capture the core and dispensable gene repertoire, elucidating the functional diversity present within the tea species. This pangenome might serve as a valuable resource for understanding the fundamental genetic basis of traits such as flavour, stress tolerance, and disease resistance, with implications for tea breeding programmes.

  • Article type: Journal Article
    Microbial strains of variable functional capacities coexist in microbiomes. Current bioinformatics methods for strain analysis cannot directly link strain composition to gene content from metagenomic data. Here we present Strain-level Pangenome Decomposition Analysis (StrainPanDA), a novel method that uses the pangenome coverage profile of multiple metagenomic samples to simultaneously reconstruct the composition and gene content variation of coexisting strains in microbial communities. We systematically validate the accuracy and robustness of StrainPanDA using synthetic data sets. To demonstrate the power of gene-centric strain profiling, we then apply StrainPanDA to analyze gut microbiome samples from infants, as well as from patients treated with fecal microbiota transplantation. We show that the linked reconstruction of strain composition and gene content profiles is critical for understanding the relationship between microbial adaptation and strain-specific functions (e.g., nutrient utilization and pathogenicity). Finally, StrainPanDA has minimal requirements for computing resources and can be scaled to process multiple species in a community in parallel. In short, StrainPanDA can be applied to metagenomic data sets to detect associations between molecular functions and microbial/host phenotypes, to formulate testable hypotheses, and to gain novel biological insights at the strain or subspecies level.

  • Article type: Journal Article
    Wheat (Triticum aestivum L.) is a staple food for more than 35% of the world's population, with wheat flour used to make hundreds of baked goods. Superior end-use quality is a major breeding target; however, improving it is especially time-consuming and expensive. Furthermore, genes encoding seed-storage proteins (SSPs) form multi-gene families and are repetitive, with gaps commonplace in several genome assemblies. To overcome these barriers and efficiently identify superior wheat SSP alleles, we developed "PanSK" (Pan-SSP k-mer) for genotype-to-phenotype prediction based on an SSP-based pangenome resource. PanSK uses 29-mer sequences that represent each SSP gene at the pangenomic level to reveal untapped diversity across landraces and modern cultivars. Genome-wide association studies with k-mers identified 23 SSP genes associated with end-use quality that represent novel targets for improvement. We evaluated the effect of rye secalin genes on end-use quality and found that removal of ω-secalins from 1BL/1RS wheat translocation lines is associated with enhanced end-use quality. Finally, using machine-learning-based prediction inspired by PanSK, we predicted the quality phenotypes with high accuracy from genotypes alone. This study provides an effective approach for genome design based on SSP genes, enabling the breeding of wheat varieties with superior processing capabilities and improved end-use quality.
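
The idea of representing each SSP gene by 29-mer sequences can be illustrated with a minimal sketch. This is not the PanSK implementation: the function names `kmers` and `gene_specific_kmers` are hypothetical, strand and reverse complements are ignored, and a k-mer is treated as representative of a gene simply when it occurs in that gene and in no other gene of the input set.

```python
from collections import defaultdict
from typing import Dict, Set


def kmers(seq: str, k: int = 29) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gene_specific_kmers(genes: Dict[str, str], k: int = 29) -> Dict[str, Set[str]]:
    """Map each gene name to the k-mers found in that gene and in no other gene.

    Assumption for illustration only: uniqueness within the input set is the
    criterion for a k-mer to represent a gene.
    """
    # Record which genes contain each k-mer.
    owners = defaultdict(set)
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    # Keep only k-mers that belong to exactly one gene.
    specific = {name: set() for name in genes}
    for kmer, names in owners.items():
        if len(names) == 1:
            specific[next(iter(names))].add(kmer)
    return specific
```

With k = 29 on real gene sequences, the presence or absence of such gene-specific k-mers in a set of sequencing reads could then serve as a genotype feature; how PanSK actually selects and scores its k-mers is not stated in the abstract.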

  • Article type: Journal Article
    Fruit ripening is associated with the degreening process (loss of chlorophyll) that occurs in most fruit species. Kiwifruit is one of the few species whose fruits may maintain green flesh by accumulating a large amount of chlorophyll even after ripening. However, little is known about the genetic variations related to the fruit degreening process. Here, a graph-based kiwifruit pangenome is built by analyzing 14 chromosome-scale haplotype-resolved genome assemblies from seven representative cultivars or lines in Actinidia chinensis. A total of 49,770 non-redundant gene families are identified, with core genes constituting 46.6% and dispensable genes constituting 53.4%. A total of 84,591 non-redundant structural variations (SVs) are identified. The pangenome graph, which integrates both reference genome sequences and variant information, facilitates the identification of SVs related to fruit color. The SV in the promoter of the AcBCM gene determines its high expression in the late developmental stage of fruits, which causes chlorophyll accumulation in the green-flesh fruits by post-translationally regulating AcSGR2, a key enzyme of chlorophyll catabolism. Taken together, this work constructs a high-quality pangenome, unravels numerous genetic variations, and identifies a novel SV mediating fruit coloration and fruit quality, providing valuable information for further investigation of genome evolution and domestication, QTL gene function, and genomics-assisted breeding.

  • Article type: Journal Article
    Some Brucella spp. are important pathogens. According to the latest prokaryotic taxonomy, the Brucella genus consists of facultative intracellular parasitic Brucella species and extracellular opportunistic or environmental Brucella species. Intracellular Brucella species include classical and nonclassical types, with different species generally exhibiting host preferences. Some classical intracellular Brucella species can cause zoonotic brucellosis, including B. melitensis, B. abortus, B. suis, and B. canis. Extracellular Brucella species comprise opportunistic or environmental species that formerly belonged to the genus Ochrobactrum and have since been renamed, for example, Brucella intermedia and Brucella anthropi, which are the most frequent opportunistic human pathogens within the recently expanded genus Brucella. The cause of the diverse phenotypic characteristics of different Brucella species is still unclear. To further investigate the genetic evolutionary characteristics of the Brucella genus and elucidate the relationship between its genomic composition and the prediction of phenotypic traits, we collected the genomic data of Brucella from the NCBI Genome database and conducted a comparative genomics study. We found that classical and nonclassical intracellular Brucella species and extracellular Brucella species exhibited differences in phylogenetic relationships, horizontal gene transfer, and the distribution patterns of mobile genetic elements, virulence factor genes, and antibiotic resistance genes, showing the close relationship between genetic variation and the prediction of phenotypic traits in different Brucella species. Furthermore, we found significant differences in horizontal gene transfer and the distribution patterns of mobile genetic elements, virulence factor genes, and antibiotic resistance genes between the two chromosomes of Brucella, indicating that the two chromosomes had distinct dynamics and plasticity and played different roles in the survival and evolution of Brucella. These findings provide new directions for exploring the genetic evolutionary characteristics of the Brucella genus and could offer new clues to elucidate the factors influencing the phenotypic diversity of the Brucella genus.

  • Article type: Journal Article
    BACKGROUND: The genetic diversity of yak, a key domestic animal on the Qinghai-Tibetan Plateau (QTP), is a vital resource for domestication and breeding efforts. This study presents the first yak pangenome obtained through the de novo assembly of 16 yak genomes.
    RESULTS: We discovered 290 Mb of nonreference sequences and 504 new genes. Our pangenome-wide presence and absence variation (PAV) analysis revealed 5,120 PAV-related genes, highlighting a wide range of variety-specific genes and genes with varying frequencies across yak populations. Principal component analysis (PCA) based on binary gene PAV data classified yaks into three new groups: wild, domestic, and Jinchuan. Moreover, we proposed a "two-haplotype genomic hybridization model" for understanding the hybridization patterns among breeds by integrating gene frequency, heterozygosity, and gene PAV data. A gene PAV-GWAS identified a novel gene (BosGru3G009179) that may be associated with the multirib trait in Jinchuan yaks. Furthermore, an integrated transcriptome and pangenome analysis highlighted significant differences in the expression of core genes and the mutational burden of differentially expressed genes between yaks from high and low altitudes. Transcriptome analysis across multiple species revealed that, between high- and low-altitude regions, yaks have the most unique differentially expressed mRNAs and lncRNAs, especially in the heart and lungs.
    CONCLUSIONS: The yak pangenome offers a comprehensive resource and new insights for functional genomic studies, supporting future biological research and breeding strategies.
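
The notion of PAV genes "with varying frequencies across yak populations" can be made concrete with a small sketch. It assumes a binary presence/absence call per genome and a population label per genome; the function names and the input layout are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Set


def pav_frequencies(pav: Dict[str, Set[str]],
                    groups: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Compute, for each gene, the fraction of genomes carrying it in each population.

    pav    : genome name -> set of genes called present in that genome
    groups : genome name -> population label (for example "wild" or "domestic")
    """
    sizes = defaultdict(int)                          # genomes per population
    counts = defaultdict(lambda: defaultdict(int))    # gene -> population -> carriers
    for genome, genes in pav.items():
        pop = groups[genome]
        sizes[pop] += 1
        for gene in genes:
            counts[gene][pop] += 1
    return {
        gene: {pop: counts[gene][pop] / n for pop, n in sizes.items()}
        for gene in counts
    }


def variable_genes(freqs: Dict[str, Dict[str, float]]) -> Set[str]:
    """Genes missing from at least one genome, i.e. PAV genes rather than core genes."""
    return {gene for gene, by_pop in freqs.items()
            if any(f < 1.0 for f in by_pop.values())}
```

A gene with frequency 1.0 in every population behaves as a core gene in this toy setting, while a gene with, say, 0.5 in one group and 0.0 in another is the kind of population-differentiated PAV the abstract describes.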

  • Article type: Journal Article
    Clostridium butyricum is a Gram-positive anaerobic bacterium known for its ability to produce butyrate. In this study, we conducted whole-genome sequencing and assembly of 14 C. butyricum industrial strains collected from various parts of China. We performed a pan-genome comparative analysis of the 14 assembled strains and 139 strains downloaded from NCBI. We found that the genes related to critical industrial production pathways were primarily present in the core and soft-core gene categories. The phylogenetic analysis revealed that strains from the same clade of the phylogenetic tree possessed similar antibiotic resistance and virulence factors, with most of these genes present in the shell and cloud gene categories. Finally, we predicted the genes producing bacteriocins and botulinum toxins as well as CRISPR systems responsible for host defense. In conclusion, our research provides a desirable pan-genome database for the industrial production, food application, and genetic research of C. butyricum.
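
The core, soft-core, shell, and cloud categories mentioned above are conventionally defined by the fraction of strains that carry a gene. The abstract does not state which cut-offs were used, so the sketch below assumes the thresholds commonly used by Roary (core: at least 99% of strains; soft-core: 95% to under 99%; shell: 15% to under 95%; cloud: under 15%). The function name and input layout are hypothetical.

```python
from typing import Dict, Set


def classify_genes(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Assign each gene to a pan-genome category by the fraction of strains carrying it.

    presence  : gene name -> set of strain names in which the gene was found
    n_strains : total number of strains in the comparison
    Thresholds are assumed (Roary-style), not taken from the study.
    """
    categories = {}
    for gene, strains in presence.items():
        frac = len(strains) / n_strains
        if frac >= 0.99:
            categories[gene] = "core"
        elif frac >= 0.95:
            categories[gene] = "soft-core"
        elif frac >= 0.15:
            categories[gene] = "shell"
        else:
            categories[gene] = "cloud"
    return categories
```

For the 153 strains compared in this study (14 newly assembled plus 139 from NCBI), a gene would under these assumed cut-offs need to be present in at least 152 strains to count as core.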

  • Article type: Journal Article
    A pangenome captures the genomic diversity of a species, derived from a collection of genetic sequences from diverse populations. Advances in sequencing technologies have given rise to three primary methods for pangenome construction and analysis: de novo assembly and comparison, reference genome-based iterative assembly, and graph-based pangenome construction. Each method presents advantages and challenges in processing varying amounts and structures of DNA sequencing data. With the emergence of high-quality genome assemblies and advanced bioinformatic tools, the graph-based pangenome is emerging as an advanced reference for exploring the biological and functional implications of genetic variations.

  • Article type: Journal Article
    The cultivated diploid Brassica oleracea is an important vegetable crop, but the genetic basis of its domestication remains largely unclear in the absence of high-quality reference genomes of wild B. oleracea. Here, we report the first chromosome-level assembly of the wild Brassica oleracea L. W03 genome (total genome size, 630.7 Mb; scaffold N50, 64.6 Mb). Using the newly assembled W03 genome, we constructed a gene-based B. oleracea pangenome and identified 29 744 core genes, 23 306 dispensable genes, and 1896 private genes. We re-sequenced 53 accessions, representing six potential wild B. oleracea progenitor species. The results of the population genomic analysis showed that the wild B. oleracea populations had the highest level of diversity and represent the population most closely related to modern-day horticultural B. oleracea. In addition, the WUSCHEL gene was found to play a decisive role in domestication and to be involved in cauliflower and broccoli curd formation. We also illustrate the loss of disease-resistance genes during selection for domestication. Our results provide new insights into the domestication of B. oleracea and will facilitate the future genetic improvement of Brassica crops.