Keywords: assembly; Brucella; maximum likelihood; phylogenetics; Rickettsiales

MeSH: Phylogeny; Genome, Bacterial; Genomics / methods; Software; Bacteria / genetics, classification; Computational Biology / methods; Evolution, Molecular

Source: DOI:10.1093/g3journal/jkae119

Abstract:
There are a staggering number of publicly available bacterial genome sequences (at writing, 2.0 million assemblies in NCBI's GenBank alone), and the deposition rate continues to increase. This wealth of data begs for phylogenetic analyses to place these sequences within an evolutionary context. A phylogenetic placement not only aids in taxonomic classification but informs the evolution of novel phenotypes, targets of selection, and horizontal gene transfer. Building trees from multi-gene codon alignments is a laborious task that requires bioinformatic expertise, rigorous curation of orthologs, and heavy computation. Compounding the problem is the lack of tools that can streamline these processes for building trees from large-scale genomic data. Here we present OrthoPhyl, which takes bacterial genome assemblies and reconstructs trees from whole genome codon alignments. The analysis pipeline can analyze an arbitrarily large number of input genomes (>1200 tested here) by identifying a diversity-spanning subset of assemblies and using these genomes to build gene models to infer orthologs in the full dataset. To illustrate the versatility of OrthoPhyl, we show three use cases: E. coli/Shigella, Brucella/Ochrobactrum and the order Rickettsiales. We compare trees generated with OrthoPhyl to trees generated with kSNP3 and GToTree along with published trees using alternative methods. We show that OrthoPhyl trees are consistent with other methods while incorporating more data, allowing for greater numbers of input genomes, and more flexibility of analysis.
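The idea of a "diversity-spanning subset" can be illustrated with a minimal sketch. The abstract does not say how OrthoPhyl selects its subset, so the code below is not the tool's method: it swaps in a generic greedy farthest-first selection over a precomputed pairwise distance table. The function names, the `Distances` type, and the toy values are all hypothetical, and the distances are assumed to come from some external genome-distance estimate.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input: symmetric pairwise genome distances keyed by unordered pair.
Distances = Dict[FrozenSet[str], float]


def pair_distance(dist: Distances, a: str, b: str) -> float:
    """Look up the distance between two genomes (0.0 for a genome and itself)."""
    return 0.0 if a == b else dist[frozenset((a, b))]


def pick_diverse_subset(names: List[str], dist: Distances, k: int) -> List[str]:
    """Greedy farthest-first selection (illustrative only, not OrthoPhyl's algorithm).

    Starts from the first genome, then repeatedly adds the genome whose
    distance to its nearest already-chosen genome is largest.
    """
    if k <= 0 or not names:
        return []
    chosen = [names[0]]
    # Distance from every genome to its closest chosen representative.
    nearest = {n: pair_distance(dist, names[0], n) for n in names}
    while len(chosen) < min(k, len(names)):
        candidates = [n for n in names if n not in chosen]
        pick = max(candidates, key=lambda n: nearest[n])
        chosen.append(pick)
        for n in names:
            nearest[n] = min(nearest[n], pair_distance(dist, pick, n))
    return chosen


# Toy data (made up): A and B are near-identical, as are C and D.
toy: Distances = {
    frozenset(pair): d
    for pair, d in [
        (("A", "B"), 0.01), (("A", "C"), 0.50), (("A", "D"), 0.52),
        (("B", "C"), 0.50), (("B", "D"), 0.51), (("C", "D"), 0.02),
    ]
}
subset = pick_diverse_subset(["A", "B", "C", "D"], toy, 2)
print(subset)
```

With the toy distances, the two selected genomes come from the two different clusters, which is the property a representative subset needs before gene models are built from it and applied to the full dataset.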