Keywords: Comparative genomics, Gene annotation, Gene repertoires, Gene structure, Protein-coding genes, Standardization

MeSH: Genomics, Molecular Sequence Annotation / methods, Software

Source: DOI:10.1186/s12864-017-3870-8

Abstract:
The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy-to-use standard tool.
We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on GitHub ( https://github.com/ZFMK/COGNATE ).
The tool COGNATE allows comparing genome assemblies and structural elements on multiple levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide the important standardization of both genome and gene structure parameter disclosure as well as data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.
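To make the idea of annotation-derived parameters and summary statistics concrete, the following is a minimal Python sketch, not COGNATE's implementation (COGNATE itself is written in Perl). It assumes a GFF3 annotation in which CDS features point to their transcript through a `Parent` attribute, and it assumes, purely for illustration, that a transcript's coding length is the sum of its CDS feature lengths. The function names and the example records are hypothetical, and the definitions COGNATE actually applies may differ.

```python
from collections import defaultdict
from statistics import median


def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def cds_lengths_per_transcript(gff3_text):
    """Sum CDS feature lengths per parent transcript.

    GFF3 coordinates are 1-based and inclusive, so a feature spanning
    start..end covers end - start + 1 bases.
    """
    lengths = defaultdict(int)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only well-formed CDS records are counted
        start, end = int(cols[3]), int(cols[4])
        parent = parse_attributes(cols[8]).get("Parent")
        if parent is None:
            continue
        for transcript_id in parent.split(","):
            lengths[transcript_id] += end - start + 1
    return dict(lengths)


def summarize(lengths):
    """Compute a few illustrative summary statistics over transcripts."""
    values = sorted(lengths.values())
    return {
        "transcripts": len(values),
        "total_cds_bp": sum(values),
        "median_cds_bp": median(values) if values else 0,
    }


# Hypothetical two-gene annotation used only to exercise the sketch.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "scaffold1\tsketch\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "scaffold1\tsketch\tCDS\t100\t199\t.\t+\t0\tID=cds1;Parent=mRNA1",
    "scaffold1\tsketch\tCDS\t300\t449\t.\t+\t2\tID=cds2;Parent=mRNA1",
    "scaffold2\tsketch\tgene\t50\t349\t.\t-\t.\tID=gene2",
    "scaffold2\tsketch\tmRNA\t50\t349\t.\t-\t.\tID=mRNA2;Parent=gene2",
    "scaffold2\tsketch\tCDS\t50\t349\t.\t-\t0\tID=cds3;Parent=mRNA2",
])

lengths = cds_lengths_per_transcript(EXAMPLE_GFF3)
summary = summarize(lengths)
```

The point of fixing the definition in code is the one the abstract makes: two annotations are only comparable if "coding sequence length" is computed the same way for both, which is why the counting rule is stated explicitly rather than left implicit.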