sequence contamination

  • Article type: Journal Article
    Comparative genomics is the comparison of genetic information within and across organisms to understand the evolution, structure, and function of genes, proteins, and non-coding regions (Sivashankari and Shanmughavel, Bioinformation 1:376-8, 2007). Advances in sequencing technology and assembly algorithms have resulted in the ability to sequence large genomes and provided a wealth of data that are being used in comparative genomic analyses. Comparative analysis can be leveraged to systematically explore and evaluate the biological relationships and evolution between species, aid in understanding the structure and function of genes, and gain a better understanding of disease and potential drug targets. As our knowledge of genetics expands, comparative genomics can help identify emerging model organisms among a broader span of the tree of life, positively impacting human health. This impact includes, but is not limited to, zoonotic disease research, therapeutics development, microbiome research, xenotransplantation, oncology, and toxicology. Despite advancements in comparative genomics, new challenges have arisen around the quantity, quality assurance, annotation, and interoperability of genomic data and metadata. New tools and approaches are required to meet these challenges and fulfill the needs of researchers. This paper focuses on how the National Institutes of Health (NIH) Comparative Genomics Resource (CGR) can address both the opportunities for comparative genomics to further impact human health and confront an increasingly complex set of challenges facing researchers.

  • Article type: Journal Article
    Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted 'raw' shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.
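The read-filtering step suggested at the end of this abstract can be illustrated with a toy example. The sketch below is not the authors' method: it is a minimal k-mer comparison, assuming hypothetical in-memory reference sequences, in which a read is set aside only when it shares more k-mers with a bacterial reference than with a plastid reference. All function names, the value of `k`, and the `min_fraction` threshold are illustrative assumptions, and reverse complements and sequencing errors are ignored for brevity.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers found in a set of reference sequences."""
    index: Set[str] = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def matched_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that occur in the index."""
    total = 0
    hits = 0
    for km in kmers(read, k):
        total += 1
        if km in index:
            hits += 1
    return hits / total if total else 0.0


def split_reads(
    reads: Iterable[str],
    bacterial_index: Set[str],
    plastid_index: Set[str],
    k: int,
    min_fraction: float = 0.5,  # hypothetical threshold, not from the source
) -> Tuple[List[str], List[str]]:
    """Return (kept, removed); a read is removed only if it looks more
    bacterial than plastid AND passes the minimum bacterial fraction."""
    kept: List[str] = []
    removed: List[str] = []
    for read in reads:
        b = matched_fraction(read, bacterial_index, k)
        p = matched_fraction(read, plastid_index, k)
        if b >= min_fraction and b > p:
            removed.append(read)
        else:
            kept.append(read)
    return kept, removed


if __name__ == "__main__":
    k = 5
    # Made-up toy sequences, for illustration only.
    bacterial = build_kmer_index(["ATGCGTACGTTAGCCGATAG"], k)
    plastid = build_kmer_index(["TTGACCGGTAACCTTGGAAC"], k)
    kept, removed = split_reads(
        ["CGTACGTTAGCC", "GACCGGTAACCTT"], bacterial, plastid, k
    )
    print("kept:", kept)
    print("removed:", removed)
```

Comparing against both references, rather than the bacterial one alone, matters here because the abstract attributes the errors to the high similarity between plastid and bacterial rRNAs; a filter based only on bacterial matches could discard genuine plastid rRNA reads.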

  • Article type: Journal Article
    Recent high-throughput sequencing endeavors have yielded multigene/protein phylogenies that confidently resolve several inter- and intra-class relationships within the phylum Ciliophora. We leverage the massive sequencing efforts from the Marine Microbial Eukaryote Transcriptome Sequencing Project, other SRA submissions, and available genome data with our own sequencing efforts to determine the phylogenetic position of Mesodinium and to generate the most taxonomically rich phylogenomic ciliate tree to date. Regardless of the data mining strategy, the multiprotein data set, or the molecular models of evolution employed, we consistently recovered the same well-supported relationships among ciliate classes, confirming many of the higher-level relationships previously identified. Mesodinium always formed a monophyletic group with members of the Litostomatea, with mixotrophic species of Mesodinium (M. rubrum, M. major, and M. chamaeleon) being more closely related to each other than to the heterotrophic member, M. pulex. The well-supported position of Mesodinium as sister to other litostomes contrasts with previous molecular analyses including those from phylogenomic studies that exploited the same transcriptomic databases. These topological discrepancies illustrate the need for caution when mining mixed-species transcriptomes and indicate that identifying ciliate sequences among prey contamination, particularly for Mesodinium species where expression from stolen prey nuclei appears to dominate, requires thorough and iterative vetting with phylogenies that incorporate sequences from a large outgroup of prey.