Transposable elements

  • Article type: Journal Article
    BACKGROUND: Despite the many cheap and fast ways to generate genomic data, accurate genome assembly remains a problem; repeats in particular are vastly underrepresented and often misassembled. As short reads in low coverage are already sufficient to represent the repeat landscape of any given genome, many read-clustering algorithms have been proposed that provide repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes?
    RESULTS: Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases, such as (1) consensus building from clustered short reads of non-model genomes, (2) consensus building from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check whether they represent real transposable elements as found in long reads. Our results demonstrate that it is possible to generate useful, reliable and trustworthy consensuses from short reads by combining read-clustering and genome assembly methods in an automatable way.
    CONCLUSIONS: We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.

  • Article type: Journal Article
    Transposable elements (TEs) have the ability to alter individual genomic landscapes and shape the course of evolution for species in which they reside. Such profound changes can be understood by studying the biology of the organism and the interplay of the TEs it hosts. Characterizing and curating TEs across a wide range of species is a fundamental first step in this endeavor. This protocol employs techniques honed while developing TE libraries for a wide range of organisms and specifically addresses: (1) the extension of truncated de novo results into full-length TE families; (2) the iterative refinement of TE multiple sequence alignments; and (3) the use of alignment visualization to assess model completeness and subfamily structure. © 2021 Wiley Periodicals LLC.
    Basic Protocol: Extension and edge polishing of consensi and seed alignments derived from de novo repeat finders.
    Support Protocol: Generating seed alignments using a library of consensi and a genome assembly.
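Both abstracts rest on the idea of deriving a consensus sequence from aligned repeat copies. As a rough illustration only, the sketch below computes a simple majority-rule consensus from a small, already aligned set of sequences. It is not the method of either paper; the function name `majority_consensus`, the `min_fraction` threshold and the toy alignment are hypothetical, and real curation workflows use far more elaborate models.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Hypothetical helper for illustration: columns where a gap is the most
    common symbol are dropped, and columns where the most common base falls
    below min_fraction are reported as 'N'.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):  # iterate over alignment columns
        counts = Counter(symbol.upper() for symbol in column)
        symbol, count = counts.most_common(1)[0]
        if symbol == gap:
            continue  # most copies lack this position, so skip it
        consensus.append(symbol if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


# Toy alignment of three repeat copies (made-up data).
copies = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(majority_consensus(copies))
```

Dropping gap-majority columns keeps rare insertions in single copies out of the consensus, which is one common design choice; a stricter `min_fraction` marks more ambiguous positions as `N`.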