Keywords: Insertion polymorphism; Library creation; Pangenome; Transposable element; Transposable element identification

Source: DOI:10.1186/s13100-024-00323-y

Abstract:
BACKGROUND: Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands of bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high-quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility.
RESULTS: We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies.
CONCLUSIONS: Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.
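
The core idea in RESULTS, that a segment which is structurally polymorphic across assemblies and recurs at several sites is a candidate for a recently active TE family, can be illustrated with a toy sketch. This is not pantera's algorithm: the input format (a plain list of polymorphic segment sequences, assumed to have been extracted from a pangenome beforehand), the function names `similar` and `candidate_families`, the greedy grouping, and the 0.9 identity threshold are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, min_identity: float = 0.9) -> bool:
    """Crude similarity test: comparable length and a high difflib ratio."""
    if not a or not b:
        return False
    # Cheap length filter before the more expensive comparison.
    if min(len(a), len(b)) / max(len(a), len(b)) < min_identity:
        return False
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= min_identity


def candidate_families(segments: list[str], min_copies: int = 2) -> list[list[str]]:
    """Greedily group polymorphic segments; keep groups seen at several sites."""
    clusters: list[list[str]] = []
    for seg in segments:
        for cluster in clusters:
            # Compare against the first member, used here as the representative.
            if similar(seg, cluster[0]):
                cluster.append(seg)
                break
        else:
            clusters.append([seg])
    return [c for c in clusters if len(c) >= min_copies]


# Hypothetical polymorphic segments (made-up sequences, not real data).
te_copy = "TGTAGGCATCCGATTACGGACTTAGCAAGTCCGTACCTACA"
te_variant = te_copy[:20] + "G" + te_copy[21:]  # same element, one substitution
unrelated = "ACGTACGGTTAACC"                    # a one-off structural variant

families = candidate_families([te_copy, unrelated, te_variant])
```

Here the two near-identical segments form one candidate family, while the one-off segment is discarded because it occurs only once. A real tool would need proper alignment, handling of reverse complements, and consensus building, none of which this sketch attempts.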