Keywords: alignment; genome; read-depth; retrocopy; segmental duplication

MeSH: Dogs / genetics; Animals; Gene Duplication; Genome; Genomics; Evolution, Molecular; Retroelements

Source: DOI:10.1093/gbe/evae142

Abstract:
Recent years have seen a dramatic increase in the number of canine genome assemblies available. Duplications are an important source of evolutionary novelty and are also prone to misassembly. We explored the duplication content of nine canine genome assemblies using both genome self-alignment and read-depth approaches. We find that 8.58% of the genome is duplicated in the canFam4 assembly, derived from the German Shepherd Dog Mischka, including 90.15% of unplaced contigs. Highlighting the continued difficulty in properly assembling duplications, less than half of read-depth and assembly alignment duplications overlap, but the mCanLor1.2 Greenland wolf assembly shows greater concordance. Further study shows the presence of multiple segments that have alignments to four or more duplicate copies. These high-recurrence duplications correspond to gene retrocopies. We identified 3,892 candidate retrocopies from 1,316 parental genes in the canFam4 assembly and find that ∼8.82% of duplicated base pairs involve a retrocopy, confirming this mechanism as a major driver of gene duplication in canines. Similar patterns are found across eight other recent canine genome assemblies, with metrics supporting a greater quality of the PacBio HiFi mCanLor1.2 assembly. Comparison between the wolf and other canine assemblies found that 92% of retrocopy insertions are shared between assemblies. By calculating the number of generations since genome divergence, we estimate that new retrocopy insertions appear, on average, in 1 out of 3,514 births. Our analyses illustrate the impact of retrogene formation on canine genomes and highlight the variable representation of duplicated sequences among recently completed canine assemblies.