Keywords: heterochromatin; human evolution; population structure; repetitive DNA

MeSH: Humans; Centromere / genetics; Genome, Human; DNA, Satellite / genetics; Evolution, Molecular; Genetic Variation

Source: DOI:10.1093/gbe/evae153

Abstract:
Although repetitive DNA forms much of the human genome, its study is challenging due to limitations in the assembly and alignment of repetitive short reads. We have deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1,000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. We find broad homogeneity of the most abundant repeats across populations, except for AG-rich repeats, which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed and form higher-order structures that covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, the abundances of the ancestral monomers of Human Satellite 2 and Human Satellite 3 correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans.
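To make the idea of detecting tandem repeats embedded in a single read concrete, the following is a minimal Python sketch. It is not k-Seek's actual algorithm or interface: the function names, the `min_copies` threshold, and the rule for choosing a canonical representative (smallest rotation of the unit or of its reverse complement) are all assumptions made for illustration. The sketch scans a read for stretches that match themselves when shifted by k bases, for unit lengths k below 20 bp.

```python
# Illustrative only: a naive periodicity scan, NOT the k-Seek implementation.
# All names and thresholds here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def canonical_unit(unit: str) -> str:
    """Smallest rotation of the unit or of its reverse complement (assumed convention)."""
    revcomp = unit.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (unit, revcomp) for i in range(len(s))]
    return min(rotations)


def find_tandem_repeats(read: str, max_unit: int = 19, min_copies: int = 3):
    """Return (start, unit, copies) for each tandem run found in the read."""
    hits = []
    n = len(read)
    for k in range(1, max_unit + 1):
        i = 0
        while i + k < n:
            # Extend while the read matches itself shifted by k bases.
            j = i
            while j + k < n and read[j] == read[j + k]:
                j += 1
            span = j - i + k          # length of the periodic stretch
            copies = span // k        # whole copies of the unit in that stretch
            unit = read[i:i + k]
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, unit, copies))
            i = j + 1                 # position j is a mismatch; resume after it
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("ACGTAGAGAGAGTT"))  # [(4, 'AG', 4)]
    print(canonical_unit("CATTC"))                # AATGG
```

Per-read hits like these could then be summed by canonical unit across all reads of a genome to give an abundance estimate for each simple satellite; normalization for sequencing depth and GC bias is outside the scope of this sketch.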