Keywords: deletion, evolution, indels, insertion, selection

MeSH: INDEL Mutation; Humans; Animals; Protein Structure, Secondary; Mice; Rats; Evolution, Molecular; Proteins / genetics, chemistry; Dogs; Selection, Genetic; Genome

Source: DOI:10.1093/gbe/evae093

Abstract:
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
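The "54% as much as expected" figure is an observed-to-expected ratio. The abstract does not say how the expectation was computed, so the sketch below is only an illustration under a simple assumption: if indels fell uniformly at random along a protein, the expected number overlapping secondary structure would be the indel count times the fraction of sites that are structured. The function name `overlap_ratio` and its arguments are hypothetical and are not taken from the study.

```python
def overlap_ratio(indel_positions, structured_sites, protein_length):
    """Observed / expected number of indels landing on structured sites.

    Assumption (not from the paper): under the null hypothesis, indels
    are placed uniformly at random over the protein's sites.
    """
    structured = set(structured_sites)
    # Observed: indels whose position falls on a structured site.
    observed = sum(1 for pos in indel_positions if pos in structured)
    # Expected under uniform placement: count x structured fraction.
    expected = len(indel_positions) * len(structured) / protein_length
    if expected == 0:
        raise ValueError("expected overlap is zero; ratio is undefined")
    return observed / expected


# Toy data: sites 0-49 of a 100-site protein are structured,
# and one of four indels falls inside that region.
ratio = overlap_ratio([2, 50, 60, 70], range(0, 50), 100)
```

A ratio below 1 means indels overlap structured sites less often than uniform placement would predict, which is the direction of the skew the abstract reports.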