Keywords: Challenging medically relevant genes; Copy number variation; Genome sequencing; Long read sequencing; Short insertion and deletion; Single nucleotide polymorphism

MeSH terms: Humans; DNA Copy Number Variations / genetics; High-Throughput Nucleotide Sequencing / methods; Genome, Human / genetics; Polymorphism, Single Nucleotide / genetics; Genetic Variation / genetics; Genetic Predisposition to Disease; Genetics, Population / methods; INDEL Mutation

Source: DOI:10.1007/s00438-024-02158-x

Abstract:
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.
RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution.
CONCLUSIONS: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.