RESULTS: We identified significant population stratification among ethnolinguistically diverse Guangxi populations, suggesting differentiated genetic origins and admixture processes. GPH shared more Zhuang-related alleles than Southern Han Chinese did but carried more northern ancestry than Zhuang. Admixture models and estimates of genetic distance showed that GPH was genetically closer to geographically neighboring TK populations than to Northern Han Chinese, supporting the admixture origin hypothesis. Admixture dating and demographic history reconstruction further supported the formation of GPH via admixture between Northern Han Chinese and southern TK people. We identified robust selection signatures associated with lipid metabolism, such as the fatty acid desaturase (FADS) genes, and medically relevant loci associated with Mendelian disorders (GJB2) and complex diseases. We also explored the shared and unique selection signatures of ethnically different but linguistically related Guangxi lineages and found some shared signals related to immunity and malaria resistance.
CONCLUSIONS: Our genetic analysis illuminated the language-related fine-scale genetic structure of Guangxi populations and provided robust genetic evidence for the admixture hypothesis, which can explain the observed pattern of genetic diversity and the formation of GPH. This work presents a comprehensive analysis of population history and of demographic and adaptive processes, providing genetic evidence for personal health management and disease risk prediction models for Guangxi people. Further large-scale whole-genome sequencing projects would provide a complete landscape of southern Chinese genomic diversity and its contributions to human health and disease traits.