Keywords: Association studies; Clustering; Population structure

MeSH: Humans; Genetic Markers; Polymorphism, Single Nucleotide; Linkage Disequilibrium; Phenotype; Cluster Analysis

Source: DOI:10.1186/s12859-023-05511-w

Abstract:
BACKGROUND: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification that is unrelated to disease risk. Existing methods for population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
RESULTS: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
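The clustering step described above can be illustrated with a minimal sketch. This is not the CluStrat implementation: the function name `cluster_samples`, the ridge term added to keep the marker covariance invertible, the average-linkage choice, and the toy genotype matrix are all assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_samples(genotypes, n_clusters, ridge=1e-3):
    """Agglomerative clustering of samples on Mahalanobis distances.

    genotypes: (n_samples, n_markers) array of allele counts (0, 1, 2).
    ridge: hypothetical regularizer; the marker covariance is often
           singular when markers are in strong LD or outnumber samples.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center each marker
    cov = np.cov(X, rowvar=False)               # marker-by-marker covariance
    cov = cov + ridge * np.eye(cov.shape[0])    # assumption: ridge for invertibility
    inv_cov = np.linalg.inv(cov)
    # Condensed vector of pairwise Mahalanobis distances between samples.
    dists = pdist(X, metric="mahalanobis", VI=inv_cov)
    # Average linkage is an illustrative choice, not taken from the paper.
    tree = linkage(dists, method="average")
    # Cut the dendrogram into at most n_clusters groups (labels start at 1).
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy usage: 12 samples, 5 markers, simulated allele counts.
rng = np.random.default_rng(0)
toy_genotypes = rng.binomial(2, 0.3, size=(12, 5))
toy_labels = cluster_samples(toy_genotypes, n_clusters=3)
```

In a stratification-aware analysis, cluster labels like these would then be used to test associations within or across the groups; how CluStrat does that is not specified in this abstract.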
CONCLUSIONS: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
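For reference, the two distance metrics compared above have the following standard definitions. Here $x$ and $y$ are the marker vectors of two individuals and $\Sigma$ is the marker covariance matrix; these symbols are introduced for illustration and are not taken from the abstract.

$$
d_{\mathrm{M}}(x, y) = \sqrt{(x - y)^{\top} \Sigma^{-1} (x - y)}, \qquad
d_{\mathrm{E}}(x, y) = \sqrt{(x - y)^{\top} (x - y)}
$$

The two coincide when $\Sigma = I$, that is, when markers are uncorrelated with unit variance. With correlated markers, $\Sigma^{-1}$ down-weights differences along directions where markers vary together.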