Keywords: GWAS; PGS; autoimmune diseases; biobank studies; cross-biobank analysis; ensemble learning; genetic risk; genetic variability; genome-wide association studies; method evaluation; phenotype prediction; polygenic scores

Source: DOI: 10.1016/j.ajhg.2024.06.003

Abstract:
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
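
The ensemble idea described in the abstract (an elastic net that combines PGSs from several methods, tuned in one cohort and evaluated in another) can be illustrated with a small sketch. This is not the prspipe implementation and uses no real data: the simulated liability, the three noisy "method" scores, the penalty values `alpha` and `lam`, and all function names are assumptions made purely for illustration. Scores are standardized within each simulated cohort here, which is a simplification of the reference-standardized framework the abstract mentions.

```python
import random


def standardize(values):
    """Scale a list to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def soft_threshold(z, g):
    """Shrink z toward zero by g (the L1 part of the elastic net update)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(columns, y, alpha=0.5, lam=0.01, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - Xw||^2 + lam*(alpha*||w||_1 + (1-alpha)/2*||w||_2^2).
    Assumes every column and y are already standardized."""
    n, p = len(y), len(columns)
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starts at y because w = 0
    for _ in range(n_iter):
        for j in range(p):
            xj = columns[j]
            # Covariance of column j with the residual that excludes column j
            rho = sum(xj[i] * (resid[i] + w[j] * xj[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                w[j] = new_w
    return w


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    a, b = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(a, b)) / len(a)


def simulate(n, rng):
    """Hypothetical cohort: one latent liability, three noisy 'method' PGSs."""
    liability = [rng.gauss(0, 1) for _ in range(n)]
    phenotype = [0.5 * g + rng.gauss(0, 1) for g in liability]
    noise_sd = [0.5, 0.8, 1.2]  # assumed per-method noise levels
    scores = [[g + rng.gauss(0, s) for g in liability] for s in noise_sd]
    return scores, phenotype


rng = random.Random(0)
train_scores, train_y = simulate(400, rng)  # stands in for the tuning biobank
test_scores, test_y = simulate(400, rng)    # stands in for a second biobank

# Fit ensemble weights in the tuning cohort only
weights = elastic_net([standardize(c) for c in train_scores], standardize(train_y))

# Apply the fixed weights to the standardized scores of the second cohort
test_cols = [standardize(c) for c in test_scores]
ensemble = [sum(w * col[i] for w, col in zip(weights, test_cols))
            for i in range(len(test_y))]

print("ensemble weights:", weights)
print("ensemble r:", correlation(ensemble, test_y))
for k, col in enumerate(test_cols):
    print("method", k, "r:", correlation(col, test_y))
```

The weights are fitted once in the tuning cohort and then applied unchanged to the second cohort, mirroring the transfer setting described above. A production analysis would more likely use an established elastic net implementation and choose the penalties by cross-validation rather than fixing them by hand.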