Keywords: Data removal; Down-sample; Genome-wide association study; Genomic SEM; Genomics; Leave-one-out; Meta-analysis; Summary statistics

MeSH: Genome-Wide Association Study / methods; Polymorphism, Single Nucleotide / genetics; Phenotype; Genomics / methods; Multifactorial Inheritance

Source: DOI:10.1007/s10519-023-10152-z

Abstract:
Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers' use of the summary statistics.