BACKGROUND: Phenome-wide association studies (PheWASs) have been conducted in Asian populations, including Koreans, but many were based on chip or exome genotyping data. Such data limit whole-genome association analysis. Genome-to-phenome association information built from the largest possible set of whole genomes with matched phenome data is therefore crucial for further population genomic studies and for developing health care services based on population genomics.
RESULTS: Here, we present 4,157 whole-genome sequences (Korea4K) coupled with 107 health check-up parameters as the largest genomic resource of the Korean Genome Project. It encompasses most variants with allele frequency >0.001 in Koreans, indicating that it sufficiently covers most of the common and rare genetic variants with commonly measured phenotypes for Koreans. Korea4K provides 45,537,252 variants, half of which were not present in Korea1K (1,094 samples). We also identified 1,356 new genotype-phenotype associations that were not found with the Korea1K dataset. Phenomics analyses further revealed 24 significant genetic correlations, 14 pleiotropic associations, and 127 causal relationships based on Mendelian randomization among 37 traits. In addition, the Korea4K imputation reference panel, the largest Korean variant reference to date, showed superior imputation performance to Korea1K across all allele frequency categories.
CONCLUSIONS: Collectively, Korea4K provides not only the largest Korean genome dataset but also corresponding health check-up parameters and novel genome-phenome associations. The large-scale pathological whole-genome omics data will become a powerful resource for genome-phenome association studies to discover causal markers for the prediction and diagnosis of health conditions in future studies.
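
The statement that 4,157 genomes capture most variants with allele frequency >0.001 can be sanity-checked with a simple sampling argument. The sketch below is illustrative and is not the authors' analysis. It assumes unrelated diploid individuals, independent sampling of chromosomes, and perfect variant calling, and the function name `detection_probability` is hypothetical. Under those assumptions, the chance of seeing an allele of frequency f at least once among n individuals is 1 - (1 - f)^(2n).

```python
def detection_probability(allele_frequency: float, n_samples: int, ploidy: int = 2) -> float:
    """Probability that an allele is observed at least once in a cohort.

    Assumes independently sampled chromosomes (unrelated individuals)
    and error-free variant calling; both are simplifying assumptions.
    """
    n_chromosomes = ploidy * n_samples
    # Complement of "the allele is absent from every sampled chromosome".
    return 1.0 - (1.0 - allele_frequency) ** n_chromosomes


if __name__ == "__main__":
    # Sample sizes are taken from the abstract; 0.001 is the stated frequency threshold.
    for label, n in (("Korea1K", 1094), ("Korea4K", 4157)):
        p = detection_probability(0.001, n)
        print(f"{label} (n={n}): P(observe allele with AF=0.001) = {p:.4f}")
```

Under these assumptions, a variant at frequency 0.001 is very likely to be observed at the Korea4K sample size and noticeably less likely at the Korea1K sample size, which is consistent with the larger panel contributing many variants absent from the smaller one.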