Keywords: Genome-wide association study; Gestational age; Imputation; Preterm birth; Single-nucleotide polymorphisms; Ultra-low-coverage whole genome sequencing

MeSH: Infant, Newborn; Female; Humans; Genome-Wide Association Study; Gestational Age; Premature Birth; Polymorphism, Single Nucleotide; Genetic Testing; Genotype; Quantitative Trait Loci

Source: DOI:10.1186/s13073-023-01158-7

Abstract:
Background: Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach for discovering genomic variants in human populations for genome-wide association studies (GWAS). To support genetic screening by preimplantation genetic testing (PGT) in large populations, sequencing coverage drops below 0.1×, to an ultra-low level. However, the feasibility and effectiveness of ultra-low-coverage WGS (ulcWGS) for GWAS remain undetermined.
Methods: We built a pipeline for analyzing ulcWGS data for GWAS. To examine its effectiveness, we benchmarked genotype imputation accuracy across combinations of coverages below 0.1× and sample sizes from 2000 to 16,000, using 17,844 embryo PGT samples with approximately 0.04× average coverage and the standard Chinese sample HG005, whose genotypes are known. We then applied the imputed genotypes of 1744 transferred embryos with gestational ages and complete follow-up records to GWAS.
Results: The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From the 1744 born embryos, we identified 11 genomic risk loci associated with gestational age and 166 genes mapped to these loci by positional, expression quantitative trait locus, and chromatin interaction strategies. Among the mapped genes, CRHBP, ICAM1, and OXTR have been more frequently reported as related to preterm birth. Through joint analysis of gene expression data from previous studies, we constructed the interrelationships of, mainly, CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1, and EGR2 with preterm birth, infant disease, and breast cancer.
Conclusions: This study not only demonstrates that ulcWGS can achieve relatively high genotype imputation accuracy that is adequate for GWAS, but also provides insights into the associations between gestational age and genetic variations of the fetal embryos from the Chinese population.
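The abstract does not say which accuracy metric the benchmark used, so the following is only an illustrative sketch of how imputed genotypes might be scored against a truth sample such as HG005. It assumes imputed dosages on a 0 to 2 scale, true genotypes coded 0/1/2, and a simple Poisson model of read depth. All function names are hypothetical and do not come from the authors' pipeline.

```python
import math


def covered_fraction(mean_coverage: float) -> float:
    """Expected fraction of sites hit by at least one read.

    Assumption: per-site read depth is Poisson with mean `mean_coverage`.
    """
    return 1.0 - math.exp(-mean_coverage)


def dosage_r2(truth, dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    if n != len(dosage) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 sites")
    mean_t = sum(truth) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov * cov / (var_t * var_d)


def concordance(truth, dosage):
    """Fraction of sites whose rounded dosage equals the true genotype."""
    if len(truth) != len(dosage) or not truth:
        raise ValueError("need two equal-length, non-empty sequences")
    calls = [min(2, max(0, round(d))) for d in dosage]
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


# Made-up toy data for illustration only; these are not values from the study.
truth_genotypes = [0, 1, 2, 1, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 1.4]

print(covered_fraction(0.04))
print(dosage_r2(truth_genotypes, imputed_dosages))
print(concordance(truth_genotypes, imputed_dosages))
```

Under the Poisson assumption, a sample sequenced at about 0.04× has reads at only around 4% of sites, which is why genotypes have to be imputed across many samples rather than called per sample. Repeating the scoring at each coverage and sample size would give a benchmark grid of the kind the abstract describes.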