Keywords: Dark genome region; Lp(a); Medically relevant gene; Nextflow; UK Biobank; VNTR; Variable number tandem repeat; Whole-exome sequencing

MeSH: Minisatellite Repeats; Humans; Cardiovascular Diseases / genetics; Genetic Variation; Sequence Analysis, DNA / methods; Lipoprotein(a) / genetics; Genetic Predisposition to Disease

Source: DOI:10.1186/s13059-024-03316-5

Abstract:
Background: Variable number tandem repeats (VNTRs) are highly polymorphic DNA regions harboring many potentially disease-causing variants. However, because of their repetitive nature, VNTRs often appear unresolved ("dark") in variation databases. One particularly complex and medically relevant VNTR is the KIV-2 VNTR in the cardiovascular disease gene LPA, which encompasses up to 70% of the coding sequence.
Results: Using the highly complex LPA gene as a model, we develop a computational approach to resolve intra-repeat variation in VNTRs from widely available short-read sequencing data. We apply the approach to six protein-coding VNTRs in 2504 samples from the 1000 Genomes Project and develop an optimized method for the LPA KIV-2 VNTR that discriminates the confounding KIV-2 subtypes upfront. This improves the F1-score by up to 2.1-fold compared to previously published strategies. Finally, we analyze the LPA VNTR in > 199,000 UK Biobank samples, detecting > 700 KIV-2 mutations. The approach reveals new strong Lp(a)-lowering effects for KIV-2 variants, with a protective effect against coronary artery disease, and also validates previous findings based on tagging SNPs.
Conclusions: Our approach paves the way for reliable variant detection in VNTRs at scale, and we show that it is transferable to other dark regions, which will help unlock medical information hidden in VNTRs.