BACKGROUND: Genetic variants that severely alter protein products (e.g. nonsense, frameshift) are often associated with disease. For some genes, these predicted loss-of-function variants (pLoFs) are observed throughout the gene, whilst in others, they occur only at specific locations. We hypothesised that, for genes linked with monogenic diseases that display incomplete penetrance, pLoF variants present in apparently unaffected individuals may be limited to regions where pLoFs are tolerated. To test this, we investigated whether pLoF location could explain instances of incomplete penetrance of variants expected to be pathogenic for Mendelian conditions.
METHODS: We used exome sequence data from 454,773 individuals in the UK Biobank (UKB) to investigate the locations of pLoFs in a population cohort. We counted the unique pLoF, missense, and synonymous variants in UKB in each quintile of the coding sequence (CDS) of every protein-coding gene and clustered the variants using Gaussian mixture models. We limited the analyses to genes with ≥ 5 variants of each type (16,473 genes). We compared the locations of pLoFs in UKB with those of all theoretically possible pLoFs in a transcript and of pathogenic pLoFs from ClinVar, and performed simulations to estimate the false-positive rate of non-uniformly distributed variants.
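As an illustration of the per-quintile counting step only, the following is a minimal sketch and not the authors' pipeline. It assumes variants are supplied as 1-based CDS coordinates; the function names `quintile_counts`, `proportions`, and `passes_filter` are hypothetical. The resulting proportion vectors are the kind of input a Gaussian mixture model could then cluster.

```python
def quintile_counts(positions, cds_length, n_bins=5):
    """Count variants in each equal-width bin of a CDS.

    positions: 1-based CDS coordinates of unique variants (assumed input format).
    cds_length: CDS length in the same units as the positions.
    """
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= cds_length:
            raise ValueError(f"position {pos} lies outside the CDS")
        # Map the 1-based position onto a 0-based bin index.
        idx = min((pos - 1) * n_bins // cds_length, n_bins - 1)
        counts[idx] += 1
    return counts


def proportions(counts):
    """Convert bin counts to the fraction of variants per bin."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def passes_filter(counts_by_type, minimum=5):
    """Keep a gene only if every variant type has at least `minimum` variants."""
    return all(sum(counts) >= minimum for counts in counts_by_type.values())


if __name__ == "__main__":
    # Toy example: five pLoF positions in a 100-base CDS.
    plof = quintile_counts([1, 20, 21, 50, 100], cds_length=100)
    print(plof, proportions(plof))
```

A gene whose pLoFs cluster in one part of the CDS would give a proportion vector far from the uniform `[0.2, 0.2, 0.2, 0.2, 0.2]`, which is the contrast the clustering is meant to capture.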
RESULTS: For most genes, all variant classes fell into clusters representing broadly uniform variant distributions, but genes in which haploinsufficiency causes developmental disorders were less likely to have uniform pLoF distribution than other genes (P < 2.2 × 10⁻⁶). We identified a number of genes, including ARID1B and GATA6, where pLoF variants in the first quarter of the CDS were rescued by the presence of an alternative translation start site and should not be reported as pathogenic. For other genes, such as ODC1, pLoFs were located approximately uniformly across the gene, but pathogenic pLoFs were clustered only at the end, consistent with a gain-of-function disease mechanism.
CONCLUSIONS: Our results suggest that localised constraint metrics could be beneficial and that the location of pLoF variants should be considered when interpreting variants.