disease gene discovery

  • Article type: Journal Article
    Protein-truncating variants (PTVs) near the 3' end of genes may escape nonsense-mediated decay (NMD). PTVs in the NMD-escape region (PTVescs) can cause Mendelian disease but are difficult to interpret given their varying impact on protein function. Previously, PTVesc burden was assessed in an epilepsy cohort, but no large-scale analysis has systematically evaluated these variants in rare disease. We performed a retrospective analysis of 29,031 neurodevelopmental disorder (NDD) parent-offspring trios referred for clinical exome sequencing to identify PTVesc de novo mutations (DNMs). We identified 1,376 PTVesc DNMs and 133 genes that were significantly enriched (binomial p < 0.001). The PTVesc-enriched genes included those with PTVescs previously described to cause dominant Mendelian disease (e.g., SEMA6B, PPM1D, and DAGLA). We annotated ClinVar variants for PTVescs and identified 948 genes with at least one high-confidence pathogenic variant. Twenty-two known Mendelian PTVesc-enriched genes had no prior evidence of PTVesc-associated disease. We found 22 additional PTVesc-enriched genes that are not well established to be associated with Mendelian disease, several of which showed phenotypic similarity between individuals harboring PTVesc variants in the same gene. Four individuals with PTVesc mutations in RAB1A had similar phenotypes including NDD and spasticity. PTVesc mutations in IRF2BP1 were found in two individuals who each had severe immunodeficiency manifesting in NDD. Three individuals with PTVesc mutations in LDB1 all had NDD and multiple congenital anomalies. Using a large-scale, systematic analysis of DNMs, we extend the mutation spectrum for known Mendelian disease-associated genes and identify potentially novel disease-associated genes.
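
    The per-gene enrichment test mentioned in this abstract (binomial p < 0.001) can be illustrated with a minimal sketch. It assumes a one-sided binomial test in which each trio is one trial and each gene has an expected per-trio probability of a PTVesc DNM. The gene names, rates, and counts below are hypothetical placeholders, and the abstract does not describe the authors' actual mutation-rate model or any multiple-testing handling.

```python
from math import exp, lgamma, log


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at i (requires 0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    # Summing the upper tail directly keeps precision for very small p-values.
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


def enriched_genes(observed, expected_rate, n_trios, alpha=0.001):
    """Return {gene: p-value} for genes whose DNM count exceeds expectation."""
    hits = {}
    for gene, count in observed.items():
        pval = binomial_upper_tail(count, n_trios, expected_rate[gene])
        if pval < alpha:
            hits[gene] = pval
    return hits


# Hypothetical inputs: observed DNM counts and assumed per-trio probabilities.
N_TRIOS = 29031  # cohort size stated in the abstract
observed = {"GENE_A": 4, "GENE_B": 1}
expected_rate = {"GENE_A": 1e-6, "GENE_B": 1e-6}

hits = enriched_genes(observed, expected_rate, N_TRIOS)
```

    With these placeholder rates, four DNMs in one gene fall well below the 0.001 threshold, while a single DNM does not.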

  • Article type: Journal Article
    Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype-phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene-disease associations. We found that mouse genotype-phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper.
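
    As a rough illustration of ranking model-organism genes by phenotypic similarity to a disease, the sketch below uses Jaccard similarity over ontology ancestor closures. This is a deliberately simple stand-in, not the semantic machine learning methods the abstract refers to, and the toy ontology, term names, and gene names are all hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(terms, parents):
    """Union of the ancestor sets of every term in an annotation set."""
    result = set()
    for term in terms:
        result |= ancestors(term, parents)
    return result


def jaccard_similarity(terms_a, terms_b, parents):
    """Jaccard index of two annotation sets after ancestor expansion."""
    closed_a = closure(terms_a, parents)
    closed_b = closure(terms_b, parents)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0


# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "abnormal_gait": ["nervous_system_phenotype"],
    "seizure": ["nervous_system_phenotype"],
    "nervous_system_phenotype": ["phenotype"],
    "small_heart": ["cardiovascular_phenotype"],
    "cardiovascular_phenotype": ["phenotype"],
}

disease_phenotypes = {"seizure"}
model_gene_phenotypes = {
    "gene_x": {"abnormal_gait"},  # loss-of-function phenotype in a model organism
    "gene_y": {"small_heart"},
}

# Rank candidate genes from most to least similar to the disease.
ranked = sorted(
    model_gene_phenotypes,
    key=lambda g: jaccard_similarity(model_gene_phenotypes[g], disease_phenotypes, PARENTS),
    reverse=True,
)
```

    Expanding annotations to their ancestors lets two different but related terms (here, a gait abnormality and a seizure) still score as partially similar through their shared parent.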

  • Article type: Journal Article
    The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.

  • Article type: Journal Article
    Although the rates of disease gene discovery have steadily increased with the expanding use of genome and exome sequencing by clinical and research laboratories, only ~16% of genes in the genome have confirmed disease associations. Here we describe our clinical laboratory's experience utilizing GeneMatcher, an online portal designed to promote disease gene discovery and data sharing. Since 2016, we submitted 246 candidates from 243 unique genes to GeneMatcher, of which 111 (45%) are now clinically characterized. Submissions meeting our candidate gene-reporting criteria based on a scoring system using patient and molecular-weighted evidence were significantly more likely to be characterized as of October 2021 versus genes that did not meet our clinical-reporting criteria (p = 0.025). We reported relevant findings related to these newly characterized gene-disease associations in 477 probands. In 218 (46%) instances, we issued reclassifications after an initial negative or candidate gene (uncertain) report. We coauthored 104 publications delineating gene-disease relationships, including descriptions of new associations (60%), additional supportive evidence (13%), subsequent descriptive cohorts (23%), and phenotypic expansions (4%). Clinical laboratories are pivotal for disease gene discovery efforts and can screen phenotypes based on genotype matches, contact clinicians of relevant cases, and issue proactive reclassification reports.

  • Article type: Journal Article
    Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
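
    The idea of a random walk that starts from newly identified candidates and is steered by prior knowledge can be sketched as follows. This is an illustrative approximation only and not the uKIN implementation (see the repository linked above for that). It assumes a random walk with restart in which the chance of stepping to a neighbor is proportional to a prior-knowledge score. The toy network, the scores, the `restart` value, and the small `1e-6` smoothing constant are all hypothetical.

```python
def guided_rwr(adj, prior, start, restart=0.5, iters=200):
    """Random walk with restart whose steps are biased toward high-prior nodes.

    adj     -- {node: [neighbors]}; every node is assumed to have a neighbor
    prior   -- {node: prior-knowledge score}, missing nodes count as 0
    start   -- {node: weight} for the newly identified candidate genes
    """
    nodes = sorted(adj)
    total = sum(start.get(n, 0.0) for n in nodes)
    restart_dist = {n: start.get(n, 0.0) / total for n in nodes}

    # Transition probability from u to v is proportional to the prior of v.
    # The small constant keeps zero-prior neighbors reachable.
    trans = {}
    for u in nodes:
        weights = {v: prior.get(v, 0.0) + 1e-6 for v in adj[u]}
        norm = sum(weights.values())
        trans[u] = {v: w / norm for v, w in weights.items()}

    scores = dict(restart_dist)
    for _ in range(iters):
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for u in nodes:
            for v, t in trans[u].items():
                nxt[v] += (1 - restart) * scores[u] * t
        scores = nxt
    return scores


# Hypothetical toy network: one new candidate linked to two other genes.
adj = {"new": ["a", "b"], "a": ["new"], "b": ["new"]}
prior = {"a": 1.0, "b": 0.0}  # "a" is supported by prior knowledge
scores = guided_rwr(adj, prior, start={"new": 1.0})
```

    In this toy network the two neighbors are topologically identical, so any difference in their final scores comes entirely from the prior-knowledge bias.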

  • Article type: Journal Article
    Identifying new gene functions and pathways underlying diseases and biological processes is a major challenge in genomics research. In particular, most methods for interpreting the pathways characteristic of an experimental gene list defined by genomic data are limited by their dependence on assessing the overlapping genes or their interactome topology, which cannot account for the variety of functional relations. This is particularly problematic for pathway discovery from single-cell genomics with low gene coverage or for interpreting complex pathway changes such as those that occur during a change of cell state. Here, we exploited the comprehensive sets of molecular concepts that combine ontologies, pathways, interactions and domains to help inform the functional relations. We first developed a universal concept signature (uniConSig) analysis for genome-wide quantification of new gene functions underlying biological or pathological processes based on the signature molecular concepts computed from known functional gene lists. We then further developed a novel concept signature enrichment analysis (CSEA) for deep functional assessment of the pathways enriched in an experimental gene list. This method is grounded on the framework of shared concept signatures between gene sets at multiple functional levels, thus overcoming the limitations of the current methods. Through meta-analysis of transcriptomic data sets of cancer cell line models and single hematopoietic stem cells, we demonstrate the broad applications of CSEA on pathway discovery from gene expression and single-cell transcriptomic data sets for genetic perturbations and change of cell states, which complements the current modalities. The R modules for uniConSig analysis and CSEA are available through https://github.com/wangxlab/uniConSig.

  • Article type: Journal Article
    Alzheimer's disease (AD) is a severe neurodegenerative disorder and has become a global public health problem. Despite intensive research, the pathophysiology of AD has not been elucidated. Comorbid diseases often share overlapping patterns of genetic markers, which may point to a common etiology and suggest essential protein targets. The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) collects large-scale postmarketing surveillance data that provide a unique opportunity to investigate disease co-occurrence patterns. We aim to construct a heterogeneous network that integrates a disease comorbidity network (DCN) from FAERS with protein-protein interaction (PPI) data to prioritize AD risk genes using a network-based ranking algorithm.
    We built a DCN based on indication data from FAERS using association rule mining. The DCN was further integrated with a PPI network. We used a random walk with restart ranking algorithm to prioritize AD risk genes.
    We evaluated the performance of our approach using AD risk genes curated from genetic association studies. Our approach achieved an area under the receiver operating characteristic curve of 0.770. The top 500 ranked genes achieved 5.53-fold enrichment for known AD risk genes as compared to random expectation. Pathway enrichment analysis using top-ranked genes revealed that two novel pathways, ERBB and coagulation pathways, might be involved in AD pathogenesis.
    We innovatively leveraged FAERS, a comprehensive data resource for FDA postmarket drug safety surveillance, for large-scale AD comorbidity mining. This exploratory study demonstrated the potential of disease-comorbidity mining from FAERS for AD genetics discovery.
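
    The step of building a comorbidity network by association rule mining can be sketched as below. This minimal example assumes that each report is a set of indications and that an edge is kept when a pair of indications has enough support and a lift above 1. The thresholds and the toy reports are hypothetical, and the abstract does not state which rule metrics or cutoffs the authors used.

```python
from collections import Counter
from itertools import combinations


def comorbidity_edges(reports, min_support=0.2, min_lift=1.0):
    """Return {(a, b): metrics} for indication pairs that co-occur more than chance.

    support(a, b) = fraction of reports listing both a and b
    lift(a, b)    = support(a, b) / (support(a) * support(b))
    """
    n = len(reports)
    single = Counter()
    pair = Counter()
    for report in reports:
        items = sorted(set(report))
        single.update(items)
        pair.update(combinations(items, 2))  # each unordered pair counted once

    edges = {}
    for (a, b), count in pair.items():
        support = count / n
        lift = support / ((single[a] / n) * (single[b] / n))
        if support >= min_support and lift > min_lift:
            edges[(a, b)] = {"support": support, "lift": lift}
    return edges


# Hypothetical toy reports, each listing the indications recorded together.
reports = [
    {"AD", "hypertension"},
    {"AD", "hypertension"},
    {"AD"},
    {"diabetes"},
    {"diabetes", "hypertension"},
]
edges = comorbidity_edges(reports)
```

    Filtering on lift rather than raw co-occurrence counts avoids linking two conditions merely because both are common in the reports.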

  • Article type: Journal Article
    Online Mendelian Inheritance in Man (OMIM) at OMIM.org is the primary repository of comprehensive, curated information on genes and genetic phenotypes and the relationships between them. This unit provides an overview of the types of information in OMIM and optimal strategies for searching and retrieving the information. OMIM.org has links to many related and complementary databases, providing easy access to more information on a topic. The relationship between genes and genetic disorders is highlighted in this unit. The basic protocol explains searching OMIM both from a gene perspective and a clinical features perspective. Two alternate protocols provide strategies for viewing gene-phenotype relationships: a gene map table and Quick View or Side-by-Side format for clinical features. OMIM.org is updated nightly, and the MIMmatch service, described in the support protocol, provides a convenient way to follow updates to entries and gene-phenotype relationships and to collaborate with other researchers. © 2017 by John Wiley & Sons, Inc.

  • Article type: Journal Article
    Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Using next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS data with up-to-date genomic information is still challenging for biomedical researchers. Here, we present gNOME, one of the fastest and most scalable annotation, filtering, and analysis pipelines, which prioritizes phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and result summaries are provided at the variant, gene, and genome levels. Moreover, interpretation is aided by enrichment results for specific variants, genes, and gene sets, computed either between two groups or against population-scale WGS datasets that are already integrated in the pipeline. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).