disease gene discovery

  • Article type: Journal Article
    Protein-truncating variants (PTVs) near the 3' end of genes may escape nonsense-mediated decay (NMD). PTVs in the NMD-escape region (PTVescs) can cause Mendelian disease but are difficult to interpret given their varying impact on protein function. Previously, PTVesc burden was assessed in an epilepsy cohort, but no large-scale analysis has systematically evaluated these variants in rare disease. We performed a retrospective analysis of 29,031 neurodevelopmental disorder (NDD) parent-offspring trios referred for clinical exome sequencing to identify PTVesc de novo mutations (DNMs). We identified 1,376 PTVesc DNMs and 133 genes that were significantly enriched (binomial p < 0.001). The PTVesc-enriched genes included those with PTVescs previously described to cause dominant Mendelian disease (e.g., SEMA6B, PPM1D, and DAGLA). We annotated ClinVar variants for PTVescs and identified 948 genes with at least one high-confidence pathogenic variant. Twenty-two known Mendelian PTVesc-enriched genes had no prior evidence of PTVesc-associated disease. We found 22 additional PTVesc-enriched genes that are not well established to be associated with Mendelian disease, several of which showed phenotypic similarity between individuals harboring PTVesc variants in the same gene. Four individuals with PTVesc mutations in RAB1A had similar phenotypes including NDD and spasticity. PTVesc mutations in IRF2BP1 were found in two individuals who each had severe immunodeficiency manifesting in NDD. Three individuals with PTVesc mutations in LDB1 all had NDD and multiple congenital anomalies. Using a large-scale, systematic analysis of DNMs, we extend the mutation spectrum for known Mendelian disease-associated genes and identify potentially novel disease-associated genes.
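
The abstract reports gene-level enrichment with a binomial test (p < 0.001) but does not spell out the model. As a minimal sketch, assuming each of the 2 x 29,031 offspring chromosomes is an independent trial with a per-gene, per-chromosome PTVesc mutation rate `mu` (a hypothetical value here, not taken from the study), the upper-tail probability could be computed roughly as follows. The function names and the observed count are illustrative only.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly to avoid cancellation."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Stop once further terms can no longer change the sum.
        if term < total * 1e-17:
            break
    return min(total, 1.0)


N_TRIOS = 29031   # cohort size reported in the abstract
mu = 1e-6         # ASSUMPTION: hypothetical per-chromosome PTVesc rate for one gene
observed = 4      # ASSUMPTION: hypothetical number of de novo PTVescs seen in that gene

p_value = binom_upper_tail(observed, 2 * N_TRIOS, mu)
print(f"p = {p_value:.3g}, enriched at p < 0.001: {p_value < 0.001}")
```

In practice the per-gene rate would come from a calibrated mutation model restricted to the NMD-escape region, and a multiple-testing correction across genes would normally be considered.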






  • Article type: Journal Article
    Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype-phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene-disease associations. We found that mouse genotype-phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper.
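
To make the idea of semantic similarity over a phenotype ontology concrete, the sketch below compares two phenotype annotation sets by the Jaccard index of their ancestor-closed term sets. This is one simple, generic measure and is not claimed to be the method used in the study. The tiny ontology and the term identifiers (`P:root`, `P:seizure`, and so on) are invented for illustration.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in a child -> parents DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def closure(terms, parents):
    """Ancestor-closed set for a collection of annotated terms."""
    out = set()
    for t in terms:
        out |= ancestors(t, parents)
    return out


def jaccard_similarity(terms_a, terms_b, parents):
    """Jaccard index of the two ancestor-closed term sets (0.0 if both are empty)."""
    a, b = closure(terms_a, parents), closure(terms_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# HYPOTHETICAL toy ontology: child term -> list of parent terms.
parents = {
    "P:nervous": ["P:root"],
    "P:seizure": ["P:nervous"],
    "P:ataxia": ["P:nervous"],
    "P:heart": ["P:root"],
}

mouse_gene_phenotypes = {"P:seizure"}   # hypothetical model-organism annotation
human_disease_phenotypes = {"P:ataxia"}  # hypothetical disease annotation

score_related = jaccard_similarity(mouse_gene_phenotypes, human_disease_phenotypes, parents)
score_unrelated = jaccard_similarity({"P:heart"}, human_disease_phenotypes, parents)
print(score_related, score_unrelated)
```

Two different nervous-system terms share more ancestors than a heart term and a nervous-system term, so the first pair scores higher even though neither pair shares an exact term.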






  • Article type: Journal Article
    The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.






  • Article type: Journal Article
    Although the rates of disease gene discovery have steadily increased with the expanding use of genome and exome sequencing by clinical and research laboratories, only ~16% of genes in the genome have confirmed disease associations. Here we describe our clinical laboratory's experience utilizing GeneMatcher, an online portal designed to promote disease gene discovery and data sharing. Since 2016, we submitted 246 candidates from 243 unique genes to GeneMatcher, of which 111 (45%) are now clinically characterized. Submissions meeting our candidate gene-reporting criteria based on a scoring system using patient and molecular-weighted evidence were significantly more likely to be characterized as of October 2021 versus genes that did not meet our clinical-reporting criteria (p = 0.025). We reported relevant findings related to these newly characterized gene-disease associations in 477 probands. In 218 (46%) instances, we issued reclassifications after an initial negative or candidate gene (uncertain) report. We coauthored 104 publications delineating gene-disease relationships, including descriptions of new associations (60%), additional supportive evidence (13%), subsequent descriptive cohorts (23%), and phenotypic expansions (4%). Clinical laboratories are pivotal for disease gene discovery efforts and can screen phenotypes based on genotype matches, contact clinicians of relevant cases, and issue proactive reclassification reports.






  • Article type: Journal Article
    Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
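
The guided random walk described above can be illustrated with a much simplified sketch. It is not the uKIN implementation: here the walker restarts at newly identified candidate genes and, at each step, moves to a neighbor with probability proportional to that neighbor's prior-knowledge score. The function name, the `alpha` restart parameter, the `eps` smoothing constant, and the three-gene network are all assumptions made for illustration.

```python
def guided_random_walk(adj, prior, restart_nodes, alpha=0.5, eps=0.01, n_iter=200):
    """Visiting probabilities for a restart walk whose steps are biased toward prior genes.

    adj: dict mapping node -> list of neighbors (every node must have a neighbor).
    prior: dict mapping node -> non-negative prior-knowledge score.
    restart_nodes: newly implicated genes where the walk starts and restarts.
    """
    nodes = sorted(adj)
    # Transition probability u -> v is proportional to the prior score of v.
    trans = {}
    for u in nodes:
        weights = {v: prior.get(v, 0.0) + eps for v in adj[u]}
        total = sum(weights.values())
        trans[u] = {v: w / total for v, w in weights.items()}

    restart = {u: (1.0 / len(restart_nodes) if u in restart_nodes else 0.0) for u in nodes}
    p = dict(restart)
    for _ in range(n_iter):
        nxt = {u: alpha * restart[u] for u in nodes}
        for u in nodes:
            for v, w in trans[u].items():
                nxt[v] += (1.0 - alpha) * p[u] * w
        p = nxt
    return p


# HYPOTHETICAL toy network: new candidate gene N interacts with genes A and B.
adj = {"N": ["A", "B"], "A": ["N"], "B": ["N"]}
prior = {"A": 1.0}  # A is close to known disease genes; B has no prior support
scores = guided_random_walk(adj, prior, restart_nodes={"N"})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Because A carries prior support, the walk from N visits A far more often than the otherwise symmetric B, which is the intuition behind combining prior and new information.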







  • Article type: Journal Article
    Identifying new gene functions and pathways underlying diseases and biological processes is a major challenge in genomics research. In particular, most methods for interpreting the pathways characteristic of an experimental gene list defined by genomic data are limited by their dependence on assessing the overlapping genes or their interactome topology, which cannot account for the variety of functional relations. This is particularly problematic for pathway discovery from single-cell genomics with low gene coverage or for interpreting complex pathway changes such as those during changes of cell state. Here, we exploited the comprehensive sets of molecular concepts that combine ontologies, pathways, interactions and domains to help inform the functional relations. We first developed a universal concept signature (uniConSig) analysis for genome-wide quantification of new gene functions underlying biological or pathological processes based on the signature molecular concepts computed from known functional gene lists. We then further developed a novel concept signature enrichment analysis (CSEA) for deep functional assessment of the pathways enriched in an experimental gene list. This method is grounded on the framework of shared concept signatures between gene sets at multiple functional levels, thus overcoming the limitations of the current methods. Through meta-analysis of transcriptomic data sets of cancer cell line models and single hematopoietic stem cells, we demonstrate the broad applications of CSEA on pathway discovery from gene expression and single-cell transcriptomic data sets for genetic perturbations and change of cell states, which complements the current modalities. The R modules for uniConSig analysis and CSEA are available through https://github.com/wangxlab/uniConSig.







  • Article type: Journal Article
    Alzheimer's disease (AD) is a severe neurodegenerative disorder and has become a global public health problem. Despite intensive research, the pathophysiology of AD has not been elucidated. Disease comorbidity often associates diseases with overlapping patterns of genetic markers. This may inform a common etiology and suggest essential protein targets. The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) collects large-scale postmarketing surveillance data that provide a unique opportunity to investigate disease co-occurrence patterns. We aim to construct a heterogeneous network that integrates a disease comorbidity network (DCN) derived from FAERS with a protein-protein interaction (PPI) network to prioritize AD risk genes using a network-based ranking algorithm.
    We built a DCN based on indication data from FAERS using association rule mining. The DCN was then integrated with the PPI network. We used a random walk with restart ranking algorithm to prioritize AD risk genes.
    We evaluated the performance of our approach using AD risk genes curated from genetic association studies. Our approach achieved an area under the receiver operating characteristic curve of 0.770. The top 500 ranked genes achieved 5.53-fold enrichment for known AD risk genes as compared to random expectation. Pathway enrichment analysis using top-ranked genes revealed that two novel pathways, the ERBB and coagulation pathways, might be involved in AD pathogenesis.
    We leveraged FAERS, a comprehensive data resource for FDA postmarket drug safety surveillance, for large-scale AD comorbidity mining. This exploratory study demonstrated the potential of disease comorbidity mining from FAERS in AD genetics discovery.
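
A fold-enrichment figure such as the one reported for the top 500 ranked genes compares the fraction of known risk genes among the top-ranked genes with the fraction expected from a random draw. The sketch below shows that arithmetic under the assumption that the random expectation is the share of known risk genes among all ranked genes; the exact definition used in the study is not stated in the abstract. Gene names and the ranking are invented.

```python
def fold_enrichment(ranked_genes, known_genes, top_k):
    """Ratio of the known-gene fraction in the top_k genes to the fraction in the full ranking."""
    top = ranked_genes[:top_k]
    hits_in_top = sum(1 for g in top if g in known_genes)
    hits_overall = sum(1 for g in ranked_genes if g in known_genes)
    if not top or hits_overall == 0:
        raise ValueError("need a non-empty top list and at least one known gene in the ranking")
    observed_fraction = hits_in_top / len(top)
    expected_fraction = hits_overall / len(ranked_genes)
    return observed_fraction / expected_fraction


# HYPOTHETICAL ranking of ten genes, two of which are known risk genes.
ranked = [f"g{i}" for i in range(1, 11)]
known = {"g1", "g5"}

# One of the top two genes is known (fraction 0.5) versus 2 of 10 overall (fraction 0.2).
fe = fold_enrichment(ranked, known, top_k=2)
print(f"fold enrichment: {fe:.2f}")
```

A value above 1 means that known risk genes are concentrated near the top of the ranking.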







  • Article type: Journal Article
    Online Mendelian Inheritance in Man (OMIM) at OMIM.org is the primary repository of comprehensive, curated information on genes and genetic phenotypes and the relationships between them. This unit provides an overview of the types of information in OMIM and optimal strategies for searching and retrieving the information. OMIM.org has links to many related and complementary databases, providing easy access to more information on a topic. The relationship between genes and genetic disorders is highlighted in this unit. The basic protocol explains searching OMIM both from a gene perspective and a clinical features perspective. Two alternate protocols provide strategies for viewing gene-phenotype relationships: a gene map table and Quick View or Side-by-Side format for clinical features. OMIM.org is updated nightly, and the MIMmatch service, described in the support protocol, provides a convenient way to follow updates to entries and gene-phenotype relationships and to collaborate with other researchers. © 2017 by John Wiley & Sons, Inc.






  • Article type: Journal Article
    Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present gNOME, one of the fastest and most scalable annotation, filtering, and analysis pipelines, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, interpretation is aided by enrichment results for specific variants, genes, and gene sets, computed either between two groups or against population-scale WGS datasets that are already integrated in the pipeline. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).





