structural informatics

  • Article type: Journal Article
    The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site-specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self-supervision signal, enabling learned embeddings to implicitly capture structure-function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state-of-the-art performance on standardized benchmarks (protein-protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general-purpose platform for computational protein analysis.
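    The site search described above can be illustrated with a minimal sketch. It assumes each site is already represented by a fixed-length embedding vector and ranks database sites by cosine similarity to a query. The function names, the toy three-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not the COLLAPSE implementation or its API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_sites(query, database, top_k=3):
    """Return the top_k (site_id, similarity) pairs, most similar first.

    `database` maps a hypothetical site identifier to its embedding vector.
    """
    scored = [(site_id, cosine_similarity(query, emb))
              for site_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (assumed values; real site embeddings would have many more dimensions).
toy_database = {
    "site_A": [1.0, 0.0, 0.0],
    "site_B": [0.9, 0.1, 0.0],
    "site_C": [0.0, 1.0, 0.0],
}
hits = most_similar_sites([1.0, 0.0, 0.0], toy_database, top_k=2)
for site_id, score in hits:
    print(site_id, round(score, 3))
```

    A brute-force scan like this is linear in the number of database sites; for large protein datasets an approximate nearest-neighbor index would be the usual substitute.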

  • Article type: Journal Article
    A rich diversity of transmembrane G protein-coupled receptors (GPCRs) is used by eukaryotes to sense physical and chemical signals. In humans alone, 800 GPCRs comprise the largest and most therapeutically targeted receptor class. Recent advances in GPCR structural biology have produced hundreds of GPCR structures solved by X-ray diffraction and, increasingly, cryo-electron microscopy (cryo-EM). Many of these structures are stabilized by site-specific cholesterol binding, but it is unclear whether these interactions are a product of recurring cholesterol-binding motifs and whether observed patterns of cholesterol binding differ by experimental technique. Here, we comprehensively analyze the location and composition of cholesterol binding sites in the current set of 473 human GPCR structural chains. Our findings establish that cholesterol binds similarly in cryo-EM and X-ray structures and show that 92% of cholesterol molecules on GPCR surfaces reside in predictable locations that lack discernible cholesterol-binding motifs.

  • Article type: Journal Article
    Advances in high-throughput molecular biology and electronic health records (EHR), coupled with increasing computer capabilities, have resulted in an increased interest in the use of big data in health care. Big data require collection and analysis of data at an unprecedented scale and represent a paradigm shift in health care, offering (1) the capacity to generate new knowledge more quickly than traditional scientific approaches; (2) unbiased collection and analysis of data; and (3) a holistic understanding of biology and pathophysiology. Big data promise more personalized and precise medicine for patients, with improved accuracy, earlier diagnosis, and therapy tailored to an individual's unique combination of genes, environmental risk, and precise disease phenotype. This promise comes from data collected from numerous sources, ranging from molecules to cells, to tissues, to individuals and populations, and from the integration of these data into networks that improve understanding of health and disease. Big data-driven science should play a role in propelling comparative medicine and "one medicine" (i.e., the shared physiology, pathophysiology, and disease risk factors across species) forward. Merging of data from EHR across institutions will give access to patient data on a scale previously unimaginable, allowing for precise phenotype definition and objective evaluation of risk factors and response to therapy. High-throughput molecular data will give insight into previously unexplored molecular pathophysiology and disease etiology. Investigation and integration of big data from a variety of sources will result in stronger parallels drawn at the molecular level between human and animal disease, allow for predictive modeling of infectious disease and identification of key areas of intervention, and facilitate step-changes in our understanding of disease that can make a substantial impact on animal and human health.
    However, the use of big data comes with significant challenges. Here we explore the scope of "big data," including its opportunities, its limitations, and what is needed to capitalize on big data in one medicine.

  • Article type: Journal Article
    Currently, time-consuming serial in vitro experimentation involving immunocytochemistry or radiolabeled materials is required to identify which of the numerous Rab-GTPases (Rab) and Rab-GTPase activating proteins (RabGAP) are capable of functional interactions. These interactions are essential for numerous cellular functions, and in silico methods of reducing in vitro trial and error would accelerate the pace of research in cell biology. We have utilized a combination of three-dimensional protein modeling and protein bioinformatics to identify domains present in Rab proteins that are predictive of their functional interaction with a specific RabGAP. The RabF2 and RabSF1 domains appear to play functional roles in mediating the interaction between Rabs and RabGAPs. Moreover, the RabSF1 domain can be used to make in silico predictions of functional Rab/RabGAP pairs. This method is expected to be a broadly applicable tool for predicting protein-protein interactions where existing crystal structures for homologs of the proteins of interest are available.

  • Article type: Journal Article
    A fundamental and unsolved problem in biophysical chemistry is the development of a computationally simple, physically intuitive, and generally applicable method for accurately predicting and physically explaining protein-protein binding affinities from protein-protein interaction (PPI) complex coordinates. Here, we propose that simplifying a previously described six-term PPI scoring function to a four-term function yields a simple expression of all physically and statistically meaningful terms, one that can be used to accurately predict and explain binding affinities for a well-defined subset of PPIs characterized by (1) crystallographic coordinates, (2) rigid-body association, (3) normal interface size, hydrophobicity, and hydrophilicity, and (4) high-quality experimental binding affinity measurements. We further propose that the four-term scoring function could be regarded as a core expression for future development into a more general PPI scoring function. Our work has clear implications for PPI modeling and structure-based drug design.