protein structure prediction

  • Article type: Journal Article
    The identification of intrinsically disordered proteins and their functional roles is largely dependent on the performance of computational predictors, necessitating a high standard of accuracy in these tools. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to offer precise predictions of protein disorder and associated functional roles based on protein sequences. PDFll is developed through a two-step process. Initially, it leverages large-scale protein language models (pLMs), trained on an extensive dataset comprising billions of protein sequences. Subsequently, the embeddings derived from pLMs are integrated into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass the performance of existing state-of-the-art predictors, particularly those that forecast disorder and function without utilizing evolutionary information.

  • Article type: Journal Article
    Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based methods outperformed MSA-based methods, and the combination of both even decreased performance for some tasks (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.

  • Article type: Journal Article
    Accurate models of protein tertiary structures are now available from numerous advanced prediction methods, although the accuracy of each method often varies depending on the specific protein target. Additionally, many models may still contain significant local errors. Therefore, reliable, independent model quality estimates are essential both for identifying errors and selecting the very best models for further biological investigations. ModFOLD9 is a leading independent server for detecting the local errors in models produced by any method, and it can accurately discriminate between high-quality models from multiple alternative approaches. ModFOLD9 incorporates several new scores from deep learning-based approaches, leading to greatly improved prediction accuracy compared with earlier versions of the server. ModFOLD9 is continuously independently benchmarked, and it is shown to be highly competitive with other public servers. ModFOLD9 is freely available at https://www.reading.ac.uk/bioinf/ModFOLD/.

  • Article type: Journal Article
    G protein-coupled receptors (GPCRs) play a crucial role in cell function by transducing signals from the extracellular environment to the inside of the cell. They mediate the effects of various stimuli, including hormones, neurotransmitters, ions, photons, food tastants and odorants, and are renowned drug targets. Advancements in structural biology techniques, including X-ray crystallography and cryo-electron microscopy (cryo-EM), have driven the elucidation of an increasing number of GPCR structures. These structures reveal novel features that shed light on receptor activation, dimerization and oligomerization, the dichotomy between orthosteric and allosteric modulation, and the intricate interactions underlying signal transduction, providing insights into diverse ligand-binding modes and signalling pathways. However, a substantial portion of the GPCR repertoire and its activation states remain structurally unexplored. Future efforts should prioritize capturing the full structural diversity of GPCRs across multiple dimensions. To do so, the integration of structural biology with biophysical and computational techniques will be essential. We describe in this review the progress of nuclear magnetic resonance (NMR) to examine GPCR plasticity and conformational dynamics, of atomic force microscopy (AFM) to explore the spatial-temporal dynamics and kinetic aspects of GPCRs, and the recent breakthroughs in artificial intelligence for protein structure prediction to characterize the structures of the entire GPCRome. In summary, the journey through GPCR structural biology provided in this review illustrates how far we have come in decoding the architecture and function of these essential proteins. Looking ahead, integrating cutting-edge biophysics and computational tools offers a path to navigating the GPCR structural landscape, ultimately advancing GPCR-based applications.

  • Article type: Journal Article
    Proteins, as the primary executors of physiological activity, serve as a key factor in disease diagnosis and treatment. Research into their structures, functions, and interactions is essential to better understand disease mechanisms and potential therapies. DeepMind's AlphaFold2, a deep-learning protein structure prediction model, has proven to be remarkably accurate, and it is widely employed in various aspects of diagnostic research, such as the study of disease biomarkers, microorganism pathogenicity, antigen-antibody structures, and missense mutations. Thus, AlphaFold2 serves as an exceptional tool to bridge fundamental protein research with breakthroughs in disease diagnosis, developments in diagnostic strategies, and the design of novel therapeutic approaches and enhancements in precision medicine. This review outlines the architecture, highlights, and limitations of AlphaFold2, placing particular emphasis on its applications within diagnostic research grounded in disciplines such as immunology, biochemistry, molecular biology, and microbiology.

  • Article type: Journal Article
    Two years on from the initial release of AlphaFold, we have seen its widespread adoption as a structure prediction tool. Here, we discuss some of the latest work based on AlphaFold, with a particular focus on its use within the structural biology community. This encompasses use cases like speeding up structure determination itself, enabling new computational studies, and building new tools and workflows. We also look at the ongoing validation of AlphaFold, as its predictions continue to be compared against large numbers of experimental structures to further delineate the model's capabilities and limitations.

  • Article type: Journal Article
    Recent advancements in artificial intelligence (AI) have accelerated the prediction of unknown protein structures. However, accurately predicting the three-dimensional (3D) structures of fusion proteins remains a difficult task because current AI-based protein structure predictions focus on wild-type (WT) proteins rather than on the newly fused proteins in nature. Following the central dogma of biology, fusion proteins are translated from fusion transcripts, which are made by transcribing the fusion genes formed between two different loci through chromosomal rearrangements in cancer. Accurately predicting the 3D structures of fusion proteins is important for understanding the functional roles and mechanisms of action of new chimeric proteins. However, predicting their 3D structure using a template-based model is challenging because known template structures are often unavailable in databases. Deep learning (DL) models that utilize multi-level protein information have revolutionized the prediction of protein 3D structures. In this review paper, we highlight the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using DL models. We aim to explore both the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta and D-I-TASSER for modelling the 3D structures.
    HIGHLIGHTS: This review provides the overall pipeline and landscape of the prediction of the 3D structure of fusion proteins. This review provides the factors that should be considered at each step when predicting the 3D structures of fusion proteins using AI approaches. This review highlights the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using deep learning models. This review explores the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta, and D-I-TASSER to model 3D structures.

  • Article type: Journal Article
    The missense tolerance ratio (MTR) was developed as a novel approach to assess the deleteriousness of variants. Its three-dimensional successor, MTR3D, was demonstrated to be powerful at discriminating pathogenic from benign variants. However, its reliance on experimental structures and homologs limited its coverage of the proteome. We have now utilized AlphaFold2 models to develop MTR3D-AF2, which covers 89.31% of proteins and 85.39% of residues across the human proteome. This work has improved MTR3D's ability to distinguish clinically established pathogenic from benign variants. MTR3D-AF2 is freely available as an interactive web server at https://biosig.lab.uq.edu.au/mtr3daf2/.
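    The abstract above does not say how an MTR score is computed. As a rough sketch only, the snippet below assumes the commonly cited definition: within a window of residues, the observed proportion of missense variants (among missense plus synonymous) divided by the proportion expected if all possible variants were equally likely. The function name, argument names, and example counts are hypothetical and are not taken from MTR3D-AF2.

```python
def missense_tolerance_ratio(obs_missense: int, obs_synonymous: int,
                             possible_missense: int, possible_synonymous: int) -> float:
    """Sketch of an MTR-style score for one window (assumed definition).

    Numerator: share of observed variants that are missense.
    Denominator: share of all possible variants that are missense.
    Values below 1 suggest fewer missense variants than expected.
    """
    observed_total = obs_missense + obs_synonymous
    possible_total = possible_missense + possible_synonymous
    if observed_total == 0 or possible_missense == 0:
        raise ValueError("window has no observed variants or no possible missense variants")
    observed_fraction = obs_missense / observed_total
    expected_fraction = possible_missense / possible_total
    return observed_fraction / expected_fraction


if __name__ == "__main__":
    # Hypothetical counts for one window, for illustration only.
    print(missense_tolerance_ratio(2, 8, 60, 20))
```

    A three-dimensional variant such as MTR3D would, under the same assumption, define the window by spatial proximity in a structure rather than by sequence position; that windowing step is not shown here.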

  • Article type: Journal Article
    Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in the fields of new drug development, disease treatment, bioengineering, etc. Accurately predicting the secondary structure of proteins helps to reveal how proteins are folded and how they function in cells. The application of deep learning models in protein structure prediction is particularly important because of their ability to process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein prediction in both eight-state and three-state structures. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation techniques utilizing the ProtT5 pretrained model, leading to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, allows for better extraction of amino acid sequence features than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. For the TS115, CB513 and PDB (2018-2020) datasets, the prediction accuracy of the eight-state structure of the six datasets in this paper reached 88.2%, 84.9%, and 95.3%, respectively, and the prediction accuracy of the three-state structure reached 91.3%, 90.3%, and 96.8%, respectively. 
    This study not only improves the accuracy of protein secondary structure prediction but also provides a valuable tool for understanding protein structure and function, one that is particularly applicable to resource-constrained contexts.
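    The eight-state and three-state accuracies quoted above are per-residue agreement scores. The sketch below shows how such scores are typically computed, assuming the widely used DSSP reduction (H, G, I to helix; E, B to strand; everything else to coil). The abstract does not state which reduction the authors used, and the example strings are made up for illustration.

```python
# Assumed DSSP eight-state to three-state reduction; the paper's exact
# mapping is not given in the abstract.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",                       # helix classes
    "E": "E", "B": "E",                                 # strand and bridge
    "T": "C", "S": "C", "L": "C", "C": "C", "-": "C",   # remaining states -> coil
}


def reduce_q8_to_q3(q8: str) -> str:
    """Map an eight-state label string to three states (H, E, C)."""
    return "".join(Q8_TO_Q3[state] for state in q8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed_q8 = "LHHHHGGTEEEBSL"    # hypothetical reference labels
    predicted_q8 = "LHHHHHHTEEEESL"   # hypothetical model output
    q8 = per_residue_accuracy(predicted_q8, observed_q8)
    q3 = per_residue_accuracy(reduce_q8_to_q3(predicted_q8),
                              reduce_q8_to_q3(observed_q8))
    print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

    Because several eight-state classes collapse into one three-state class, the three-state score can never be lower than the eight-state score for the same prediction, which is consistent with the higher three-state figures reported above.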

  • Article type: Journal Article
    Hereditary spastic paraplegia (HSP) is a rare neurodegenerative disease prominently characterized by slowly progressive lower limb weakness and spasticity. The significant genotypic and phenotypic heterogeneity of this disease makes its accurate diagnosis challenging. In this study, we identified the NM_001168272: c.2714A > G (chr3.hg19: g.4716912A > G, N905S) variant in the ITPR1 gene in a three-generation Chinese family with multiple individuals affected by HSP, which we believed to be associated with HSP pathogenesis. To confirm, we performed whole exome sequencing, copy number variant assays, dynamic mutation analysis of the entire family, and protein structure prediction. The variant identified in this study was in the coupling domain, and this is the first corroborated report assigning ITPR1 variants to HSP. These findings expand the clinical and genetic spectrum of HSP and provide important data for its genetic analysis and diagnosis.