protein design

  • Article type: Journal Article
    Protein Language Models (pLMs) have revolutionized the computational modeling of protein systems, building numerical embeddings that are centered around structural features. To enhance the breadth of biochemically relevant properties available in protein embeddings, we engineered the Annotation Vocabulary, a transformer-readable language of protein properties defined by structured ontologies. We trained Annotation Transformers (AT) from the ground up to recover masked protein property inputs without reference to amino acid sequences, building a new numerical feature space on protein descriptions alone. We leverage AT representations in various model architectures, for both protein representation and generation. To showcase the merit of Annotation Vocabulary integration, we performed 515 diverse downstream experiments. Using a novel loss function and only $3 in commercial compute, our premier representation model CAMP produces state-of-the-art embeddings for five out of 15 common datasets with competitive performance on the rest, highlighting the computational efficiency of latent space curation with Annotation Vocabulary. To standardize the comparison of de novo generated protein sequences, we suggest a new sequence alignment-based score that is more flexible and biologically relevant than traditional language modeling metrics. Our generative model, GSM, produces high alignment scores from annotation-only prompts with a BERT-like generation scheme. Of particular note, many GSM hallucinations return statistically significant BLAST hits, where enrichment analysis shows properties matching the annotation prompt, even when the ground truth has low sequence identity to the entire training set. Overall, the Annotation Vocabulary toolbox presents a promising pathway to replace traditional tokens with members of ontologies and knowledge graphs, enhancing transformer models in specific domains. The concise, accurate, and efficient descriptions of proteins by the Annotation Vocabulary offer a novel way to build numerical representations of proteins for protein annotation and design.
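The abstract above does not define its alignment-based score, so the following is only an illustrative sketch of the general idea of scoring a generated sequence against a reference by alignment. It assumes a plain Needleman-Wunsch global alignment with a linear gap penalty and unit match/mismatch scores, normalized by the reference's self-alignment score; the function names and all scoring parameters are hypothetical and are not taken from the paper.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the paper's.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def normalized_alignment_score(generated: str, reference: str) -> float:
    """Scale by the reference's self-alignment score; 1.0 means identical."""
    if not reference:
        raise ValueError("reference must be non-empty")
    self_score = global_alignment_score(reference, reference)
    return global_alignment_score(generated, reference) / self_score
```

Unlike per-token metrics, a score of this kind tolerates insertions and deletions in the generated sequence, which is one way to read the abstract's claim of greater flexibility.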

  • Article type: Journal Article
    Bacteria have evolved elaborate mechanisms to thrive in stressful environments. F-like plasmids in gram-negative bacteria encode a multi-protein Type IV Secretion System (T4SSF) that is functional for bacterial proliferation and adaptation through the process of conjugation. The periplasmic protein TrbB is believed to have a stabilizing chaperone role in the T4SSF assembly, with TrbB exhibiting disulfide isomerase (DI) activity. In the current report, we demonstrate that the deletion of the disordered N-terminus of TrbBWT, resulting in a truncation construct TrbB37-161, does not affect its in vitro catalytic activity compared to the wild-type protein (p = 0.76). Residues W37-K161, which include the active thioredoxin motif, are sufficient for DI activity. The N-terminus of TrbBWT is disordered as indicated by a structural model of GST-TrbBWT based on ColabFold-AlphaFold2 and Small Angle X-Ray Scattering data and 1H-15N Heteronuclear Single Quantum Correlation (HSQC) spectroscopy of the untagged protein. This disordered region likely contributes to the protein's dynamicity; removal of this region results in a more stable protein based on 1H-15N HSQC and Circular Dichroism Spectroscopies. Lastly, size exclusion chromatography analysis of TrbBWT in the presence of TraW, a T4SSF assembly protein predicted to interact with TrbBWT, does not support the inference of a stable complex forming in vitro. This work advances our understanding of TrbB's structure and function, explores the role of structural disorder in protein dynamics in the context of a T4SSF accessory protein, and highlights the importance of redox-assisted protein folding in the T4SSF.

  • Article type: Journal Article
    Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.
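As a rough illustration of how co-evolving residue positions can be detected in aligned homologous sequences, the sketch below computes the mutual information between two columns of a multiple sequence alignment. This is a deliberately simple stand-in: the review does not say which statistical models it covers, and the function name and the toy alignments in the usage note are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of an alignment.

    Assumes all sequences are aligned to the same length. High values mean
    the residues at the two positions vary together across the family.
    """
    n = len(msa)
    if n == 0:
        raise ValueError("alignment must contain at least one sequence")
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    i_counts = Counter(seq[i] for seq in msa)
    j_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding error
        ratio = count * n / (i_counts[x] * j_counts[y])
        mi += p_xy * math.log2(ratio)
    return mi
```

For the toy alignment `["AK", "AK", "DE", "DE"]` the two columns covary perfectly and the score is 1 bit, whereas `["AK", "AE", "DK", "DE"]` gives 0 because the columns vary independently.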

  • Article type: Journal Article
    The dynamic evolution of SARS-CoV-2 variants necessitates ongoing advancements in therapeutic strategies. Despite the promise of monoclonal antibody (mAb) therapies like bebtelovimab, concerns persist regarding resistance mutations, particularly single-to-multipoint mutations in the receptor-binding domain (RBD). Our study addresses this by employing interface-guided computational protein design to predict potential bebtelovimab-resistance mutations. Through extensive physicochemical analysis, mutational preferences, precision-recall metrics, protein-protein docking, and energetic analyses, combined with all-atom and coarse-grained molecular dynamics (MD) simulations, we elucidated the structural-dynamics-binding features of the bebtelovimab-RBD complexes. Identification of susceptible RBD residues under positive selection pressure, coupled with validation against bebtelovimab-escape mutations, clinically reported resistance mutations, and viral genomic sequences, enhances the translational significance of our findings and contributes to a better understanding of the resistance mechanisms of SARS-CoV-2.

  • Article type: Journal Article
    Evolution-based deep generative models represent an exciting direction in understanding and designing proteins. An open question is whether such models can learn specialized functional constraints that control fitness in specific biological contexts. Here, we examine the ability of generative models to produce synthetic versions of Src-homology 3 (SH3) domains that mediate signaling in the Sho1 osmotic stress response pathway of yeast. We show that a variational autoencoder (VAE) model produces artificial sequences that experimentally recapitulate the function of natural SH3 domains. More generally, the model organizes all fungal SH3 domains such that locality in the model latent space (but not simply locality in sequence space) enriches the design of synthetic orthologs and exposes non-obvious amino acid constraints distributed near and far from the SH3 ligand-binding site. The ability of generative models to design ortholog-like functions in vivo opens new avenues for engineering protein function in specific cellular contexts and environments.

  • Article type: Journal Article
    Francis Crick's global parameterization of coiled coil geometry has been widely useful for guiding design of new protein structures and functions. However, design guided by similar global parameterization of beta barrel structures has been less successful, likely due to the deviations required from ideal beta barrel geometry to maintain extensive inter-strand hydrogen bonding without introducing considerable backbone strain. Instead, beta barrels and other protein folds have been designed guided by 2D structural blueprints; while this approach has successfully generated new fluorescent proteins, transmembrane nanopores, and other structures, it requires considerable expert knowledge and provides only indirect control over the global barrel shape. Here we show that the simplicity and control over shape and structure provided by global parametric representations can be generalized beyond coiled coils by taking advantage of the rich sequence-structure relationships implicit in RoseTTAFold-based inpainting and diffusion design methods. Starting from parametrically generated idealized barrel backbones, both RFjoint inpainting and RFdiffusion readily incorporate the backbone irregularities necessary for proper folding with minimal deviation from the idealized barrel geometries. We show that for beta barrels across a broad range of global beta sheet parameterizations, these methods achieve high in silico and experimental success rates, with atomic accuracy confirmed by an X-ray crystal structure of a novel beta barrel topology, and de novo designed 12-, 14-, and 16-stranded transmembrane nanopores with conductances ranging from 200 to 500 pS. By combining the simplicity and control of parametric generation with the high success rates of deep learning-based protein design methods, our approach makes the design of proteins where global shape confers function, such as beta barrel nanopores, more precisely specifiable and accessible.

  • Article type: Journal Article
    Proteins that bind intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) with high affinity and specificity could have considerable utility for therapeutic and diagnostic applications. However, a general methodology for targeting IDPs/IDRs has yet to be developed. Here, we show that starting only from the target sequence of the input, and freely sampling both target and binding protein conformation, RFdiffusion can generate binders to IDPs and IDRs in a wide range of conformations. We use this approach to generate binders to the IDPs Amylin, C-peptide and VP48 in a range of conformations with Kds in the 3-100 nM range. The Amylin binder inhibits amyloid fibril formation and dissociates existing fibers, and enables enrichment of amylin for mass spectrometry-based detection. For the IDRs G3bp1, common gamma chain (IL2RG) and prion, we diffused binders to beta strand conformations of the targets, obtaining 10 to 100 nM affinity. The IL2RG binder colocalizes with the receptor in cells, enabling new approaches to modulating IL2 signaling. Our approach should be widely useful for creating binders to flexible IDPs/IDRs spanning a wide range of intrinsic conformational preferences.

  • Article type: Journal Article
    The initial adoption of penicillin as an antibiotic marked the start of exploring other compounds essential for pharmaceuticals, yet resistance to penicillins and their side effects has compromised their efficacy. The N-terminal nucleophile (Ntn) amide-hydrolases S45 family plays a key role in catalyzing amide bond hydrolysis in various compounds, including antibiotics like penicillin and cephalosporin. This study comprehensively analyzes the structural and functional traits of the bacterial N-terminal nucleophile (Ntn) amide-hydrolases S45 family, covering penicillin G acylases, cephalosporin acylases, and D-succinylase. Utilizing structural bioinformatics tools and sequence analysis, the investigation delineates structurally conserved regions (SCRs) and substrate binding site variations among these enzymes. Notably, sixteen SCRs crucial for substrate interaction are identified solely through sequence analysis, emphasizing the significance of sequence data in characterizing functionally relevant regions. These findings introduce a novel approach for identifying targets to enhance the biocatalytic properties of N-terminal nucleophile (Ntn) amide-hydrolases, while facilitating the development of more accurate three-dimensional models, particularly for enzymes lacking structural data. Overall, this research advances our understanding of structure-function relationships in bacterial N-terminal nucleophile (Ntn) amide-hydrolases, providing insights into strategies for optimizing their enzymatic capabilities.

  • Article type: Journal Article
    Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.
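To make the combinatorial enumeration idea concrete, the sketch below lists every combination of allowed residues over a handful of positions and keeps those that pass a functionality check. It is only a toy under stated assumptions: the abstract's AI model is not described, so `is_functional` is a hypothetical placeholder predicate, and the function names and inputs are invented for illustration.

```python
from itertools import product
from typing import Callable, Iterator


def enumerate_variants(allowed: list[str]) -> Iterator[str]:
    """Yield every sequence built from one allowed residue per position.

    allowed[k] is a string of the residues permitted at position k.
    """
    for combo in product(*allowed):
        yield "".join(combo)


def functional_variants(allowed: list[str],
                        is_functional: Callable[[str], bool]) -> list[str]:
    """Keep variants accepted by a caller-supplied (hypothetical) predicate."""
    return [v for v in enumerate_variants(allowed) if is_functional(v)]


def fraction_changed(variant: str, wild_type: str) -> float:
    """Fraction of evaluated positions that differ from the wild type."""
    if len(variant) != len(wild_type) or not wild_type:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for a, b in zip(variant, wild_type) if a != b)
    return diffs / len(wild_type)
```

The number of variants is the product of the per-position choices, so exhaustive listing like this is only feasible for a small set of positions; that growth is the search-space problem the abstract refers to.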

  • Article type: Journal Article
    Understanding the sequence-structure relationship in proteins is of fundamental interest and also has practical applications, such as the rational design of peptides and proteins. This relationship in proteins containing the Type I left-handed β-helix is updated and revisited in this study. By analyzing the available experimental structures in the Protein Data Bank, we describe in further detail the structural features that are important for the stability of this fold, as well as for its nucleation and termination. This study is meant to complete previous work, as it provides a separate analysis of the N-terminal and C-terminal rungs of the helix. Particular sequence motifs of these rungs are described along with the structural elements they form.