de novo proteins

  • Article type: Journal Article
    During de novo emergence, new protein-coding genes emerge from previously nongenic sequences. The de novo proteins they encode differ from conserved proteins in composition and predicted biochemical properties. However, functional de novo proteins do exist. Both the identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine-learning-based tools and found that most de novo proteins indeed differ from conserved proteins in both structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on the structure and function of de novo proteins in Drosophila.

  • Article type: Journal Article
    Protein synthesis methods have been adapted to incorporate an ever-growing level of non-natural components. Meanwhile, the design of de novo protein structure and function has rapidly emerged as a viable capability. However, these two exciting trends have yet to intersect in a meaningful way. The ability to perform de novo design with non-proteinogenic components requires that synthesis and computation align on common targets and applications. This perspective examines the state of the art in these areas and identifies specific, consequential applications to advance the field toward generalized macromolecule design.

  • Article type: Journal Article
    Computational protein sequence design has the ambitious goal of modifying existing proteins or creating new ones; however, designing stable and functional proteins is challenging without predictability of protein dynamics and allostery. Informing protein design methods with evolutionary information limits the mutational space to more native-like sequences and results in increased stability while maintaining function. Recently, language models trained on millions of protein sequences have shown impressive performance in predicting the effects of mutations. Assessing Rosetta-designed sequences with a language model showed scores that were worse than those of their original sequences. To inform Rosetta design protocols with language model predictions, we added a new metric to restrain the energy function during design using the Evolutionary Scale Modeling (ESM) model. The resulting sequences have better language model scores and similar sequence recovery, with only a minor decrease in fitness as assessed by Rosetta energy. In conclusion, our work combines the strength of recent machine learning approaches with the Rosetta protein design toolbox.
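    The abstract does not say how the ESM-based metric enters the Rosetta energy function. As a purely illustrative sketch, one common way to combine a physics-based energy with language model predictions is to add a weighted negative log-likelihood penalty. The function name `restrained_score`, the `weight` parameter, and the per-position probability dictionaries below are all hypothetical and are not taken from Rosetta or ESM.

```python
from math import log

def restrained_score(energy, sequence, position_probs, weight=1.0):
    """Return energy plus a weighted language-model penalty (illustrative only).

    energy         -- a precomputed design energy (lower is better); assumed input
    sequence       -- the designed amino-acid sequence
    position_probs -- one dict per position mapping amino acid -> probability,
                      standing in for language model output (hypothetical format)
    weight         -- how strongly the penalty restrains the design (assumption)
    """
    if len(sequence) != len(position_probs):
        raise ValueError("need one probability table per sequence position")
    penalty = 0.0
    for residue, probs in zip(sequence, position_probs):
        p = probs.get(residue, 0.0)
        if p <= 0.0:
            # A residue the model considers impossible is rejected outright.
            return float("inf")
        penalty += -log(p)  # unlikely residues contribute a larger penalty
    return energy + weight * penalty
```

    Under this assumed form, raising `weight` pushes designs toward sequences the language model finds more plausible, at some cost in energy, which matches the trade-off the abstract reports only qualitatively.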

  • Article type: Journal Article
    Multispecific antibodies recognize two or more epitopes located on the same or distinct targets. This added capability through protein design allows these man-made molecules to address unmet medical needs that cannot be met with single targeting, such as with monoclonal antibodies or cytokines alone. However, the development of these multispecific molecules has met with numerous obstacles, which suggests that a new workflow for multispecific molecules is required. Investigating the molecular basis that mediates the successful assembly of the building blocks into non-native quaternary structures will lead to the writing of a playbook for multispecifics. This is essential if we are to design workflows that we can control and whose success we can in turn predict. Here, we reflect on the current state of the art of therapeutic biologics and look at the protein building blocks and tools that can be used to lay the foundations of such a next-generation workflow.

  • Article type: Journal Article
    Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advances in protein structure prediction, particularly AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein-language-model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
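    The sign of the disorder-versus-pLDDT relationship described above can be checked with an ordinary Pearson correlation. The sketch below assumes one mean pLDDT value and one predicted disorder fraction per protein; the function name `pearson` and this per-protein summary are assumptions for illustration, not the authors' pipeline, and the abstract does not state which correlation measure was used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

    Calling `pearson(disorder_fractions, mean_plddts)` separately for each protein class would give one coefficient per class, where a value above zero corresponds to the positive correlation reported for de novo and random proteins and a value below zero to the negative correlation reported for conserved proteins.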

  • Article type: Preprint
    Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins, and collagens, with superior mechanical performance that play crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here we report a generative model that predicts protein designs to meet complex nonlinear mechanical property design objectives. Our model leverages deep knowledge of protein sequences from a pre-trained protein language model and maps mechanical unfolding responses to create novel proteins. Using full-atom molecular simulations for direct validation, we demonstrate that the designed proteins are novel and fulfill the targeted mechanical properties, including unfolding energy and mechanical strength, as well as the detailed unfolding force-separation curves. Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis, using mechanical features as targets to enable the discovery of protein materials with superior mechanical properties.

  • Article type: Journal Article
    Coiled coils are a widespread and well-understood protein fold. Their short and simple repeats underpin considerable structural and functional diversity. The vast majority of coiled coils consist of 7-residue (heptad) sequence repeats, but in essence most combinations of 3- and 4-residue segments, each starting with a residue of the hydrophobic core, are compatible with coiled-coil structure. The most frequent among these other repeat patterns are 11-residue (hendecad, 3 + 4 + 4) repeats. Hendecads are frequently found in low copy number, interspersed between heptads, but some proteins consist largely or entirely of hendecad repeats. Here we describe the first large-scale survey of these proteins in the proteome of life. For this, we scanned the protein sequence database for sequences with 11-residue periodicity that lacked β-strand prediction. We then clustered these by pairwise similarity to construct a map of potential hendecad coiled-coil families. We discuss these families according to their structural properties, their potential cellular roles, and the evolutionary mechanisms shaping their diversity. We note in particular the continuous amplification of hendecads, both within existing proteins and de novo from previously non-coding sequence, as a powerful mechanism in the genesis of new coiled-coil forms.
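    The abstract does not describe how 11-residue periodicity was detected. A minimal way to illustrate the idea is to reduce a sequence to a hydrophobic/non-hydrophobic pattern and measure how often positions a fixed lag apart agree. The residue set `HYDROPHOBIC` and the function `lag_agreement` are assumptions made for this sketch, which also omits the β-strand filter mentioned in the abstract.

```python
# Residues treated as hydrophobic core candidates; this choice is an assumption.
HYDROPHOBIC = frozenset("AILMFVWY")

def lag_agreement(sequence, lag):
    """Fraction of positions i where residues i and i + lag share hydrophobic state.

    A value near 1.0 means the hydrophobic pattern repeats with period `lag`.
    """
    if lag <= 0 or lag >= len(sequence):
        raise ValueError("lag must be between 1 and len(sequence) - 1")
    pattern = [residue in HYDROPHOBIC for residue in sequence.upper()]
    pairs = len(pattern) - lag
    matches = sum(pattern[i] == pattern[i + lag] for i in range(pairs))
    return matches / pairs

# Toy hendecad: hydrophobic residues open the 3-, 4- and 4-residue segments
# (positions 0, 3 and 7). The sequence is invented for illustration only.
toy_hendecad = "LEELKKELEEK" * 6
```

    On the toy sequence, `lag_agreement(toy_hendecad, 11)` is higher than `lag_agreement(toy_hendecad, 7)`, which is the kind of contrast a scan for hendecad rather than heptad periodicity would look for.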

  • Article type: Journal Article
    We present a Nip site model of acetyl coenzyme-A synthase (ACS) within a de novo-designed trimer peptide that self-assembles to produce a homoleptic Ni(Cys)3 binding motif. Spectroscopic and kinetic studies of ligand binding demonstrate that Ni binding stabilizes the peptide assembly and produces a terminal Ni(I)-CO complex. When the CO-bound state is reacted with a methyl donor, a new species with new spectral features is quickly produced. While the metal-bound CO is unactivated, the presence of the methyl donor produces an activated metal-CO complex. Selective outer sphere steric modifications demonstrate that the physical properties of the ligand-bound states are altered differently depending on whether the steric modification lies above or below the Ni site.

  • Article type: Journal Article
    Background: De novo protein-coding genes emerge from scratch in the non-coding regions of the genome and have, by definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder, and limited structures result in low-confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability to de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (OmegaFold, ESMFold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from those of flDPnn, which was found to outperform most other predictors in a recent comparative assessment study. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. Conclusions: We suggest that, while in some cases protein-language-model-based approaches might be more accurate than AlphaFold2, the prediction of de novo emerged proteins remains a difficult task for any predictor, be it of disorder or of structure.

  • Article type: Journal Article
    De novo metalloprotein design involves the construction of proteins guided by specific repeat patterns of polar and apolar residues, which, upon self-assembly, provide a suitable environment to bind metals and produce artificial metalloenzymes. While a wide range of functionalities have been realized in de novo designed metalloproteins, the functional repertoire of such constructs for alternative energy-relevant catalysis is currently limited. Here we show the application of a de novo approach to design a functional H2-evolving protein. The design involved the assembly of an amphiphilic peptide featuring cysteines at tandem a/d sites of each helix. Intriguingly, upon Ni(II) addition, the oligomers shift from a major trimeric assembly to a mix of dimers and trimers. The metalloprotein produced H2 photocatalytically with a bell-shaped pH dependence, with maximum activity at pH 5.5. Transient absorption spectroscopy is used to determine the timescales of electron transfer as a function of pH. Selective outer sphere mutations are made to probe how the local environment tunes activity. A preferential enhancement of activity is observed via steric modulation above the Ni(II) site, towards the N-termini, compared to below the Ni(II) site, towards the C-termini.