Keywords: Rosetta; computational protein design; de novo proteins; evolutionary fitness; protein language model; protein language models; thermodynamic stability

MeSH: Proteins / genetics; Amino Acid Sequence

Source: DOI: 10.1021/acssynbio.3c00753

Abstract:
Computational protein sequence design has the ambitious goal of modifying existing proteins or creating new ones; however, designing stable and functional proteins is challenging without the ability to predict protein dynamics and allostery. Informing protein design methods with evolutionary information restricts the mutational space to more native-like sequences and results in increased stability while maintaining function. Recently, language models trained on millions of protein sequences have shown impressive performance in predicting the effects of mutations. When Rosetta-designed sequences were assessed with a language model, they scored worse than their original sequences. To inform Rosetta design protocols with language model predictions, we added a new metric, based on the Evolutionary Scale Modeling (ESM) model, that restrains the energy function during design. The resulting sequences have better language model scores and similar sequence recovery, with only a minor decrease in fitness as assessed by Rosetta energy. In conclusion, our work combines the strengths of recent machine learning approaches with the Rosetta protein design toolbox.
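The abstract does not say exactly how the ESM metric enters the Rosetta energy function. The following is a minimal sketch, assuming a simple linear combination: a Rosetta-style energy (lower is better) minus a weighted language-model log-likelihood (higher is better). The names `lm_score`, `mutation_effect`, `combined_score`, `weight`, and `EXAMPLE_PROFILE` are hypothetical, no real Rosetta or ESM API is called, and the per-position log-probabilities are made-up stand-ins for what a model such as ESM might produce. The `mutation_effect` log-ratio illustrates a formulation commonly used for language-model mutation-effect prediction; it is not necessarily the authors' metric.

```python
import math
from typing import Dict, List

# Hypothetical type: one dict per sequence position, mapping an amino acid
# one-letter code to a language model's log-probability at that position.
Profile = List[Dict[str, float]]


def lm_score(sequence: str, profile: Profile) -> float:
    """Sum of per-position log-probabilities; higher means more native-like."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(profile[i][aa] for i, aa in enumerate(sequence))


def mutation_effect(profile: Profile, pos: int, wt: str, mut: str) -> float:
    """Log-ratio log p(mut) - log p(wt) at one position (negative = disfavored)."""
    return profile[pos][mut] - profile[pos][wt]


def combined_score(energy: float, sequence: str, profile: Profile, weight: float) -> float:
    """Lower is better: the energy term plus a weighted penalty for a low LM score."""
    return energy - weight * lm_score(sequence, profile)


# Toy two-position profile with made-up probabilities (not real ESM output).
EXAMPLE_PROFILE: Profile = [
    {"M": math.log(0.9), "A": math.log(0.1)},
    {"K": math.log(0.5), "R": math.log(0.5)},
]

if __name__ == "__main__":
    # "AK" has the lower (better) energy, but "MK" is far more native-like
    # under the toy profile, so with weight=1.0 the combined score prefers "MK".
    print(combined_score(-10.0, "MK", EXAMPLE_PROFILE, weight=1.0))
    print(combined_score(-10.5, "AK", EXAMPLE_PROFILE, weight=1.0))
    print(mutation_effect(EXAMPLE_PROFILE, 0, "M", "A"))
```

With `weight=0.0` the score reduces to the energy alone and "AK" wins; raising `weight` shifts the choice toward sequences the language model finds more native-like at some cost in energy, which is the kind of trade-off the abstract describes.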