Keywords: binding affinity; protein language models; protein-protein interactions; free energy of binding

Source: DOI:10.1101/2024.06.21.600041

Abstract:
Protein-protein interactions (PPIs) govern virtually all cellular processes. Even a single mutation within a PPI can significantly alter protein function and potentially lead to disease. Numerous approaches have been developed to predict the change in free energy of binding (ΔΔGbind) resulting from mutations, yet most of them lack precision. In recent years, protein language models (PLMs) have been developed and have shown powerful predictive capabilities by leveraging both sequence and structural data from protein-protein complexes. However, PLMs have not been optimized specifically for predicting ΔΔGbind. We developed an approach to predict the effects of mutations on PPI binding affinity based on two of the most advanced PLMs, ESM2 and ESM-IF1, which incorporate PPI sequence and structural features, respectively. We used the two models to generate embeddings for each PPI mutant and subsequently fine-tuned our model by training on a large dataset of experimental ΔΔGbind values. Our model, ProBASS (Protein Binding Affinity from Structure and Sequence), achieved a correlation with experimental ΔΔGbind values of 0.83 ± 0.05 for single mutations and 0.69 ± 0.04 for double mutations when training and testing were done on the same PDB. Moreover, ProBASS exhibited a very high correlation (0.81 ± 0.02) between prediction and experiment when training and testing were performed on a dataset containing 2325 single mutations in 132 PPIs. ProBASS surpasses state-of-the-art methods in correlation with experimental data and could be further trained as more experimental data become available. Our results demonstrate that refining pre-trained PLMs on extensive datasets of ΔΔGbind values across multiple PPIs is a successful approach for achieving a precise and broadly applicable model for ΔΔGbind prediction, which should greatly facilitate future protein engineering and design studies.
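The general idea of combining sequence-derived and structure-derived embeddings and regressing them against ΔΔGbind can be illustrated with a small sketch. This is not the ProBASS implementation: the embeddings below are random placeholders rather than ESM2 or ESM-IF1 outputs, the dimensions and the synthetic ΔΔG values are invented for illustration, and a closed-form ridge regression on fixed features stands in for the fine-tuning step described in the abstract. The abstract also does not name the correlation metric; Pearson correlation is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-mutant embeddings. In a real pipeline these
# would come from a sequence model and a structure model; here they are
# random vectors with arbitrary dimensions.
n_mutants, d_seq, d_struct = 200, 32, 16
seq_emb = rng.normal(size=(n_mutants, d_seq))
struct_emb = rng.normal(size=(n_mutants, d_struct))

# Combine both views of each mutant into one feature vector.
features = np.concatenate([seq_emb, struct_emb], axis=1)

# Synthetic "experimental" ddG values: a linear signal plus noise
# (purely illustrative, not real measurements).
true_w = rng.normal(size=features.shape[1])
ddg = features @ true_w + rng.normal(scale=0.5, size=n_mutants)

# Simple hold-out split: first 150 mutants for training, the rest for testing.
train, test = slice(0, 150), slice(150, None)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


w = fit_ridge(features[train], ddg[train])
pred = features[test] @ w

# Pearson correlation between predicted and "experimental" values (assumed metric).
r = np.corrcoef(pred, ddg[test])[0, 1]
print(f"Pearson r on held-out synthetic mutants: {r:.2f}")
```

Because the targets are generated from the features themselves, the correlation printed here says nothing about real predictive performance; the sketch only shows the data flow from two embedding sources to a single regression target.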