Keywords: antibody; binding properties; language model; structure prediction

MeSH: Antibodies / chemistry, immunology; Computational Biology / methods; Protein Conformation; Humans; Models, Molecular; Deep Learning

Source: DOI:10.1093/bib/bbae245

Abstract:
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Significant advances in deep learning methods have enabled precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging because of their unique evolution and the high flexibility of their antigen-binding regions. To address this challenge, we present the Bio-inspired Antibody Language Model (BALM). The model is trained on a dataset of 336 million 40% nonredundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. BALM performs strongly across four antigen-binding prediction tasks. We also introduce BALMFold, an end-to-end method derived from BALM that rapidly predicts full atomic antibody structures from individual sequences. BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing unnecessary trials. The BALMFold structure prediction server is freely available at https://beamlab-sh.com/models/BALMFold.