Improving antibody optimization ability of generative adversarial network through large language model

Abstract:

Generative adversarial networks (GANs) have successfully generated functional protein sequences. However, traditional GANs often suffer from inherent randomness, resulting in a lower probability of obtaining desirable sequences. Due to the high cost of wet-lab experiments, the main goal of computer-aided antibody optimization is to identify high-quality candidate antibodies from a large range of possibilities, yet improving the ability of GANs to generate these desired antibodies is a challenge. In this study, we propose and evaluate a new GAN called the Language Model Guided Antibody Generative Adversarial Network (AbGAN-LMG). This GAN uses a language model as an input, harnessing such models' powerful representational capabilities to improve the GAN's generation of high-quality antibodies. We conducted a comprehensive evaluation of the antibody libraries and sequences generated by AbGAN-LMG for COVID-19 (SARS-CoV-2) and Middle East Respiratory Syndrome (MERS-CoV). Results indicate that AbGAN-LMG has learned the fundamental characteristics of antibodies and that it improved the diversity of the generated libraries. Additionally, when generating sequences using AZD-8895 as the target antibody for optimization, over 50% of the generated sequences exhibited better developability than AZD-8895 itself. Through molecular docking, we identified 70 antibodies that demonstrated higher affinity for the wild-type receptor-binding domain (RBD) of SARS-CoV-2 compared to AZD-8895. In conclusion, AbGAN-LMG demonstrates that language models used in conjunction with GANs can enable the generation of higher-quality libraries and candidate sequences, thereby improving the efficiency of antibody optimization. AbGAN-LMG is available at http://39.102.71.224:88/.
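
To make the idea of "a language model as an input" concrete, the following is a minimal sketch, not the AbGAN-LMG implementation. It assumes the generator receives a fixed-length representation of the target antibody concatenated with random noise, which is one plausible reading of the abstract. Everything here is hypothetical: `mock_language_model_embedding` is a stand-in (amino-acid frequencies) for a real protein language model, `ToyConditionedGenerator` is a single untrained linear layer with random weights, the target string is a placeholder rather than AZD-8895, and no discriminator or training loop is shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mock_language_model_embedding(sequence):
    """Hypothetical stand-in for a protein language model.

    Returns amino-acid frequencies; a real model would return a learned vector.
    """
    total = max(len(sequence), 1)
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyConditionedGenerator:
    """Untrained one-layer generator: [embedding ; noise] -> residue probabilities."""

    def __init__(self, embed_dim, noise_dim, seq_len, seed=0):
        init = random.Random(seed)
        self.noise_dim = noise_dim
        self.seq_len = seq_len
        in_dim = embed_dim + noise_dim
        out_dim = seq_len * len(AMINO_ACIDS)
        # Random weights (assumption): a real GAN learns these adversarially.
        self.weights = [
            [init.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def generate(self, embedding, rng):
        # The conditioning vector is fixed; only the noise changes between samples.
        noise = [rng.gauss(0.0, 1.0) for _ in range(self.noise_dim)]
        x = list(embedding) + noise
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        n = len(AMINO_ACIDS)
        residues = []
        for pos in range(self.seq_len):
            probs = softmax(logits[pos * n:(pos + 1) * n])
            residues.append(rng.choices(AMINO_ACIDS, weights=probs, k=1)[0])
        return "".join(residues)


# Placeholder target sequence, purely illustrative (not a real antibody).
target = "ACDKLMNPQRSTVWYACDEF"
embedding = mock_language_model_embedding(target)
generator = ToyConditionedGenerator(
    embed_dim=len(embedding), noise_dim=8, seq_len=len(target)
)
rng = random.Random(42)
library = [generator.generate(embedding, rng) for _ in range(5)]
```

In this sketch the embedding plays the role of the guidance signal: every sample in `library` shares the same conditioning vector and differs only through the noise, which is the general mechanism by which conditioning can reduce the randomness the abstract attributes to traditional GANs.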
