Keywords: T cell receptor; T cell repertoire; TCR–pMHC pair; language models of proteins; protein binding

MeSH: Receptors, Antigen, T-Cell / immunology, metabolism; Peptides / immunology, chemistry, metabolism; Humans; Epitopes / immunology; Protein Binding; Epitopes, T-Lymphocyte / immunology; Unsupervised Machine Learning

Source: DOI:10.1073/pnas.2316401121

Abstract:
The accurate prediction of binding between T cell receptors (TCR) and their cognate epitopes is key to understanding the adaptive immune response and developing immunotherapies. Current methods face two significant limitations: the shortage of comprehensive high-quality data and the bias introduced by the selection of the negative training data commonly used in the supervised learning approaches. We propose a method, Transformer-based Unsupervised Language model for Interacting Peptides and T cell receptors (TULIP), that addresses both limitations by leveraging incomplete data and unsupervised learning and using the transformer architecture of language models. Our model is flexible and integrates all possible data sources, regardless of their quality or completeness. We demonstrate the existence of a bias introduced by the sampling procedure used in previous supervised approaches, emphasizing the need for an unsupervised approach. TULIP recognizes the specific TCRs binding an epitope, performing well on unseen epitopes. Our model outperforms state-of-the-art models and offers a promising direction for the development of more accurate TCR epitope recognition models.