Keywords: IDP; IDR; deep learning; disorder; machine learning; protein language model; protein structure prediction

MeSH: Intrinsically Disordered Proteins / chemistry / metabolism; Databases, Protein; Models, Molecular; Computational Biology / methods; Protein Conformation; Molecular Sequence Annotation; Algorithms

Source: DOI: 10.1016/j.str.2024.04.010

Abstract:
Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.
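The abstract describes a bidirectional transformer encoder that assigns a disorder probability to each residue using sequence context alone. As a minimal illustrative sketch only (this is not the DR-BERT architecture or its weights; the dimensions, parameters, and `predict_disorder` function below are hypothetical and randomly initialized, not trained), a context-aware per-residue classifier can be written as:

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard amino acids
aa_to_idx = {a: i for i, a in enumerate(AA)}

rng = np.random.default_rng(0)
d = 16                               # toy embedding dimension (assumption)

# Randomly initialized toy parameters; a real model would learn these
# during pretraining on unannotated sequences and fine-tuning on IDR labels.
E = rng.normal(size=(len(AA), d))    # per-amino-acid embeddings
Wq = rng.normal(size=(d, d))         # query projection
Wk = rng.normal(size=(d, d))         # key projection
Wv = rng.normal(size=(d, d))         # value projection
w_out = rng.normal(size=d)           # per-residue disorder logit head

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_disorder(seq):
    """Return one disorder probability per residue of `seq`.

    A single bidirectional self-attention layer mixes information from
    the whole sequence into each position before the sigmoid head,
    mirroring how an encoder model uses context in both directions.
    """
    X = E[[aa_to_idx[a] for a in seq]]           # (L, d) residue embeddings
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (L, L) attention weights
    H = A @ V                                    # context-mixed representations
    return 1.0 / (1.0 + np.exp(-(H @ w_out)))    # sigmoid -> probabilities

probs = predict_disorder("MKTAYIAKQR")           # one score per residue
```

The point of the sketch is the shape of the computation: every output position attends to the full sequence, so the disorder score of a residue depends on its neighbors on both sides, which is the contextual advantage the abstract attributes to the transformer approach.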