Keywords: AlphaFold2; Explainable machine learning; Oversampling; Peptide design; Protein structure prediction; Structural bias

MeSH: Machine Learning; Antimicrobial Peptides / chemistry; Drug Discovery / methods; Protein Conformation, alpha-Helical; Models, Molecular

Source: DOI:10.1038/s41598-024-62419-y

Abstract:
Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. These models often need protein structure awareness, as they heavily rely on sequential data. The models excel at identifying sequences of a particular biological nature or activity, but they frequently fail to comprehend their intricate mechanism(s) of action. To solve two problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models with high accuracy (86-88%) in predicting these categories. However, our initial models (1.0 and 2.0) exhibited a bias towards α-helical and coiled structures, influencing predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave three structure-specific models for peptides likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights the sensitivity of important features to different structure classes across models.
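The two bias-mitigation strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say how structures were assigned or how many peptides were removed, so everything here is an assumption made for illustration: the toy records, the `structure` labels, the helper name `deplete_overrepresented`, and the choice to down-sample every class to the size of the smallest one.

```python
import random
from collections import Counter, defaultdict


def deplete_overrepresented(records, key, seed=0):
    """Down-sample each group to the size of the smallest group.

    Hypothetical helper: one simple way to reduce over-represented
    classes, not necessarily the procedure used in the paper.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    target = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced


# Toy, made-up dataset skewed towards helices (6 helix, 4 coil, 2 mixed).
peptides = (
    [{"seq": f"HELIX{i}", "structure": "helix"} for i in range(6)]
    + [{"seq": f"COIL{i}", "structure": "coil"} for i in range(4)]
    + [{"seq": f"MIXED{i}", "structure": "mixed"} for i in range(2)]
)

# Subset selection: keep one structure class to train a structure-specific model.
helix_only = [p for p in peptides if p["structure"] == "helix"]

# Data reduction: deplete over-represented classes for a structure-agnostic model.
balanced = deplete_overrepresented(peptides, key="structure")
counts = Counter(p["structure"] for p in balanced)
```

In this toy setup `helix_only` would feed a structure-specific model, while `balanced` holds an equal number of peptides per structure class and would feed a structure-agnostic one.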