Keywords: Artificial Intelligence; Dimensionality Reduction; Enhanced Sampling; Linear Discriminant Analysis; Machine Learning; Neural Networks

Source: DOI:10.1016/j.bpj.2024.06.024

Abstract:
Biomolecules often exhibit complex free energy landscapes in which long-lived metastable states are separated by large energy barriers. Overcoming these barriers to robustly sample transitions between the metastable states with classical molecular dynamics (MD) simulations is challenging. To circumvent this issue, collective variable (CV)-based enhanced sampling MD approaches are often employed. Traditional CV selection relies on intuition and prior knowledge of the system. This approach introduces bias, which can lead to incomplete mechanistic insights. Thus, automated CV detection is desired to gain a deeper understanding of the system or process. Various machine learning algorithms, such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA)-based approaches, have been applied to MD data for automated CV detection. However, their performance has not been systematically evaluated on structurally and mechanistically complex biological systems. Here, we applied these methods to MD simulations of the MFSD2A (Major Facilitator Superfamily Domain 2A) lysolipid transporter in multiple functionally relevant metastable states, with the goal of identifying optimal CVs that structurally discriminate these states. Specific emphasis was placed on the automated detection and interpretive power of LDA-based CVs. We found that LDA methods, including a novel gradient descent-based multiclass harmonic variant developed here, termed GDHLDA, outperform PCA in class separation and are remarkably consistent in extracting the CVs critical for distinguishing metastable states. Furthermore, the identified CVs included features previously associated with conformational transitions in MFSD2A. Specifically, conformational shifts in transmembrane helix 7 and in residue Y294 on this helix emerged as critical features discriminating the metastable states in MFSD2A. This highlights the effectiveness of LDA-based approaches in automatically extracting functionally relevant CVs from MD trajectories; such CVs can be used to drive biased MD simulations that efficiently sample conformational transitions in the molecular system.
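To illustrate why a supervised projection such as LDA can separate labeled states better than PCA, the following is a minimal sketch on synthetic data. It uses the classical two-class Fisher LDA, not the multiclass harmonic GDHLDA variant described in the abstract, whose details are not given here. The two "features", the state labels, and the helper names (`fisher_ratio`, `pca_direction`, `lda_direction`) are hypothetical and chosen only for illustration: feature 0 mimics a large-amplitude motion shared by both states, and feature 1 a small state-specific shift.

```python
import numpy as np


def fisher_ratio(z, y):
    """Between-class over within-class scatter of a 1-D projection z."""
    overall = z.mean()
    classes = np.unique(y)
    between = sum((y == c).sum() * (z[y == c].mean() - overall) ** 2 for c in classes)
    within = sum(((z[y == c] - z[y == c].mean()) ** 2).sum() for c in classes)
    return between / within


def pca_direction(X):
    """Unsupervised: direction of largest variance, labels ignored."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    _, vecs = np.linalg.eigh(cov)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def lda_direction(X, y):
    """Supervised two-class Fisher LDA: w is proportional to Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


# Synthetic stand-in for MD features (assumption: not real trajectory data).
rng = np.random.default_rng(0)
n = 500
state0 = rng.normal([0.0, -0.5], [5.0, 0.3], size=(n, 2))
state1 = rng.normal([0.0, 0.5], [5.0, 0.3], size=(n, 2))
X = np.vstack([state0, state1])
y = np.repeat([0, 1], n)

w_pca = pca_direction(X)
w_lda = lda_direction(X, y)
ratio_pca = fisher_ratio(X @ w_pca, y)
ratio_lda = fisher_ratio(X @ w_lda, y)

print("PCA direction:", w_pca, "Fisher ratio:", ratio_pca)
print("LDA direction:", w_lda, "Fisher ratio:", ratio_lda)
```

In this toy setup PCA aligns with the high-variance shared motion, while LDA weights the low-variance feature that actually distinguishes the two states. The LDA weight vector can then be read as a candidate CV whose largest components point to the discriminating features.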