Keywords: machine learning neuropeptide identification protein language model

MeSH: Neuropeptides / metabolism; Machine Learning; Humans; Support Vector Machine; Computational Biology / methods; Evolution, Molecular; Algorithms

Source: DOI:10.3390/ijms25137049

Abstract:
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
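The pipeline the abstract describes (fuse two embeddings, select features, train an SVM, score by cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no embedding dimensions, feature-selection method, or SVM kernel, so the sketch uses random toy vectors in place of real ESM and UniRep embeddings, a simple class-mean-difference score as a stand-in for feature selection, and a linear SVM trained with Pegasos-style subgradient descent. All function names and parameters are hypothetical.

```python
import random
import statistics

def fuse(esm_vec, unirep_vec):
    """Feature fusion as plain concatenation of two embeddings of one peptide."""
    return list(esm_vec) + list(unirep_vec)

def select_top_k(X, y, k):
    """Stand-in feature selection: score each column by the absolute
    class-mean difference over the summed class spreads, keep the k best."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        spread = statistics.pstdev(p) + statistics.pstdev(n) + 1e-9
        scores.append(abs(statistics.mean(p) - statistics.mean(n)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights (regularization step); the bias is treated
            # like a weight on a constant feature.
            w = [wj * (1.0 - eta * lam) for wj in w]
            b *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def cross_val_accuracy(X, y, k_features, folds=5):
    """k-fold cross-validation; features are selected inside each training
    fold so the held-out fold never influences the selection."""
    accs = []
    for f in range(folds):
        train_idx = [i for i in range(len(X)) if i % folds != f]
        test_idx = [i for i in range(len(X)) if i % folds == f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        keep = select_top_k(X_tr, y_tr, k_features)
        w, b = train_linear_svm([[r[j] for j in keep] for r in X_tr], y_tr)
        hits = sum(predict(w, b, [X[i][j] for j in keep]) == y[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return accs

# Toy data: 60 "peptides" with an 8-dim and a 6-dim random embedding.
# Only three columns carry class signal; the rest are noise.
rng = random.Random(42)
labels = [1 if i % 2 == 0 else -1 for i in range(60)]
esm = [[rng.gauss(1.5 * lab if j < 2 else 0.0, 1.0) for j in range(8)] for lab in labels]
unirep = [[rng.gauss(1.5 * lab if j < 1 else 0.0, 1.0) for j in range(6)] for lab in labels]
X = [fuse(e, u) for e, u in zip(esm, unirep)]
accs = cross_val_accuracy(X, labels, k_features=3)
print("fold accuracies:", accs)
```

Selecting features inside each fold, rather than once on the full dataset, is a deliberate choice in this sketch: it keeps the cross-validation estimate from being inflated by information leaked from the held-out samples.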