Keywords: RNA binding protein (RBP); convolutional neural network (CNN); double-stranded DNA binding protein (DSB); long short-term memory (LSTM); multi-class model; single-stranded DNA binding protein (SSB)

MeSH: Deep Learning; RNA-Binding Proteins / metabolism; DNA-Binding Proteins / metabolism; Computational Biology / methods; Neural Networks, Computer; Humans

Source: DOI:10.1093/bib/bbae285

Abstract:
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing, as well as the prediction scopes of these studies, have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and by developing deep learning-based methods, including both hierarchical and multi-class approaches, to predict the type of NABP for any given protein. The deep learning models employ two convolutional neural network (CNN) layers and one long short-term memory (LSTM) layer. Our approaches outperform existing DBP and RBP predictors, give balanced predictions between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy for DBPs and RBPs, especially for DBPs, with an improvement of ~12%. Moreover, we explored the prediction accuracy for single-stranded DNA binding proteins and their effect on the overall accuracy of NABP prediction.
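The difference between the hierarchical and multi-class approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the hierarchy is organized, so the two-stage layout below (NABP vs. non-NABP first, then DBP vs. RBP), the function names, the class labels, and the 0.5 threshold are all assumptions made for illustration. The probabilities are taken as given; in the paper they would come from the CNN-LSTM models.

```python
from typing import Dict

# Hypothetical class labels for the multi-class setting (not taken from the paper).
CLASSES = ("DBP", "RBP", "non-NABP")


def hierarchical_predict(p_nabp: float, p_dbp_given_nabp: float,
                         threshold: float = 0.5) -> str:
    """Two-stage decision (assumed layout): NABP vs. non-NABP, then DBP vs. RBP."""
    # Stage 1: reject proteins that are unlikely to bind nucleic acids at all.
    if p_nabp < threshold:
        return "non-NABP"
    # Stage 2: among predicted NABPs, separate DNA binders from RNA binders.
    return "DBP" if p_dbp_given_nabp >= threshold else "RBP"


def multiclass_predict(probs: Dict[str, float]) -> str:
    """Single-stage decision: return the class with the highest probability."""
    return max(CLASSES, key=lambda c: probs[c])


if __name__ == "__main__":
    print(hierarchical_predict(p_nabp=0.8, p_dbp_given_nabp=0.1))
    print(multiclass_predict({"DBP": 0.2, "RBP": 0.7, "non-NABP": 0.1}))
```

In the hierarchical sketch, an error in the first stage cannot be recovered by the second stage, whereas the multi-class sketch compares all classes in a single step.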