Keywords: convolutional block attention module; convolutional neural network; gated recurrent unit; protein language models

MeSH: Neural Networks, Computer; Computational Biology / methods; Humans; Proteins / genetics, metabolism; Deep Learning; Databases, Protein; Algorithms; Amino Acid Sequence

Source: DOI:10.1515/sagmb-2024-0004

Abstract:
Predicting a protein's function from its amino acid sequence alone is a crucial but difficult task in bioinformatics. In recent years, deep learning has emerged as a powerful tool for this problem and has achieved significant success in protein function prediction. Its strength lies in automatically learning informative features from protein sequences, which can then be used to predict function. This study builds on these advances by proposing a novel model, CNN-CBAM+BiGRU, which combines a Convolutional Block Attention Module (CBAM) with bidirectional gated recurrent units (BiGRUs). CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, which leads to more accurate feature extraction. BiGRUs, a type of Recurrent Neural Network (RNN), capture long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU, and experiments demonstrate the effectiveness of this combined approach. On the human dataset, the proposed method outperforms the CNN-BIGRU+ATT model by 1.0 % for cellular components, 1.1 % for molecular functions, and 0.5 % for biological processes. On the yeast dataset, it outperforms the CNN-BIGRU+ATT model by 2.4 % for cellular components, 1.2 % for molecular functions, and 0.6 % for biological processes.
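To make the data flow described in the abstract concrete, the following is a minimal PyTorch sketch of a convolution, a 1D CBAM block, and a bidirectional GRU feeding a multi-label output layer. It is an illustration under stated assumptions, not the authors' implementation: the class names `CBAM1d` and `CnnCbamBiGru`, the assumption that inputs are per-residue embeddings from a protein language model, and every layer size, kernel size, and reduction ratio are hypothetical choices that the abstract does not specify.

```python
import torch
import torch.nn as nn


class CBAM1d(nn.Module):
    """Channel attention followed by positional attention, adapted to 1D sequences."""

    def __init__(self, channels: int, reduction: int = 8, kernel_size: int = 7):
        super().__init__()
        # Shared MLP applied to both the average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        # Convolution over the stacked [avg, max] maps scores each sequence position.
        self.spatial = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        channel_gate = torch.sigmoid(self.mlp(x.mean(dim=2)) + self.mlp(x.amax(dim=2)))
        x = x * channel_gate.unsqueeze(2)
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.spatial(pooled))


class CnnCbamBiGru(nn.Module):
    """Hypothetical CNN -> CBAM -> BiGRU -> multi-label head (all sizes are assumptions)."""

    def __init__(self, embed_dim: int = 1024, channels: int = 64,
                 hidden: int = 32, num_terms: int = 100):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, channels, kernel_size=3, padding=1)
        self.cbam = CBAM1d(channels)
        self.gru = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_terms)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, length, embed_dim), assumed to be per-residue embeddings.
        x = torch.relu(self.conv(emb.transpose(1, 2)))  # (batch, channels, length)
        x = self.cbam(x).transpose(1, 2)                # (batch, length, channels)
        _, h = self.gru(x)                              # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)              # forward and backward final states
        return torch.sigmoid(self.head(h))              # one probability per function term


if __name__ == "__main__":
    model = CnnCbamBiGru(embed_dim=16, channels=8, hidden=4, num_terms=5)
    scores = model(torch.randn(2, 10, 16))  # two random "proteins" of length 10
    print(scores.shape)
```

The sigmoid output treats function prediction as a multi-label problem, with one independent probability per function term; this framing is likewise an assumption of the sketch.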