Keywords: deep learning; homology identification; protein function prediction; secondary structure; sequence-based method

MeSH: Proteins / chemistry, metabolism, genetics; Algorithms; Neural Networks, Computer; Protein Structure, Secondary; Computational Biology / methods; Databases, Protein; Gene Ontology; Sequence Analysis, Protein / methods; Software

Source: DOI:10.1093/bib/bbae196

Abstract:
Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein's three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model, DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor that incorporates secondary structure features along with primary sequence and homology information. The algorithm combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that its prediction performance surpasses state-of-the-art algorithms. It can predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.