Secondary structure prediction is a key step in understanding protein function and biological properties, and it matters for new drug development, disease treatment, bioengineering, and related fields. Accurate prediction of secondary structure helps reveal how proteins fold and how they function in cells. Deep learning models are particularly useful for this task because they can process complex sequence information and extract meaningful patterns and features, which improves both the accuracy and the efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to improve the accuracy of protein secondary structure prediction in both the eight-state and three-state settings. The input combines one-hot encoded features with word vector representations of physicochemical properties. The model also relies on knowledge distillation from the ProtT5 pretrained model, which improves performance. The improved TCN uses multiscale fusion and bidirectional operations, and it extracts amino acid sequence features better than traditional TCN models. The model was evaluated on six datasets and showed strong prediction performance. On the TS115, CB513, and PDB (2018-2020) datasets, the eight-state prediction accuracy reached 88.2%, 84.9%, and 95.3%, respectively, and the three-state prediction accuracy reached 91.3%, 90.3%, and 96.8%, respectively.
This study not only improves the accuracy of protein secondary structure prediction but also provides a valuable tool for understanding protein structure and function, one that is particularly applicable to resource-constrained contexts.
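To make the eight-state and three-state accuracy figures concrete, the following is a minimal sketch of how per-residue accuracy could be computed for both label sets. It assumes a commonly used reduction of DSSP eight-state labels to three states (H, G, I to helix; E, B to strand; everything else to coil); the abstract does not state which reduction the authors used. The names `Q8_TO_Q3`, `reduce_q8_to_q3`, and `per_residue_accuracy` are hypothetical, and the example label strings are invented for illustration only.

```python
# Assumed mapping: a commonly used DSSP eight-state -> three-state reduction.
# The paper's exact reduction scheme is not given in the abstract.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helix classes -> H
    "E": "E", "B": "E",                      # strand classes -> E
    "T": "C", "S": "C", "C": "C",            # turn, bend, coil -> C
    "L": "C", "-": "C",                      # alternative coil symbols -> C
}


def reduce_q8_to_q3(q8_labels: str) -> str:
    """Map a string of eight-state labels to three-state labels."""
    return "".join(Q8_TO_Q3[label] for label in q8_labels)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Illustrative labels only, not data from the paper.
    observed_q8 = "CCHHHHGTTEEEEBSC"
    predicted_q8 = "CCHHHHHTTEEEEESC"

    q8_accuracy = per_residue_accuracy(predicted_q8, observed_q8)
    q3_accuracy = per_residue_accuracy(
        reduce_q8_to_q3(predicted_q8), reduce_q8_to_q3(observed_q8)
    )
    print(f"Q8 accuracy: {q8_accuracy:.3f}")
    print(f"Q3 accuracy: {q3_accuracy:.3f}")
```

In this toy case the two mismatches (G predicted as H, B predicted as E) fall within the same three-state class, so they count as errors under the eight-state labels but not under the three-state labels. Under a reduction of this kind, three-state accuracy can never be lower than eight-state accuracy, which is consistent with the reported figures.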