Keywords: Bioinformatics; Essential genes; Gene sequences; Graphical convolutional neural networks; Machine learning

MeSH: Genes, Essential; Neural Networks, Computer; Algorithms; Entropy; Genomics

Source: DOI:10.1186/s12864-024-09958-w

Abstract:
BACKGROUND: Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes.
RESULTS: In this study, we introduce GCNN-SFM, a computational model for identifying essential genes in organisms, based on graph convolutional neural networks (GCNN). GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from the sequences of essential genes. First, the gene sequence is transformed into a feature map using coding techniques. Next, a multi-layer graph convolutional network (GCN) performs graph convolution operations, capturing both local and global features of the gene sequence. Further feature extraction is then performed, and convolutional and fully connected layers are combined to generate the essential-gene predictions. Gradient descent is used to iteratively minimize the cross-entropy loss, thereby improving prediction accuracy. In addition, model parameters are tuned during training to find the combination that yields the best prediction performance.
CONCLUSIONS: Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.
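
The pipeline described in the abstract (encode the sequence, apply graph convolutions, pool, classify, minimize cross-entropy by gradient descent) can be illustrated with a minimal NumPy sketch. This is not the authors' GCNN-SFM implementation: the one-hot encoding, the chain graph that links neighbouring sequence positions, the layer sizes, the toy sequence, and every function name below are assumptions made for illustration. The intermediate convolutional layer is omitted, and only the final fully connected layer is trained to keep the gradient code short.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot node-feature matrix (assumed encoding)."""
    idx = [NUCLEOTIDES.index(ch) for ch in seq.upper()]
    return np.eye(4)[idx]

def chain_adjacency(n):
    """Assumed graph: each position is linked to its immediate sequence neighbours."""
    a = np.zeros((n, n))
    i = np.arange(n - 1)
    a[i, i + 1] = 1.0
    a[i + 1, i] = 1.0
    return a

def normalize(a):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with self-loops."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def forward(seq, params):
    """Return the pooled graph embedding and the softmax class probabilities."""
    h = one_hot(seq)
    a_norm = normalize(chain_adjacency(len(seq)))
    for w in params["gcn"]:            # stacking layers widens the receptive field
        h = gcn_layer(a_norm, h, w)
    pooled = h.mean(axis=0)            # mean-pool node features into one vector
    logits = pooled @ params["fc_w"] + params["fc_b"]
    z = logits - logits.max()          # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return pooled, probs

def cross_entropy(probs, label):
    return -np.log(probs[label] + 1e-12)

def sgd_step_fc(seq, label, params, lr=0.1):
    """One gradient-descent step on the fully connected layer only (simplification).

    Returns the loss measured before the update.
    """
    pooled, probs = forward(seq, params)
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0          # d(loss)/d(logits) for softmax + cross-entropy
    params["fc_w"] -= lr * np.outer(pooled, grad_logits)
    params["fc_b"] -= lr * grad_logits
    return cross_entropy(probs, label)

rng = np.random.default_rng(0)
params = {
    "gcn": [rng.normal(scale=0.5, size=(4, 8)), rng.normal(scale=0.5, size=(8, 8))],
    "fc_w": rng.normal(scale=0.5, size=(8, 2)),
    "fc_b": np.zeros(2),
}

seq, label = "ATGGCGTACGTTAG", 1       # toy sequence; label 1 stands for "essential"
losses = [sgd_step_fc(seq, label, params) for _ in range(20)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a realistic setting the graph construction, the convolutional block, and end-to-end training of all layers would follow the paper, typically with an automatic-differentiation framework rather than hand-written gradients.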