Keywords: CNN, Classification, Exons, GoogleNet, Introns, Resnet-50

Source: DOI:10.1016/j.jgeb.2024.100359

Abstract:
BACKGROUND: Examining the functions and characteristics of DNA sequences is a highly challenging task, and it is even more challenging for the human genome, which is made up of exons and introns. Human exons and introns contain millions to billions of nucleotides, which contributes to the complexity observed in these sequences. Given the complexity of genomics, combining signal processing techniques with deep learning tools to build a strong predictive model can be very helpful for advancing research on the human genome.
RESULTS: After representing human exons and introns as color images using Frequency Chaos Game Representation, we classified the resulting images with two pre-trained convolutional neural network models (Resnet-50 and GoogleNet) and a proposed CNN model with 13 hidden layers. The Resnet-50 model reached an accuracy of 92% with an execution time of about 7 h, and the GoogleNet model reached an accuracy of 91.5% with an execution time of about 2.5 h. Our proposed CNN model reached an accuracy of 91.6% in 2 h 37 min.
CONCLUSIONS: Our proposed CNN model is faster than the Resnet-50 model in terms of execution time. It also slightly exceeds the GoogleNet model in accuracy (91.6% versus 91.5%).
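
The Frequency Chaos Game Representation step mentioned in the results can be illustrated with a minimal sketch. This is not the authors' implementation: the k-mer length `k`, the corner assignment in `CORNERS`, and the function names `kmer_cell` and `fcgr` are all assumptions made for illustration, and the abstract does not say how the frequency matrix is turned into a color image. The sketch only counts k-mers into a 2^k x 2^k grid and normalizes the counts.

```python
# Hypothetical corner convention for the chaos game square (papers differ).
# Each base contributes one (x, y) bit pair to the cell address.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Map a k-mer to its (row, col) cell in a 2^k x 2^k grid.

    As in the chaos game, the last base selects the coarsest quadrant,
    so the k-mer is read from its last base to its first.
    """
    row = col = 0
    for base in reversed(kmer):
        x, y = CORNERS[base]
        row = (row << 1) | y
        col = (col << 1) | x
    return row, col


def fcgr(sequence, k=3):
    """Return a normalized k-mer frequency matrix as a list of lists."""
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous symbols such as N.
        if any(base not in CORNERS for base in kmer):
            continue
        row, col = kmer_cell(kmer)
        counts[row][col] += 1
    total = sum(sum(r) for r in counts)
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[value / total for value in r] for r in counts]


if __name__ == "__main__":
    matrix = fcgr("ACGTACGTTTGACA", k=2)
    for r in matrix:
        print(" ".join(f"{value:.3f}" for value in r))
```

Each cell of the returned matrix holds the relative frequency of one k-mer, so the matrix can be rendered as an image and passed to an image classifier. Larger values of `k` give higher-resolution images at the cost of sparser counts.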