Keywords: DenseNet; Encoding; Global attention mechanism; Promoter; Promoter strength

Source: DOI:10.1016/j.heliyon.2024.e27364

Abstract:
The promoter is a key DNA sequence whose primary function is to control when gene transcription is initiated and how strongly the gene is expressed. Accurate identification of promoters is therefore essential for understanding gene expression. Traditional sequencing techniques for identifying promoters are costly and time-consuming, which makes computational methods for promoter identification increasingly important. Because deep learning methods show great potential for this task, this study proposes a new promoter prediction model, called iPro2L-DG. The iPro2L-DG predictor is built on an improved Densely Connected Convolutional Network (DenseNet) and a Global Attention Mechanism (GAM). Promoter sequences are encoded by combining C2 encoding with nucleotide chemical property (NCP) encoding. The improved DenseNet extracts high-level features from this combined encoding. GAM then weights the importance of these features along the channel and spatial dimensions, and a fully connected neural network (FNN) produces the prediction probabilities. In the experiments, iPro2L-DG achieved an accuracy of 94.10% and a Matthews correlation coefficient of 0.8833 in the first layer (promoter identification). In the second layer (promoter strength prediction), it achieved an accuracy of 89.42% and a Matthews correlation coefficient of 0.7915. The iPro2L-DG predictor significantly outperforms other existing predictors in both promoter identification and promoter strength prediction, which makes it a state-of-the-art promoter prediction tool. The source code of the iPro2L-DG model is available at https://github.com/leirufeng/iPro2L-DG.
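As an illustration of the combined feature encoding described in the abstract, the following minimal sketch concatenates a 2-bit C2 code with a 3-bit NCP code for each nucleotide. The abstract names the two encodings but does not list their values, so both lookup tables are assumptions: the NCP table follows the commonly used ring structure / functional group / hydrogen bond convention, and the C2 table is one commonly seen 2-bit mapping. The function name `encode_sequence` is hypothetical and is not taken from the iPro2L-DG repository.

```python
# Assumed lookup tables: the source names the encodings but does not give values.
# NCP: (ring structure, functional group, hydrogen bond) per nucleotide.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}
# C2: an assumed 2-bit code per nucleotide.
C2 = {"A": (0, 0), "C": (1, 1), "G": (1, 0), "T": (0, 1)}


def encode_sequence(seq):
    """Return one 5-element row (2 C2 bits followed by 3 NCP bits) per nucleotide."""
    rows = []
    for base in seq.upper():
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        # Tuple concatenation joins the two codes into a single feature row.
        rows.append(list(C2[base] + NCP[base]))
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGT", encode_sequence("ACGT")):
        print(base, row)
```

Under these assumptions, a sequence of length L becomes an L x 5 binary matrix, which is the kind of input a convolutional network such as DenseNet can consume.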