Keywords: Computational model; Deep learning; Genetics; m5c; Sequence analysis; Statistical model

Source: DOI:10.1016/j.ymeth.2024.07.008

Abstract:
5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at the carbon-5 position. It is one of the most common post-transcriptional modifications (PTMs) and occurs in almost all types of RNA. Conventional laboratory methods cannot identify m5c sites both quickly and reliably. However, the growing availability of sequence data has made it feasible to develop computational models that improve the accuracy and robustness of site identification. This study focused on developing in-silico methods based on deep learning. Encoded sequence data were fed into three deep learning models: gated recurrent unit (GRU), long short-term memory (LSTM), and bi-directional LSTM (Bi-LSTM). The models were then evaluated rigorously using both independent set testing and 10-fold cross-validation. The results showed that the LSTM-based model, m5c-iDeep, outperformed existing m5c predictors, achieving 99.9 % accuracy. To make the method available to researchers, m5c-iDeep was also deployed on a web server, accessible at https://taseersuleman-m5c-ideep-m5c-ideep.streamlit.app/.
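The two data-handling steps mentioned in the abstract, encoding sequences and splitting them for 10-fold cross-validation, can be illustrated with a minimal Python sketch. The abstract does not state which encoding m5c-iDeep uses, so the one-hot scheme below is an assumption, and the names `ALPHABET`, `one_hot`, and `k_fold_indices` are hypothetical helpers introduced only for illustration. The recurrent models themselves (GRU, LSTM, Bi-LSTM) are not shown.

```python
import random

# Assumption: a plain 4-letter RNA alphabet with one-hot encoding.
# The abstract does not specify the encoding used by m5c-iDeep.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element one-hot rows."""
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # unknown symbols (e.g. N) stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


if __name__ == "__main__":
    print(one_hot("ACGU"))
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every sequence is used for evaluation once and for training in the remaining nine folds.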