Keywords: RNA secondary structure; end-to-end learning; short- and long-range interactions

MeSH: Nucleic Acid Conformation; RNA / chemistry; RNA / genetics; Deep Learning; Computational Biology / methods; Algorithms; Neural Networks, Computer; Thermodynamics

Source: DOI:10.1093/bib/bbae271

Abstract:
BACKGROUND: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, computationally predicting the secondary structure from a raw RNA sequence is a long-standing unsolved problem; after decades of almost unchanged performance, it has now re-emerged as an active research area thanks to deep learning. Traditional RNA secondary structure prediction algorithms have mostly been based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown performance competitive with the classical ones, but there is still wide room for improvement.
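The dynamic-programming idea behind the classical methods can be illustrated with the Nussinov base-pair maximization algorithm. This is a deliberately simplified stand-in, not a thermodynamic method: real free-energy minimization tools score loops and stacks with nearest-neighbor energy parameters instead of counting pairs. The function name `nussinov`, the allowed pair set (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 are illustrative assumptions.

```python
# Simplified illustration: maximizes the number of nested base pairs,
# NOT the free energy used by thermodynamic predictors.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (max number of nested pairs, dot-bracket structure)."""
    n = len(seq)
    # dp[i][j] = best pair count for seq[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> int:
        # Score when k pairs with j: best of left part + this pair + inside.
        left = dp[i][k - 1] if k > i else 0
        return left + 1 + dp[k + 1][j - 1]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            # k stops early so at least min_loop bases sit between k and j.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS and dp[i][j] == pair_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`. The O(n³) table fill is the same recursive decomposition into subsequences that energy-based predictors use, only with a much cruder scoring function.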
RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, taking sequence homology and cross-family validation into account. sincFold was compared with classical methods and recent deep learning models, and the results show that it can outperform state-of-the-art methods.
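A contact matrix is an L × L grid for a sequence of length L in which entry (i, j) marks whether nucleotides i and j are paired. The minimal sketch below shows how a dot-bracket structure maps onto such a matrix and one way a predicted pairing-probability matrix could be turned back into base pairs. The abstract does not describe sincFold's post-processing, so `greedy_decode`, its 0.5 threshold and its one-partner-per-nucleotide rule are hypothetical choices for illustration only.

```python
def dot_bracket_to_contacts(structure: str) -> list[list[int]]:
    """Encode a dot-bracket string as a symmetric L x L binary contact matrix."""
    n = len(structure)
    contacts = [[0] * n for _ in range(n)]
    stack = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            contacts[i][j] = contacts[j][i] = 1  # pairing is symmetric
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return contacts


def greedy_decode(probs: list[list[float]], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Hypothetical decoder: turn a pairing-probability matrix into base pairs.

    Reads only the upper triangle (the matrix is assumed symmetric), keeps
    candidates at or above `threshold`, accepts the most confident pairs
    first, and lets each nucleotide pair at most once.
    """
    n = len(probs)
    candidates = sorted(
        (
            (probs[i][j], i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if probs[i][j] >= threshold
        ),
        reverse=True,
    )
    used: set[int] = set()
    pairs = []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)
```

Applying `greedy_decode` to `dot_bracket_to_contacts("((..))")` recovers `[(0, 5), (1, 4)]`. Because the greedy rule does not enforce nesting, it can also return crossing pairs (pseudoknots), which a contact matrix can represent but single-bracket dot-bracket notation cannot.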