BACKGROUND: RNA design has growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge, known as the inverse folding problem, is to find functional RNA sequences that satisfy given structural constraints. Computational approaches based on secondary structures have emerged to address this problem. However, designing RNA sequences directly from 3D structures remains challenging because of the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation.
RESULTS: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that learns the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which together iteratively transform random sequences into desired sequences. By tuning the sampling weight, our model allows a trade-off between sequence recovery and diversity, so that more candidates can be explored. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
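To make the evaluation terms above concrete, the following is a minimal sketch of how sequence recovery, relative improvement, and sample diversity might be computed. It is not taken from the RiboDiffusion code base: the function names are hypothetical, the pairwise-mismatch definition of diversity is an assumption, and the sequences in the usage note are toy values chosen only for illustration.

```python
from itertools import combinations


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def relative_improvement(model: float, baseline: float) -> float:
    """Relative gain of a model score over a baseline score (0.11 means 11%)."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (model - baseline) / baseline


def mean_pairwise_diversity(samples: list[str]) -> float:
    """Average fraction of differing positions over all pairs of samples.

    Assumed definition for illustration; the paper may define diversity differently.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(1.0 - sequence_recovery(a, b) for a, b in pairs) / len(pairs)
```

For example, `sequence_recovery("GGAUCC", "GGAACC")` compares a toy native sequence with a toy design that differs at one of six positions, and `mean_pairwise_diversity` applied to several samples drawn for the same backbone gives one way to quantify the recovery-diversity trade-off mentioned above.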
METHODS: The source code is available at https://github.com/ml4bio/RiboDiffusion.