MeSH: Benchmarking; Drug Discovery; Reading Frames

Source: DOI:10.1093/bioinformatics/btae115

Abstract:
BACKGROUND: Retrosynthesis is a critical task in drug discovery that aims to find a viable pathway for synthesizing a given target molecule. Many existing approaches frame this task as a graph generation problem. Specifically, these methods first identify the reaction center and break the target molecule at that center to generate synthons. Reactants are then generated either by adding atoms sequentially to the synthon graphs or by directly adding appropriate leaving groups. Both strategies have limitations. Adding atoms produces a long prediction sequence, which increases the complexity of generation, while adding leaving groups considers only those seen in the training set, which leads to poor generalization.
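The synthon step described above can be illustrated with a minimal sketch. It assumes a toy molecule stored as a plain dictionary of atoms and a list of bonds, and a reaction center given as a single bond; the function name `split_into_synthons` and the example molecule are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict

def split_into_synthons(atoms, bonds, reaction_center):
    """Remove the reaction-center bond and return the connected components.

    atoms: dict mapping atom index -> element symbol (toy representation)
    bonds: list of (i, j) index pairs
    reaction_center: the (i, j) bond to break (assumed to be a single bond)
    """
    remaining = [b for b in bonds if set(b) != set(reaction_center)]
    adjacency = defaultdict(set)
    for a, b in remaining:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, synthons = set(), []
    for start in atoms:
        if start in seen:
            continue
        # Depth-first search collects one fragment (synthon) at a time.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        synthons.append(sorted(component))
    return synthons

# Toy amide-like skeleton C-C(=O)-N-C, heavy atoms only (illustrative).
atoms = {0: "C", 1: "C", 2: "O", 3: "N", 4: "C"}
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
synthons = split_into_synthons(atoms, bonds, reaction_center=(1, 3))
print(synthons)  # [[0, 1, 2], [3, 4]]
```

Breaking the C-N bond leaves two fragments, each of which would still need to be completed into a valid reactant.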
RESULTS: In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Because chemically meaningful motifs fall between atoms and leaving groups in size, our model has lower prediction complexity than adding atoms and achieves better performance than adding leaving groups. We evaluate the proposed model on a benchmark dataset and show that it significantly outperforms previous state-of-the-art models. Furthermore, we conduct ablation studies to investigate how each component of the model contributes to overall performance on benchmark datasets. Experimental results demonstrate the effectiveness of our model in predicting retrosynthesis pathways and suggest its potential as a valuable tool in drug discovery.
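The granularity trade-off between atoms, motifs, and leaving groups can be made concrete with a rough sketch. Everything here is an illustrative assumption: the motif names and sizes, the set of leaving groups "seen in training", and the step counts, which ignore stop tokens and attachment-point predictions. It is not the authors' vocabulary or model.

```python
# Hypothetical motif vocabulary: motif name -> number of heavy atoms.
MOTIF_SIZES = {"hydroxyl": 1, "methyl": 1, "carbonyl": 2, "phenyl": 6}

# Hypothetical whole leaving groups observed in a training set.
KNOWN_LEAVING_GROUPS = {("hydroxyl",), ("carbonyl", "methyl")}

def prediction_steps(group):
    """Count generation steps needed to attach `group` at each granularity.

    group: list of motif names that together form the group to attach.
    Returns None for the leaving-group strategy when the whole group
    was never seen, since that strategy cannot produce it.
    """
    atom_steps = sum(MOTIF_SIZES[m] for m in group)  # one step per atom
    motif_steps = len(group)                         # one step per motif
    leaving_group_steps = 1 if tuple(group) in KNOWN_LEAVING_GROUPS else None
    return {"atom": atom_steps, "motif": motif_steps,
            "leaving_group": leaving_group_steps}

seen = prediction_steps(["carbonyl", "methyl"])
unseen = prediction_steps(["carbonyl", "phenyl"])
print(seen)    # {'atom': 3, 'motif': 2, 'leaving_group': 1}
print(unseen)  # {'atom': 8, 'motif': 2, 'leaving_group': None}
```

In this toy setting the motif strategy needs fewer steps than atom-by-atom generation and can still assemble a group that was never seen as a whole, which is the intuition the abstract gives for choosing motifs.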
AVAILABILITY: All code and data are available at https://github.com/szu-ljh2020/MARS.