Keywords: Molecular property prediction; SMILES; benchmarking; chemical language models; deep learning; domain knowledge; fine-tuning; pretraining; sequence-based chemical models; systematic analysis; transformers

MeSH: Machine Learning; Drug Discovery / methods; Deep Learning

Source: DOI:10.1021/acs.jcim.4c00747

Abstract:
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.