MeSH: Solvents / chemistry; Neural Networks, Computer; Solubility; Hydrogen-Ion Concentration; Machine Learning; Models, Chemical; Acids / chemistry

Source: DOI:10.1021/acs.jcim.4c00449

Abstract:
Rapid and accurate calculation of the acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat solutes and solvents equally from the perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous solvents, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learned electronic and solvent effects. Free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
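The abstract says that solute and solvent encodings are "adaptively fused" but does not give the fusion rule. One common way to realize such a step is a learned gate that blends the two vectors dimension by dimension. The sketch below illustrates that general idea only; it is not the AttenGpKa implementation. The function name gated_fusion, the weight shapes, and the random stand-in embeddings are all assumptions made for illustration, and in a real model the embeddings would come from graph neural network encoders and the weights from training.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(solute_vec, solvent_vec, gate_weights, gate_bias):
    """Blend two equal-length embeddings with a per-dimension gate.

    Hypothetical helper: the gate for dimension i is computed from the
    concatenation of both embeddings, so the mix adapts to the
    particular solute/solvent pair.
    """
    joint = solute_vec + solvent_vec  # list concatenation, length 2*d
    fused = []
    for i, (u, v) in enumerate(zip(solute_vec, solvent_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        g = sigmoid(z)
        # Convex combination: g near 1 favors the solute, near 0 the solvent.
        fused.append(g * u + (1.0 - g) * v)
    return fused


# Stand-in data (assumed): random vectors in place of learned encodings.
random.seed(0)
d = 4
solute_vec = [random.gauss(0.0, 1.0) for _ in range(d)]
solvent_vec = [random.gauss(0.0, 1.0) for _ in range(d)]
gate_weights = [[random.gauss(0.0, 0.1) for _ in range(2 * d)] for _ in range(d)]
gate_bias = [0.0] * d

fused = gated_fusion(solute_vec, solvent_vec, gate_weights, gate_bias)

# A linear readout (also assumed) would map the fused vector to a scalar pKa.
readout = [random.gauss(0.0, 1.0) for _ in range(d)]
pka_estimate = sum(w * x for w, x in zip(readout, fused))
```

Because the gate lies strictly between 0 and 1, each fused component stays between the corresponding solute and solvent components. With all gate weights and biases set to zero the gate is 0.5 everywhere and the fusion reduces to a plain average, which makes a convenient sanity check.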