Keywords: Machine learning; Standard vaporization enthalpy; Supervised learning; Thermochemical predictions; VOC

MeSH: Volatile Organic Compounds / analysis, chemistry; Machine Learning; Thermodynamics; Volatilization; Algorithms; Models, Chemical

Source: DOI:10.1016/j.chemosphere.2024.142257

Abstract:
The accurate prediction of the standard vaporization enthalpy (ΔvapHm°) of volatile organic compounds (VOCs) is of paramount importance in environmental chemistry, industrial applications and regulatory compliance. To overcome the limitations of traditional experimental methods for determining ΔvapHm° of VOCs, machine learning (ML) models enable high-throughput, cost-effective property estimation. Despite rising momentum, however, existing ML algorithms still show limitations in prediction accuracy and in the breadth of their chemical applicability. In this work, we present a data-driven, explainable supervised ML model to predict ΔvapHm° of VOCs. The model was built on an established experimental database of 2410 unique molecules and 223 VOCs categorized by chemical groups. Using supervised ML regression algorithms, the Random Forest successfully predicted VOCs' ΔvapHm° with a mean absolute error of 3.02 kJ mol⁻¹ and a 95% test score. The model was successfully validated through the prediction of ΔvapHm° for a known database of VOCs and through molecular group hold-out tests. Through chemical feature importance analysis, this explainable model revealed that VOC polarizability, connectivity indexes and electrotopological state are key for the model's prediction accuracy. We thus present a replicable and explainable model, which can be further expanded towards the prediction of other thermodynamic properties of VOCs.
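The evaluation quantities named in the abstract can be illustrated with a minimal sketch in plain Python. It rests on several assumptions: that the "test score" is a coefficient of determination (R²) on held-out data, which the abstract does not state; that a molecular group hold-out test withholds one chemical group at a time; and that the helper names and the numbers below are hypothetical, made up for illustration and not taken from the paper's database.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def group_holdout_splits(groups):
    """Yield (held_out_group, train_indices, test_indices), one group at a time."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


# Hypothetical vaporization enthalpies in kJ/mol (illustrative only).
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [32.0, 39.0, 53.0, 58.0]
# Hypothetical chemical-group label for each molecule.
groups = ["alkane", "alcohol", "alkane", "ketone"]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
splits = list(group_holdout_splits(groups))

print(f"MAE = {mae:.2f} kJ/mol, R^2 = {r2:.3f}")
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In a full workflow, a regressor would be refit on `train_idx` for each split and scored on `test_idx`, so that each chemical group is predicted by a model that never saw it during training.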