Keywords: Alkylated polycyclic aromatic compounds; Coking wastewater; Heterocyclic; Machine learning; Polycyclic aromatic compounds; Support Vector Machine

MeSH: Wastewater / chemistry; Machine Learning; Polycyclic Aromatic Hydrocarbons / analysis; Water Pollutants, Chemical / analysis; Coke; Environmental Monitoring / methods; Waste Disposal, Fluid / methods; Industrial Waste / analysis

Source: DOI:10.1016/j.chemosphere.2024.142476

Abstract:
Organic contaminants such as polycyclic aromatic compounds (PACs) occurring in industrial effluents not only persist in wastewater but can also transform into more toxic and more mobile substituted heterocyclic products during treatment. Predicting the occurrence of PACs and their heterocyclic derivatives (HPACs) in coking wastewater is therefore of utmost importance for reducing the environmental risks in water bodies that receive industrial effluents. Although HPACs can be monitored through sampling and analysis, the characterisation techniques involved are costly and time-consuming. In this study, we propose three distinct kernel-based machine learning (ML) models for predicting PACs, including substituted HPACs and alkylated PACs, occurring in coking wastewater. Using routinely measured wastewater quality data as model inputs, we predicted the occurrence of 14 HPACs in the final effluent of a coking wastewater treatment plant. The Support Vector Machine based regression (SVR) model achieved the highest R² for HPAC prediction, 0.83. Performance assessment of the SVR model showed a mean absolute logarithmic error (MALE) of 0.46 and a root mean square error (RMSE) of 0.073 ng/L. By comparison, the K-Nearest Neighbor and Random Forest models showed lower R² values of 0.75 and 0.76, respectively, for HPAC prediction. Feature analysis suggested that the superior predictive performance of the SVR model likely stems from the higher weight (81%) it assigns to dissolved organic carbon (DOC) and total ammonia as input variables. Both of these variables could capture the underlying secondary PAC transformations likely occurring in the treatment plant. Partial dependence plots indicated that ammonia levels higher than 120 mg/L and DOC levels of 50-60 mg/L were likely linked to higher HPAC levels in the final effluent. This work highlights the capability of kernel-based ML models to capture nonlinear wastewater chemistry and offers a tool for monitoring trace organic contaminants released in coking effluents.
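The three performance metrics quoted in the abstract (R², RMSE and MALE) can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions: R² and RMSE follow their standard forms, while MALE is taken here as the mean absolute difference of `ln(1 + y)` between prediction and observation, which is only one common convention and may differ from the one the authors used. The concentration values are invented purely for illustration and are not data from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def male(y_true, y_pred):
    """Mean absolute logarithmic error.

    ASSUMPTION: defined as mean |ln(1 + pred) - ln(1 + true)|;
    the source does not state which definition was used.
    """
    total = sum(abs(math.log1p(p) - math.log1p(t)) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Hypothetical HPAC concentrations in ng/L (made up for this example).
observed = [0.10, 0.25, 0.40, 0.55, 0.80]
predicted = [0.12, 0.20, 0.45, 0.50, 0.85]

print("R2  :", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("MALE:", male(observed, predicted))
```

Because the logarithm compresses large values, a log-based error such as MALE weights relative deviations more evenly across low and high concentrations than RMSE does, which is one plausible reason to report both for trace contaminants.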