Keywords: Deep learning; Fluoride; Groundwater; Heavy metal contamination; Prediction; Türkiye

MeSH: Groundwater / chemistry; Water Pollutants, Chemical / analysis; Fluorides / analysis; Deep Learning; Environmental Monitoring / methods; Turkey; Cities

Source: DOI:10.1007/s11356-024-34194-w

Abstract:
Groundwater resources in Bitlis province and its surroundings, in Türkiye's Eastern Anatolia Region, are pivotal for drinking water, yet they face a significant threat from fluoride contamination, compounded by the region's volcanic rock structure. To address this concern, fluoride levels were measured at 30 points during the dry period (June 2019) and the rainy period (September 2019). Although present measurement techniques are accurate, their time-consuming nature renders them economically unviable. Therefore, this study aims to assess the distribution of probable geogenic contamination of groundwater and to develop a robust prediction model by analyzing the relationship between predictive variables and target contaminants. To this end, various machine learning regression models, including Linear Regression, Random Forest, Decision Tree, K-Neighbors, and XGBoost, as well as deep learning models such as ANN, DNN, CNN, and LSTM, were employed. The elements aluminum (Al), boron (B), cadmium (Cd), cobalt (Co), chromium (Cr), copper (Cu), iron (Fe), manganese (Mn), nickel (Ni), phosphorus (P), lead (Pb), and zinc (Zn) were used as features to predict fluoride levels. The SelectKBest feature selection method was used to improve the accuracy of the prediction model. This method identifies the most important features in the dataset for different values of k, which increases model efficiency and allowed the models to produce more accurate predictions. The findings highlight the superior performance of the XGBoost regressor and CNN in predicting groundwater quality, with XGBoost consistently outperforming other models, exhibiting the lowest values for evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) across different k values.
For instance, when considering all features, XGBoost attained an MSE of 0.07, an MAE of 0.22, an RMSE of 0.27, a MAPE of 9.25%, and an NSE of 0.75. Conversely, the Decision Tree regressor consistently displayed inferior performance, with its maximum MSE reaching 0.11 (k = 5) and its maximum RMSE reaching 0.33 (k = 5). Furthermore, feature selection analysis revealed the consistent significance of boron (B) and cadmium (Cd) across all datasets, underscoring their pivotal roles in groundwater contamination. Notably, in the machine learning framework evaluation, the XGBoost regressor excelled in modeling both the "all" and "rainy season" datasets, while the convolutional neural network (CNN) performed best on the "dry season" dataset. This study emphasizes the potential of the XGBoost regressor and CNN for accurate groundwater quality prediction and recommends their use, while acknowledging the limitations of the Decision Tree regressor.
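The evaluation metrics named in the abstract (MSE, MAE, RMSE, MAPE, and NSE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not the study's code or data, and the standard textbook definitions are assumed, with NSE taken as the Nash-Sutcliffe efficiency.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, MAPE (%) and NSE for paired observations.

    Hypothetical helper using the standard definitions of each metric.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    sse = sum(e * e for e in errors)          # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)                     # RMSE is the square root of MSE

    # MAPE is undefined if any observed value is zero (division by zero).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # NSE = 1 - SSE / (variance of observations about their mean, unnormalized).
    # It is undefined when all observed values are identical.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    nse = 1.0 - sse / ss_tot

    return {"mse": mse, "mae": mae, "rmse": rmse, "mape": mape, "nse": nse}


# Made-up example values, not measurements from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
```

Because RMSE is the square root of MSE, the reported MSE of 0.07 and RMSE of 0.27 are consistent with each other once rounding to two decimals is taken into account. An NSE of 1 indicates a perfect fit, while an NSE of 0 means the model is no better than predicting the observed mean.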