Keywords: Heavy metals; Machine learning; Oversampling; Sediment; Source identification

Source: DOI:10.1016/j.scitotenv.2024.174755

Abstract:
Contaminated sediments can adversely affect aquatic ecosystems, so identifying and managing pollutant sources is essential. In this study, we propose machine learning approaches to reveal the sources of heavy metal contamination in downstream sediment and their influential distances. We employed classification models based on artificial neural networks (ANN) and random forests (RF) to predict the heavy metal contamination of stream sediments, using upland environmental variables as input features. A comprehensive Korean nationwide monitoring database containing 1546 datasets was used to train and test the models. These datasets comprise the concentrations of eight heavy metals (As, Cd, Cr, Cu, Hg, Ni, Pb, and Zn) in sediment samples collected from 160 stream sites across the nation from 2014 to 2018. The models' prediction accuracy was evaluated for input feature sets drawn from different influential upland areas, defined for each site by buffers of different radii and by the watershed boundary. Although both the ANN and RF models were unsatisfactory in predicting heavy metal quartile classes, RF classifiers with adaptive synthetic oversampling (ORFC) predicted the classes of the sediment samples based on Canada's Sediment Quality Guidelines reasonably well (accuracy ranged from 0.67 to 0.94). The best influential distance (i.e., buffer radius) was determined for each heavy metal based on the accuracy of ORFC. The results indicated that Cd, Cu, and Pb had shorter influential distances (1.5-2.0 km) than the other heavy metals with little difference in accuracy for different influential distances. Feature importance calculations revealed that upland soil contamination was the primary factor for Hg and Ni, while residential areas and roads were significant features associated with Pb and Zn contamination. This approach offers information on the major contamination sources and their influential areas, which can be prioritized when managing contaminated stream sediments.