Keywords: Gradient boosting; Jing-Jin-Ji city group; K-Nearest Neighbor; Lasso regression; Linear SVR; PM2.5 prediction

Source: DOI:10.1016/j.heliyon.2022.e10691

Abstract:
Countries worldwide encounter air pollution problems along their development paths. As a significant indicator of air quality, PM2.5 concentration has long been shown to affect population mortality. Machine learning algorithms, which have been shown to outperform traditional statistical approaches, are widely used in air pollution prediction. However, research on model selection and on the environmental interpretation of model predictions is still scarce, and it is urgently needed to guide policy making on air pollution control. Our research compared four machine learning algorithms (LinearSVR, K-Nearest Neighbor, Lasso regression, and Gradient Boosting) by examining their performance in predicting PM2.5 concentrations across different cities and seasons. The results show that the machine learning models can forecast the next-day PM2.5 concentration from the previous five days' data with good accuracy. The comparative experiments show that, at the city level, the Gradient Boosting model has the best prediction performance, with a mean absolute error (MAE) of 9 μg/m³ and a root mean square error (RMSE) of 10.25-16.76 μg/m³, both lower than those of the other three models; at the season level, all four models perform best in winter and worst in summer. More importantly, the differences in model performance across cities and seasons have significant implications for environmental policy.
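The forecasting setup described in the abstract (previous five days as input, next day as target, scored by MAE and RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the PM2.5 values are invented, the helper names (`make_windows`, `knn_predict`, `mae`, `rmse`) are hypothetical, the neighbor count `k=3` is an arbitrary choice, and a small hand-written K-Nearest Neighbor regressor stands in for the four models compared in the paper.

```python
import math

def make_windows(series, n_lags=5):
    """Turn a daily series into (previous n_lags days, next-day value) pairs."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return X, y

def knn_predict(X_train, y_train, x, k=3):
    """Predict by averaging the targets of the k nearest training windows."""
    dists = sorted(
        (math.dist(row, x), target) for row, target in zip(X_train, y_train)
    )
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    )

# Synthetic daily PM2.5 values (ug/m3), invented for illustration only.
pm25 = [35, 42, 58, 61, 47, 39, 33, 45, 70, 82, 64, 51, 40, 38, 55, 67]

X, y = make_windows(pm25, n_lags=5)
split = len(X) - 3  # hold out the last 3 days, preserving time order
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

preds = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
print(f"MAE:  {mae(y_test, preds):.2f} ug/m3")
print(f"RMSE: {rmse(y_test, preds):.2f} ug/m3")
```

The same windowed `X` and `y` could be passed to scikit-learn's `LinearSVR`, `KNeighborsRegressor`, `Lasso`, or `GradientBoostingRegressor` to mirror the comparison in the abstract. The hyperparameters and train/test split the authors used are not stated here, so any such choice would be an assumption.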