Keywords: ARIMA; Classical statistics; LSTM; Machine learning; NAR; NO2; SARIMA

Source: DOI:10.1016/j.heliyon.2022.e12584

Abstract:
Nitrogen dioxide (NO2) is the most active pollutant gas emitted in the industrial era and is highly correlated with human activities. Tracking NO2 emissions and predicting NO2 concentrations are important steps toward controlling pollution and setting rules that protect people's health both indoors, such as in factories, and outdoors. The concentration of NO2 decreased during the COVID-19 lockdown period because of restrictions on outdoor activities. In this study, the concentration of NO2 was predicted at 14 ground stations in the United Arab Emirates (UAE) for December 2020, based on training over a full two-year period (2019-2020). Statistical and machine learning models, namely autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), long short-term memory (LSTM), and nonlinear autoregressive neural network (NAR-NN), were used with both open- and closed-loop architectures. The mean absolute percentage error (MAPE) was used to evaluate the performance of the models, and the results ranged from "very good" (MAPE of 8.64% at the Liwa station with the closed loop) to "acceptable" (MAPE of 42.45% at the Khadejah School station with the open loop). The results show that predictions based on the open loop are generally better than those based on the closed loop, because they yield statistically significantly lower MAPE values. For both loop types, we selected the stations exhibiting the lowest, medium, and highest MAPE values as representative cases. In addition, we demonstrated that the MAPE value is highly correlated with the relative standard deviation of the NO2 concentration values.
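The two quantities the abstract relates, MAPE and the relative standard deviation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mape` and `relative_std` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumed convention, and the numbers in the usage lines are invented for illustration only, not data from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def relative_std(values):
    """Relative standard deviation (std / mean), in percent.

    Uses the sample standard deviation (n - 1); this choice is an assumption.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * math.sqrt(variance) / mean


# Illustrative values only (not measurements from the paper).
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mape(observed, forecast))
print(relative_std(observed))
```

A series that fluctuates strongly around its mean has a high relative standard deviation, which makes a low percentage error harder to reach; that is consistent with the correlation the abstract reports.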