Keywords: Boruta feature selection; Convolutional neural network; Electrical conductivity; Long short-term memory; Time series forecasting

Source: DOI:10.1038/s41598-024-65837-0   PDF (PubMed)

Abstract:
Electrical conductivity (EC) is widely recognized as one of the most essential water quality metrics for predicting salinity and mineralization. In this study, the EC of two Australian rivers (Albert River and Barratta Creek) was forecast up to 10 days ahead using a novel deep learning algorithm that combines a convolutional neural network with a long short-term memory model (CNN-LSTM). The Boruta-XGBoost feature selection method was used to determine the significant model inputs (lagged time series data). To benchmark the Boruta-XGB-CNN-LSTM model, three machine learning approaches were used for comparison: multi-layer perceptron neural network (MLP), K-nearest neighbour (KNN), and extreme gradient boosting (XGBoost). Model performance was assessed with statistical metrics including the correlation coefficient (R), root mean square error (RMSE), and mean absolute percentage error (MAPE). Of the 10 years of data available for both rivers, 7 years (2012-2018) were used for training and 3 years (2019-2021) for testing. For one-day-ahead forecasting, Boruta-XGB-CNN-LSTM outperformed the other machine learning models on the test dataset at both stations (R = 0.9429, RMSE = 45.6896, MAPE = 5.9749 for Albert River; R = 0.9215, RMSE = 43.8315, MAPE = 7.6029 for Barratta Creek). Given its better performance in both rivers, this model was then used to forecast EC 3-10 days ahead. The results showed that the model is capable of forecasting EC up to 10 days ahead, with performance decreasing slightly as the forecasting horizon increased from 3 to 10 days. Overall, the results indicate that the Boruta-XGB-CNN-LSTM model can serve as a soft computing method for accurately predicting how EC will change in rivers.
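For readers unfamiliar with the setup, the following is a minimal Python sketch of two ingredients the abstract mentions: building lagged time series inputs for a given forecast horizon, and computing R, RMSE, and MAPE. It is not the authors' code. The function names (`make_lagged`, `pearson_r`, `rmse`, `mape`) are hypothetical, the number of lags is arbitrary here (in the study the relevant lags are chosen by Boruta-XGBoost), and expressing MAPE as a percentage is an assumption.

```python
import numpy as np


def make_lagged(series, n_lags, horizon=1):
    """Build a lagged design matrix from a 1-D series (hypothetical helper).

    Row i of X holds series[i : i + n_lags]; y[i] is the value `horizon`
    steps after the last lag in that row.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series is too short for the requested lags and horizon")
    X = np.stack([series[i:i + n_lags] for i in range(n_rows)])
    y = series[n_lags + horizon - 1:]
    return X, y


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mape(obs, pred):
    """Mean absolute percentage error in percent (assumes no zero observations)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))
```

With daily data, `make_lagged(ec, n_lags=7, horizon=3)` would pair each week of past values with the value three days after the last one; a chronological split (earlier rows for training, later rows for testing) then mirrors the 2012-2018 / 2019-2021 division described above.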