BACKGROUND: Infectious diarrhea remains a major public health problem worldwide. This study used a stacking ensemble to develop a model for predicting the incidence of infectious diarrhea, with the aim of achieving better prediction performance.
METHODS: Based on surveillance data on infectious diarrhea cases, related symptoms, and meteorological factors in Guangzhou from 2016 to 2021, we developed four base prediction models using an artificial neural network (ANN), a long short-term memory network (LSTM), support vector regression (SVR), and extreme gradient boosting regression trees (XGBoost). These base models were then combined by stacking to obtain the final prediction model. All models were evaluated with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).
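The stacking step can be illustrated with a minimal pure-Python sketch. The abstract does not name the meta-learner, so this sketch assumes an ordinary least-squares linear regression (with intercept) fitted on the base models' predictions for a held-out validation window; the function names `fit_meta_learner` and `stack_predict` are hypothetical and this is not the authors' implementation.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_meta_learner(base_preds: Sequence[Sequence[float]],
                     y: Sequence[float]) -> List[float]:
    """Fit y ~ w0 + sum_j w_j * pred_j by ordinary least squares (assumed meta-learner).

    base_preds[i][j] is base model j's prediction for week i of a validation
    window (e.g. ANN, LSTM, SVR, XGBoost forecasts). Returns [w0, w1, ..., wk].
    """
    rows = [[1.0] + list(p) for p in base_preds]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def stack_predict(weights: Sequence[float],
                  base_preds: Sequence[Sequence[float]]) -> List[float]:
    """Combine new base-model predictions with the fitted meta-learner weights."""
    return [weights[0] + sum(w * p for w, p in zip(weights[1:], row))
            for row in base_preds]
```

Fitting the meta-learner on predictions the base models made for data they were not trained on (here, a validation window) is the usual way to keep the combination weights from simply rewarding the base model that overfits most.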
RESULTS: Base models that incorporated symptom surveillance data together with the weekly number of infectious diarrhea cases achieved lower RMSEs, MAEs, and MAPEs than models that combined meteorological data with the weekly number of cases. Among the four base models, the LSTM performed best, with an RMSE, MAE, and MAPE of 84.85, 57.50, and 15.92%, respectively. The stacking ensemble model outperformed all four base models, with an RMSE, MAE, and MAPE of 75.82, 55.93, and 15.70%, respectively.
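For reference, the three reported metrics follow the standard definitions sketched below in pure Python. This is an illustrative sketch, not the authors' evaluation code; the weekly case counts in the usage note are made-up numbers.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return the number of observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined when any y_true is 0."""
    n = _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero observed values")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

With hypothetical observed counts `[100, 200, 400]` and forecasts `[110, 180, 400]`, the absolute errors are 10, 20, and 0, so MAE is 10, RMSE is the square root of 500/3 (about 12.91), and MAPE is about 6.67%. Because RMSE squares the residuals, it penalizes large weekly misses more heavily than MAE, which is why the two can rank models differently.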
CONCLUSIONS: Incorporating symptom surveillance data can improve the predictive accuracy of infectious diarrhea prediction models, and symptom surveillance data were more effective than meteorological data in enhancing model performance. Combining multiple prediction models by stacking alleviated the difficulty of selecting a single optimal model and yielded a model that performed better than any of the base models.