Keywords: Cerebrovascular disease; Environmental exposure; Hospital admissions; SHAP value; Stacking ensemble model

MeSH: Humans; Neural Networks, Computer; China / epidemiology; Machine Learning; Hospitalization; Cerebrovascular Disorders / epidemiology

Source: DOI:10.1186/s12911-023-02159-7

Abstract:
Background: With cerebrovascular disease (CD) increasingly prevalent and healthcare resources under growing strain, forecasting the healthcare demand of patients with CD is important for optimizing the allocation of medical resources.
Methods: We proposed a stacking ensemble model, comprising four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net), to predict the daily number of hospital admissions (HAs) for CD from historical HAs data, air quality data, and meteorological data collected in Chengdu, China, from 2015 to 2018. To address the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model on data from 2015 to 2017 and evaluated its predictive ability on data from 2018 using four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of the stacking model.
Results: The proposed model outperformed all base learners and a long short-term memory (LSTM) network on two datasets. In particular, compared with the best results obtained by the individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R² improved by 6.8% on the CD dataset. The model explanation showed that environmental features further improved model performance and indicated that high temperature and high concentrations of gaseous air pollutants might be strongly associated with an increased risk of CD.
Conclusions: Our stacking model, which accounts for environmental exposure, predicts daily HAs for CD effectively and has practical value for early warning and healthcare resource allocation.
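The pipeline outlined in the abstract (four base learners, an elastic net meta learner re-weighted by label distribution smoothing, and evaluation with MAE, RMSE, MAPE, and R²) can be illustrated with a minimal sketch. This is not the authors' implementation. The synthetic data, all hyperparameters, the 5-fold out-of-fold scheme, the Gaussian kernel width, and the helper names `lds_weights` and `evaluate` are assumptions made for illustration; only the choice of learners, the chronological train/test split, and the four metrics come from the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lds_weights(y, sigma=2.0):
    """Hypothetical helper: inverse-density sample weights computed from a
    Gaussian-smoothed histogram of integer labels (sigma is an assumption)."""
    labels = np.round(np.asarray(y)).astype(int)
    lo = labels.min()
    counts = np.bincount(labels - lo).astype(float)
    bins = np.arange(counts.size)
    # Gaussian kernel between every pair of label bins.
    kernel = np.exp(-0.5 * ((bins[:, None] - bins[None, :]) / sigma) ** 2)
    smoothed = kernel @ counts            # smoothed label density (unnormalized)
    weights = 1.0 / smoothed[labels - lo]  # rare labels get larger weights
    return weights / weights.mean()


def evaluate(y_true, y_pred):
    """MAE, RMSE, MAPE (in percent), and R^2; MAPE assumes no zero counts."""
    y_true = np.asarray(y_true, dtype=float)
    err = y_true - y_pred
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
    }


# Synthetic stand-in for daily admissions driven by a few covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = rng.poisson(np.exp(3.5 + 0.2 * X[:, 0] + 0.1 * X[:, 1]))

# Chronological split: earlier days for training, later days for testing.
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

base_learners = {
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
    "gbdt": GradientBoostingRegressor(random_state=0),
    "ann": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

# Level 1: out-of-fold predictions on the training set become meta features.
cv = KFold(n_splits=5)
Z_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=cv) for m in base_learners.values()]
)
# Refit each base learner on the full training set to build test meta features.
Z_test = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_learners.values()]
)

# Level 2: elastic net meta learner, re-weighted by the smoothed label density.
w = lds_weights(y_train)
meta = ElasticNet(alpha=0.1, l1_ratio=0.5)
meta.fit(Z_train, y_train, sample_weight=w)

metrics = evaluate(y_test, meta.predict(Z_test))
print(metrics)
```

Fitting the meta learner on out-of-fold predictions rather than in-sample predictions keeps it from simply trusting whichever base learner overfits the training data most. The re-weighting is applied only at the meta level here, mirroring the abstract's statement that it was integrated into the meta learner.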