BACKGROUND: Coronavirus disease 2019 (COVID-19), a global public health crisis, continues to pose challenges despite preventive measures. The daily rise in COVID-19 cases is concerning, and testing is both time-consuming and costly. Although several models have been developed to predict mortality in COVID-19 patients, only a few have shown sufficient accuracy. Machine learning (ML) algorithms offer a promising, data-driven approach to predicting clinical outcomes that can go beyond traditional statistical modeling, and they could help predict mortality among hospitalized COVID-19 patients in Ethiopia. The aim of this study was therefore to develop and validate ML models that accurately predict mortality in hospitalized COVID-19 patients in Ethiopia.
METHODS: We analyzed the electronic medical records of COVID-19 patients admitted to public hospitals in Ethiopia and developed seven machine learning models to predict mortality: J48 decision tree, random forest (RF), k-nearest neighbors (k-NN), multilayer perceptron (MLP), naïve Bayes (NB), eXtreme gradient boosting (XGBoost), and logistic regression (LR). We then statistically compared the performance of these models on data from a cohort of 696 patients. Model effectiveness was evaluated with metrics derived from the confusion matrix, such as sensitivity, specificity, and precision, together with receiver operating characteristic (ROC) analysis.
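For readers less familiar with these evaluation metrics, the sketch below shows how sensitivity, specificity, precision, accuracy, and F1 follow from the four cells of a binary confusion matrix (treating death as the positive class), and how the area under the ROC curve can be computed as the probability that a randomly chosen death receives a higher risk score than a randomly chosen survivor. This is a minimal illustration under that assumption; the class and function names and the example counts are hypothetical and are not taken from the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary mortality classifier (positive class = death)."""
    tp: int  # deaths predicted as deaths
    fp: int  # survivors predicted as deaths
    tn: int  # survivors predicted as survivors
    fn: int  # deaths predicted as survivors


def confusion_metrics(cm: ConfusionMatrix) -> dict:
    """Standard metrics derived from a 2x2 confusion matrix."""
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall / true-positive rate
    specificity = cm.tn / (cm.tn + cm.fp)  # true-negative rate
    precision = cm.tp / (cm.tp + cm.fp)    # positive predictive value
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.tn + cm.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(scores: list, labels: list) -> float:
    """ROC AUC as the probability that a randomly chosen death (label 1) is
    scored higher than a randomly chosen survivor (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not the study's data.
print(confusion_metrics(ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)))
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise AUC is quadratic in the number of patients, which is fine for a cohort of a few hundred; a rank-based formula gives the same value more efficiently for larger datasets.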
RESULTS: Of the 696 patients included, more were female (440 patients, 63.2%) than male. The median age of the participants was 35.0 years, with an interquartile range of 18-79. After several feature selection procedures, 23 features were identified as predictors of mortality; gender, intensive care unit (ICU) admission, and alcohol drinking/addiction were the top three predictors of COVID-19 mortality, whereas loss of smell, loss of taste, and hypertension were the three weakest. The k-nearest neighbor (k-NN) algorithm outperformed the other machine learning algorithms, achieving an accuracy of 95.25%, sensitivity of 95.30%, precision of 92.7%, specificity of 93.30%, F1 score of 93.98%, and a receiver operating characteristic (ROC) score of 96.90%. These findings highlight the effectiveness of the k-NN algorithm in predicting COVID-19 outcomes from the selected features.
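Because the F1 score is the harmonic mean of precision and sensitivity (recall), the reported k-NN figures can be checked for internal consistency. The short sketch below recomputes F1 from the reported precision (92.7%) and sensitivity (95.30%); the function name is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Values as reported for the k-NN model in the abstract.
reported_precision = 0.927
reported_sensitivity = 0.9530
reported_f1 = 0.9398

f1 = f1_from_precision_recall(reported_precision, reported_sensitivity)
print(f"recomputed F1 = {f1:.2%}")  # recomputed F1 = 93.98%
```

The recomputed value matches the reported F1 score of 93.98%, so these three metrics are mutually consistent.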
CONCLUSIONS: We developed an innovative model that uses hospital data to accurately predict the mortality risk of COVID-19 patients. Its main objective is to prioritize early treatment for high-risk patients and to help optimize strained healthcare systems during the ongoing pandemic. By integrating machine learning with comprehensive hospital databases, the model classifies patients' mortality risk, enabling targeted medical interventions and better resource management. Among the methods tested, the k-nearest neighbors (k-NN) algorithm achieved the highest accuracy, allowing early identification of high-risk patients. Through k-NN feature identification, we identified 23 predictors that contribute significantly to predicting COVID-19 mortality. The top five predictors were gender (female), intensive care unit (ICU) admission, alcohol drinking, smoking, and symptoms of headache and chills. This advancement holds promise for improving healthcare outcomes and decision-making during the pandemic. By delivering services and prioritizing patients according to the identified predictors, healthcare facilities and providers can improve individuals' chances of survival. The model offers insights that can guide healthcare professionals in allocating resources and delivering appropriate care to those at highest risk.
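To illustrate how a k-NN model can turn patient features into a mortality-risk score for prioritization, the sketch below estimates each new patient's risk as the fraction of their k nearest training patients (by Euclidean distance) who died, then sorts patients from highest to lowest risk. This is a minimal sketch, not the study's implementation: the binary features (ICU admission, alcohol use, smoking), the toy records, the choice of k, and the distance metric are all hypothetical assumptions, and real clinical features would need encoding and scaling first.

```python
import math


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_risk(train_features: list, train_died: list, x: list, k: int = 5) -> float:
    """Fraction of the k nearest training patients who died (0.0 to 1.0).
    Ties in distance are broken by training-set order (stable sort)."""
    order = sorted(range(len(train_features)),
                   key=lambda i: euclidean(train_features[i], x))
    nearest = order[:k]
    return sum(train_died[i] for i in nearest) / k


def prioritize(train_features: list, train_died: list, patients: dict, k: int = 5) -> list:
    """Return (patient_id, risk) pairs sorted from highest to lowest risk."""
    risks = {pid: knn_risk(train_features, train_died, feats, k)
             for pid, feats in patients.items()}
    return sorted(risks.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: [icu_admission, alcohol_use, smoking], 1 = yes.
train_features = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
train_died = [1, 1, 1, 0, 0, 0]
patients = {"A": [1, 1, 1], "B": [0, 0, 0]}

print(prioritize(train_features, train_died, patients, k=3))  # [('A', 1.0), ('B', 0.0)]
```

Using the neighbor vote fraction as a risk score, rather than only a hard death/survival label, is what makes ranking patients for early treatment possible.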