Keywords: Clinical explainability; Feature selection; Hospital-acquired urinary tract infection; Machine learning; Risk prediction

Source: DOI:10.1016/j.jhin.2023.03.017

Abstract:
BACKGROUND: Machine learning (ML) models for early identification of patients at risk of hospital-acquired urinary tract infection (HA-UTI) may enable timely and targeted preventive and therapeutic strategies. However, clinicians often find it difficult to interpret the predictions produced by ML models, and the performance of these models often varies.
OBJECTIVE: To train ML models for predicting patients at risk of HA-UTI using available data from electronic health records at the time of hospital admission. This study focused on the performance of different ML models and clinical explainability.
METHODS: This retrospective study investigated patient data representing 138,560 hospital admissions in the North Denmark Region from 1st January 2017 to 31st December 2018. Fifty-one health sociodemographic and clinical features were extracted as the full dataset. The χ² test and expert knowledge were each used for feature selection, resulting in two reduced datasets. Seven different ML models were trained and compared across the three datasets. The SHapley Additive exPlanation (SHAP) method was used to support population- and patient-level explainability.
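As an illustration of the χ² filtering step mentioned in the methods, the sketch below scores binary features against a binary outcome and keeps those whose statistic exceeds a threshold. It is a minimal sketch under stated assumptions, not the study's pipeline: the feature names and toy data are hypothetical, the features are assumed to be binary, and the threshold of 3.841 (the critical value for 1 degree of freedom at a significance level of 0.05) is an assumed choice that the abstract does not specify.

```python
from typing import Dict, List, Sequence

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
# Assumed threshold for this sketch; the abstract does not state one.
CHI2_CRITICAL_DF1 = 3.841


def chi2_statistic(feature: Sequence[int], outcome: Sequence[int]) -> float:
    """Pearson chi-square statistic for a 2x2 table (binary feature vs binary outcome)."""
    n = len(feature)
    observed = [[0, 0], [0, 0]]  # observed[feature_value][outcome_value]
    for f, y in zip(feature, outcome):
        observed[f][y] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def select_features(
    features: Dict[str, List[int]],
    outcome: List[int],
    threshold: float = CHI2_CRITICAL_DF1,
) -> List[str]:
    """Return the names of features whose chi-square statistic exceeds the threshold."""
    return [name for name, values in features.items()
            if chi2_statistic(values, outcome) > threshold]


# Hypothetical toy data: 10 admissions with the outcome, 10 without.
toy_outcome = [1] * 10 + [0] * 10
toy_features = {
    "toy_associated": [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8,  # tracks the outcome
    "toy_unrelated": [1, 0] * 10,                             # independent of it
}
selected = select_features(toy_features, toy_outcome)
```

On this toy data only `toy_associated` passes the filter. With many candidate features, a correction for multiple testing would normally be considered as well.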
RESULTS: The best-performing ML model was the neural network model based on the full dataset, with an area under the curve (AUC) of 0.758. The neural network model was also the best-performing ML model based on the reduced datasets, with an AUC of 0.746. Clinical explainability was demonstrated with a SHAP summary plot and a SHAP force plot.
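For readers less familiar with the AUC metric reported above, it can be read as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy example: two positive and three negative admissions.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of the 6 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. A rank-based implementation is the usual choice for datasets the size of the one in this study.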
CONCLUSIONS: Within 24 h of hospital admission, the ML models were able to identify patients at risk of developing HA-UTI, providing new opportunities to develop efficient strategies for the prevention of HA-UTI. SHAP was used to demonstrate how risk predictions can be explained at the individual patient level and for the patient population in general.
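The idea behind patient-level explanations can be illustrated with exact Shapley values on a very small model. This is a sketch under stated assumptions, not the SHAP library and not the study's model: `toy_risk` is a hypothetical three-feature risk function, and features outside a coalition are replaced by a single baseline vector, which is one common simplification. The key property shown is additivity: the attributions sum to the difference between the patient's prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values by enumerating all coalitions (only feasible for few features)."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's value, the rest the baseline value.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction between features 0 and 2."""
    return 0.1 + 0.2 * v[0] + 0.1 * v[1] + 0.2 * v[0] * v[2]


patient = [1, 1, 1]          # hypothetical patient with all three features present
reference = [0, 0, 0]        # hypothetical baseline with none present
phi = shapley_values(toy_risk, patient, reference)
# Additivity: sum(phi) equals toy_risk(patient) - toy_risk(reference).
```

Here the interaction term is split evenly between features 0 and 2, so the attributions are approximately 0.3, 0.1 and 0.1. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.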