%0 Journal Article
%T Clinically explainable machine learning models for early identification of patients at risk of hospital-acquired urinary tract infection.
%A Jakobsen RS
%A Nielsen TD
%A Leutscher P
%A Koch K
%J J Hosp Infect
%V 0
%N 0
%D Mar 2023 31
%M 37004787
%F 8.944
%R 10.1016/j.jhin.2023.03.017
%X BACKGROUND: Machine learning (ML) models for early identification of patients at risk of hospital-acquired urinary tract infection (HA-UTI) may enable timely and targeted preventive and therapeutic strategies. However, clinicians often find it difficult to interpret the predictions of ML models, and performance often varies between models.
OBJECTIVE: To train ML models to identify patients at risk of HA-UTI using data available in electronic health records at the time of hospital admission. This study focused on the performance of different ML models and on clinical explainability.
METHODS: This retrospective study investigated patient data representing 138,560 hospital admissions in the North Denmark Region from 1st January 2017 to 31st December 2018. Fifty-one health, sociodemographic and clinical features were extracted as the full dataset, and the χ² test and expert knowledge were used for feature selection, resulting in two reduced datasets. Seven different ML models were trained and compared across the three datasets. The SHapley Additive exPlanation (SHAP) method was used to support population- and patient-level explainability.
RESULTS: The best-performing ML model was the neural network model based on the full dataset, with an area under the curve (AUC) of 0.758. The neural network model was also the best-performing ML model based on the reduced datasets, with an AUC of 0.746. Clinical explainability was demonstrated with a SHAP summary plot and a SHAP force plot.
CONCLUSIONS: Within 24 h of hospital admission, the ML models were able to identify patients at risk of developing HA-UTI, providing new opportunities to develop efficient strategies for the prevention of HA-UTI. SHAP was used to demonstrate how risk predictions can be explained at the individual patient level and for the patient population in general.