关键词: HAI SHAP Shapley Additive Explanations ensemble learning extreme boosted gradient boosted classifier gradient-boosted classifier health care–associated infection machine learning methicillin-resistant Staphylococcus aureus network penalized logistic regression random forest classifier

来  源:   DOI:10.2196/48067   PDF(Pubmed)

Abstract:
BACKGROUND: Health care-associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (CDI), place a significant burden on our health care infrastructure.
OBJECTIVE: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that can predict colonization or infection risk using electronic health record (EHR) data, provide useful information to aid infection control, and guide empiric antibiotic coverage.
METHODS: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated patients at the time of sample collection from hospitalized patients at the University of Virginia Hospital. We used clinical and nonclinical features derived from on-admission and throughout-stay information from the patient\'s EHR data to build the model. In addition, we used a class of features derived from contact networks in EHR data; these network features can capture patients\' contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explored heterogeneous models for different patient subpopulations, for example, those admitted to an intensive care unit or emergency department or those with specific testing histories, which perform better.
RESULTS: We found that the penalized logistic regression performs better than other methods, and this model\'s performance measured in terms of its receiver operating characteristics-area under the curve score improves by nearly 11% when we use polynomial (second-degree) transformation of the features. Some significant features in predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, patient\'s comorbidity conditions, and network features. Among these, network features add the most value and improve the model\'s performance by at least 15%. The penalized logistic regression model with the same transformation of features also performs better than other models for specific patient subpopulations.
CONCLUSIONS: Our study shows that MRSA risk prediction can be conducted quite effectively by machine learning methods using clinical and nonclinical features derived from EHR data. Network features are the most predictive and provide significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance the model\'s performance.
摘要:
背景:由于多重耐药生物体(MDROs)引起的医疗保健相关感染,如耐甲氧西林金黄色葡萄球菌(MRSA)和艰难梭菌(CDI),给我们的医疗基础设施带来沉重负担。
目的:MDROs的筛查是防止传播的重要机制,但却是资源密集型的。这项研究的目的是开发可以使用电子健康记录(EHR)数据预测定植或感染风险的自动化工具,提供有用的信息来帮助感染控制,并指导经验性抗生素覆盖。
方法:我们回顾性地开发了一个机器学习模型来检测在弗吉尼亚大学医院住院患者样本采集时未分化患者的MRSA定植和感染。我们使用来自患者EHR数据的入院和住院期间信息的临床和非临床特征来构建模型。此外,我们在EHR数据中使用了一类从联系网络派生的特征;这些网络特征可以捕获患者与提供者和其他患者的联系,提高预测MRSA监测试验结果的模型可解释性和准确性。最后,我们探索了不同患者亚群的异质模型,例如,入住重症监护病房或急诊科的人或有特定检测史的人,哪个表现更好。
结果:我们发现惩罚逻辑回归比其他方法表现更好,当我们使用多项式(二次)变换特征时,该模型的性能根据其接收器操作特征-曲线下面积得分提高了近11%。预测MDRO风险的一些重要特征包括抗生素使用,手术,使用设备,透析,患者的合并症状况,和网络特征。其中,网络功能增加了最大的价值,并将模型的性能提高了至少15%。对于特定患者亚群,具有相同特征转换的惩罚逻辑回归模型也比其他模型表现更好。
结论:我们的研究表明,使用来自EHR数据的临床和非临床特征,通过机器学习方法可以非常有效地进行MRSA风险预测。网络特征是最具预测性的,并且提供优于现有方法的显著改进。此外,不同患者亚群的异质预测模型提高了模型的性能。
公众号