BACKGROUND: To date, there is no individualized model for predicting intensive care unit (ICU) admission in patients with community-acquired pneumonia (CAP) and connective tissue disease (CTD). In this study, we aimed to establish a machine learning-based model to predict the need for ICU admission in these patients.
METHODS: This was a retrospective study of patients admitted to a university hospital in China between November 2008 and November 2021. Patients were included if they were diagnosed with CAP and CTD on admission and during hospitalization. Data on demographics, CTD types, comorbidities, vital signs, and laboratory results from the first 24 h of hospitalization were collected. Baseline variables were screened for potential predictors using three methods: univariate analysis, least absolute shrinkage and selection operator (Lasso) regression, and the Boruta algorithm. Nine supervised machine learning algorithms were used to build prediction models. We evaluated the discrimination, calibration, and clinical utility of all models to determine the optimal model. Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) were used to interpret the optimal model.
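The predictor screening step described above keeps only variables retained by all three selection methods. A minimal sketch of that intersection follows, assuming each method has already produced its own list of candidates. The feature names and the helper `intersect_selected` are hypothetical and chosen for illustration; they are not the study's variables or code.

```python
# Hypothetical outputs of the three screening methods (illustrative names only,
# not the study's actual candidate lists).
univariate = {"nt_probnp", "crp", "cd4_t_cells", "lymphocytes", "serum_sodium", "age"}
lasso = {"nt_probnp", "crp", "cd4_t_cells", "lymphocytes", "serum_sodium", "g_test"}
boruta = {"nt_probnp", "crp", "cd4_t_cells", "lymphocytes", "albumin", "serum_sodium"}


def intersect_selected(*selections):
    """Return the features retained by every selection method, sorted by name."""
    if not selections:
        return []
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)  # keep only features also chosen by this method
    return sorted(common)


predictors = intersect_selected(univariate, lasso, boruta)
print(predictors)
# ['cd4_t_cells', 'crp', 'lymphocytes', 'nt_probnp', 'serum_sodium']
```

Taking the intersection is a conservative choice: a variable flagged by only one or two of the methods (here `age`, `g_test`, and `albumin`) is dropped.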
RESULTS: The included patients were randomly divided into a training set (1070 patients) and a testing set (459 patients) at a ratio of 70:30. The intersection of the three feature selection approaches yielded 16 predictors. The eXtreme Gradient Boosting (XGBoost) model achieved the highest area under the receiver operating characteristic curve (AUC, 0.941) and accuracy (0.913) among the models. The calibration curve and decision curve analysis (DCA) both suggested that the XGBoost model outperformed the other models. The SHAP summary plots showed the six most important features: higher N-terminal pro-B-type natriuretic peptide (NT-proBNP) and C-reactive protein (CRP), lower levels of CD4+ T cells, lymphocytes, and serum sodium, and a positive serum (1,3)-β-D-glucan test (G test).
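The discrimination and clinical-utility metrics named above can be computed directly from predicted probabilities. The sketch below is a plain-Python illustration on made-up toy data, not the study's pipeline. It computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, accuracy at a fixed cutoff, and the net benefit used in decision curve analysis, TP/n − (FP/n) × p_t/(1 − p_t), at a threshold probability p_t. The 0.5 cutoff and all function names are assumptions.

```python
def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Share of cases classified correctly when scores >= threshold are called positive."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at one threshold probability (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = admitted to ICU, 0 = not admitted.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

print(round(auc(y_true, y_score), 3))   # 0.933
print(accuracy(y_true, y_score))        # 0.75
print(net_benefit(y_true, y_score, 0.5))  # 0.125
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the model with the treat-all and treat-none strategies.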
CONCLUSIONS: We successfully developed, evaluated, and explained a machine learning-based model for predicting ICU admission in patients with CAP and CTD. The XGBoost model could serve as a clinical reference after external validation and refinement.