Prediction analysis of TBI 24-h survival outcome based on machine learning.

Abstract:

Traumatic brain injury (TBI) is the major cause of death among young people and is well known for its high mortality and morbidity. This paper aims to predict the 24-h survival of patients with TBI.
A total of 1224 samples were included in this analysis. The clinical indicators included age, gender, blood pressure, MGAP, and other fields; the target variable was "outcome", a binary variable. The methods used in this paper include data visualization, single-factor analysis, feature engineering, and four models: random forest (RF), K-Nearest Neighbors (KNN), logistic regression (LR), and deep neural network (DNN). Because the class labels of the samples are highly imbalanced, we oversampled the training set using the SMOTE method.
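To illustrate the oversampling step, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote`, the parameter `k`, and the toy feature values are all hypothetical, and coding the rare class as 1 is an assumption, since the abstract does not state how "outcome" is encoded. Each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours, and only the training set is resampled.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the rare class.
    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy training set with two features; label 1 is the rare class.
X_train = [[60.0, 110.0], [45.0, 120.0], [70.0, 95.0], [30.0, 125.0],
           [52.0, 118.0], [38.0, 130.0], [81.0, 70.0], [77.0, 82.0], [85.0, 76.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]

minority = [x for x, y in zip(X_train, y_train) if y == 1]
n_majority = sum(1 for y in y_train if y == 0)

# Add just enough synthetic minority points to balance the two classes.
new_points = smote(minority, n_new=n_majority - len(minority), k=2)
X_balanced = X_train + new_points
y_balanced = y_train + [1] * len(new_points)
print(sum(y_balanced), len(y_balanced) - sum(y_balanced))
```

The test set is left untouched in this sketch, so that evaluation metrics reflect the original class distribution.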
Although the accuracy of all models is very high, the recall rate is relatively low. Even the best-performing model, the DNN, reaches a recall of only 0.17, with a corresponding AUC of 0.80. After resampling, the positive-sample recall of all models increases substantially, but the AUC of some models decreases. Finally, the optimal model is LR, with a positive-sample recall of 0.67 and an AUC of 0.82.
With resampling, the best model is the RF model, which has the best recall rate and AUC; its AUC is about 0.87, indicating that the predictive performance of the model is still good.
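The metrics reported above (accuracy, positive-sample recall, and AUC) can diverge sharply on imbalanced data. The sketch below computes all three in plain Python on a hypothetical toy example; the labels, scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from the study. AUC is computed here as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model predicts as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced test set: 18 negatives followed by 2 positives.
y_true = [0] * 18 + [1] * 2
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.30, 0.35,
          0.10, 0.05, 0.15, 0.20, 0.25, 0.40, 0.45, 0.60,  # negatives
          0.55, 0.35]                                      # positives

# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
print("AUC:     ", auc(y_true, scores))
```

In this toy case the accuracy is high because most negatives are classified correctly, while half of the positives are missed. This is why recall and AUC are more informative than accuracy when the positive class is rare.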
