Keywords: data imbalance; machine learning; major adverse cardiovascular events; oversampling; risk prediction

Source: DOI:10.2196/33395

Abstract:
BACKGROUND: As a major health hazard, the incidence of coronary heart disease has been increasing year by year. Although coronary revascularization, mainly percutaneous coronary intervention, has played an important role in the treatment of coronary heart disease, major adverse cardiovascular events (MACE) such as recurrent or persistent angina pectoris after coronary revascularization remain a very difficult problem in clinical practice.
OBJECTIVE: Given the high probability of MACE after coronary revascularization, the aim of this study was to develop and validate a predictive model for MACE occurrence within 6 months based on machine learning algorithms.
METHODS: A retrospective study was performed including 1004 patients who had undergone coronary revascularization at The People's Hospital of Liaoning Province and the Affiliated Hospital of Liaoning University of Traditional Chinese Medicine from June 2019 to December 2020. Given the characteristics of the available data, an oversampling strategy was adopted for initial preprocessing. We then employed six machine learning algorithms (decision tree, random forest, logistic regression, naïve Bayes, support vector machine, and extreme gradient boosting [XGBoost]) to develop prediction models for MACE based on clinical information and 6-month follow-up information. Among all samples, 70% were randomly selected for training and the remaining 30% were used for model validation. Model performance was assessed based on accuracy, precision, recall, F1-score, the confusion matrix, the area under the receiver operating characteristic (ROC) curve (AUC), and visualization of the ROC curve.
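The preprocessing and evaluation steps described in METHODS can be sketched in a few lines of Python. This is a minimal illustration only: the abstract does not name the oversampling technique, so random oversampling with replacement is assumed here, and the function names (`random_oversample`, `split_70_30`, `binary_metrics`) and the toy data are hypothetical. The six classifiers themselves are omitted.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: the abstract only says "oversampling"; random duplication is one option.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def split_70_30(rows, labels, seed=0):
    """Randomly assign 70% of the samples to training and the remaining 30% to validation."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = round(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1-score, and confusion matrix for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true 0/1, columns: predicted 0/1
    }


# Hypothetical toy data: 100 samples, 20 of them positive (label 1 = MACE).
rows = [[i] for i in range(100)]
labels = [1 if i < 20 else 0 for i in range(100)]

bal_rows, bal_labels = random_oversample(rows, labels)
X_train, y_train, X_test, y_test = split_70_30(bal_rows, bal_labels)
print(sorted(Counter(bal_labels).items()), len(X_train), len(X_test))
```

The sketch follows the order given in the abstract (oversample, then split). A common variant resamples only the training portion, so that duplicated rows cannot appear on both sides of the split.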
RESULTS: Univariate analysis showed that 21 patient characteristic variables differed significantly (P<.05) between the groups with and without MACE. With these significant factors, XGBoost performed best among the six machine learning algorithms, with an accuracy of 0.7788, precision of 0.8058, recall of 0.7345, F1-score of 0.7685, and AUC of 0.8599. Further exploration of the models to identify factors affecting the occurrence of MACE revealed that use of anticoagulant drugs and course of the disease consistently ranked among the top two predictive factors in three developed models.
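As a quick consistency check on the reported XGBoost figures, the F1-score can be recomputed as the harmonic mean of precision and recall. The short sketch below uses only the numbers stated in RESULTS; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for XGBoost in the abstract.
reported_precision = 0.8058
reported_recall = 0.7345
reported_f1 = 0.7685

computed_f1 = f1_from(reported_precision, reported_recall)
print(round(computed_f1, 4))  # 0.7685, which agrees with the reported F1-score
```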
CONCLUSIONS: The machine learning risk models constructed in this study achieved acceptable performance in predicting MACE, with XGBoost performing best, and provide a valuable reference for targeted intervention and clinical decision-making in MACE prevention.