Prediction of coronary artery lesions in children with Kawasaki syndrome based on machine learning [J].

Abstract:

OBJECTIVE: Kawasaki syndrome (KS) is an acute vasculitis that affects children < 5 years of age and leads to coronary artery lesions (CAL) in about 20-25% of untreated cases. Machine learning (ML) is a branch of artificial intelligence (AI) that integrates large, complex data sets and uses them to predict future events. The present study aimed to use ML to develop models, built with different algorithms, for early risk assessment of CAL in children with KS.
METHODS: A total of 158 children were enrolled from Women and Children's Hospital, Qingdao University, and split 70%/30% into training and test sets for model development and validation. Several classifiers were constructed, including random forest (RF), logistic regression (LR), and eXtreme Gradient Boosting (XGBoost). Data preprocessing was performed before the classifiers were applied. To avoid overfitting, 5-fold cross-validation was used across all the data.
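The splitting scheme described in METHODS can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names, the random seed, the rounding of 70% of 158 to 111 children, and the choice to run the 5-fold cross-validation on the training portion are all assumptions, since the abstract does not specify them.

```python
import random

def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # seed is an arbitrary assumption
    n_train = round(train_fraction * n_samples)   # 70% of 158 rounds to 111
    return indices[:n_train], indices[n_train:]

def k_fold_indices(indices, k=5):
    """Yield (fit, validate) index lists; each sample is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate

# 158 children, as reported in the abstract; everything else is illustrative.
train_idx, test_idx = train_test_split_indices(158)
print(len(train_idx), len(test_idx))

for fit_idx, val_idx in k_fold_indices(train_idx, k=5):
    # A classifier (RF, LR, XGBoost, ...) would be fitted on fit_idx
    # and scored on val_idx at this point.
    print(len(fit_idx), len(val_idx))
```

Holding out the test indices before any cross-validation keeps the final evaluation independent of model selection, which is one common way to combine the two procedures.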
RESULTS: On the test set, the area under the curve (AUC) of the RF model was 0.925 and the average accuracy was 0.930 (95% CI, 0.905 to 0.956). The AUC of the LR model was 0.888 and the average accuracy was 0.893 (95% CI, 0.837 to 0.950). The AUC of the XGBoost model was 0.879 and the average accuracy was 0.935 (95% CI, 0.891 to 0.980).
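For readers unfamiliar with the metrics reported in RESULTS, the sketch below shows how an AUC, an accuracy, and a confidence interval for an average accuracy can be computed. It is illustrative only: the labels, scores, and per-fold accuracies are made-up toy values, not study data, the 0.5 decision threshold is an assumption, and the normal-approximation interval is one possible choice, since the abstract does not state how its intervals were obtained.

```python
from statistics import NormalDist, mean, stdev

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def mean_with_ci(values, level=0.95):
    """Mean and normal-approximation confidence interval (an assumed method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / len(values) ** 0.5
    m = mean(values)
    return m, m - half_width, m + half_width

# Toy data, invented for illustration: 1 = CAL, 0 = no CAL.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
print(auc(labels, scores))       # 15 of 16 pairs ranked correctly -> 0.9375
print(accuracy(labels, scores))  # 6 of 8 cases classified correctly -> 0.75

# Hypothetical per-fold accuracies from a 5-fold cross-validation.
fold_accuracies = [0.90, 0.95, 0.92, 0.94, 0.93]
print(mean_with_ci(fold_accuracies))
```

With only five folds, a t-based interval would be wider than this normal approximation, so the sketch should be read as showing the shape of the calculation rather than reproducing the reported intervals.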
CONCLUSIONS: In the present study, the RF algorithm effectively constructed a prediction model for CAL, with an accuracy of 0.930 and an AUC of 0.925. The novel ML-based model may help guide clinicians in deciding whether to start more aggressive initial anti-inflammatory therapy. Owing to limited external validation and region-specific population characteristics, additional research is required before the model can be applied further in the clinic.
