Keywords: big data; electronic health records; gout; hyperuricemia; machine learning

Source: DOI:10.1093/rheumatology/keae273

Abstract:
OBJECTIVE: To develop a machine learning-based prediction model for identifying hyperuricemic participants at risk of developing gout.
METHODS: A retrospective nationwide Israeli cohort study used the Clalit Health Insurance database of 473 124 individuals to identify adults 18 years or older with at least two serum urate measurements exceeding 6.8 mg/dl between January 2007 and December 2022. Patients with a prior gout diagnosis or on gout medications were excluded. Patients' demographic characteristics, community and hospital diagnoses, routine medication prescriptions and laboratory results were used to train a risk prediction model. A machine learning model, XGBoost, was developed to predict the risk of gout. Feature selection methods were used to identify relevant variables. The model's performance was evaluated using the receiver operating characteristic area under the curve (ROC AUC) and precision-recall AUC. The primary outcome was the diagnosis of gout among hyperuricemic patients.
RESULTS: Among the 301 385 participants with hyperuricemia included in the analysis, 15 055 (5%) were diagnosed with gout. The XGBoost model had a ROC-AUC of 0.781 (95% CI 0.78-0.784) and precision-recall AUC of 0.208 (95% CI 0.195-0.22). The most significant variables associated with gout diagnosis were serum uric acid levels, age, hyperlipidemia, non-steroidal anti-inflammatory drugs and diuretic purchases. A compact model using only these five variables yielded a ROC-AUC of 0.714 (95% CI 0.706-0.723) and a negative predictive value (NPV) of 95%.
CONCLUSIONS: The findings of this cohort study suggest that a machine learning-based prediction model had relatively good performance and high NPV for identifying hyperuricemic participants at risk of developing gout.
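For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how ROC AUC and NPV can be computed from predicted risk scores, together with the outcome prevalence implied by the reported counts. It is not the authors' code: the function names, the toy labels and scores, and the 0.33 threshold are illustrative assumptions, and the study itself does not state which implementation or decision threshold it used. Only the two cohort counts come from the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def negative_predictive_value(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """NPV = TN / (TN + FN), where a score below `threshold` is a negative call."""
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tn + fn == 0:
        raise ValueError("no cases were predicted negative at this threshold")
    return tn / (tn + fn)


# Counts reported in the abstract.
N_HYPERURICEMIC = 301_385
N_GOUT = 15_055
prevalence = N_GOUT / N_HYPERURICEMIC  # share of the cohort diagnosed with gout

# Hypothetical toy data (NOT study data): 1 = gout, 0 = no gout.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.05, 0.10, 0.35, 0.40, 0.80, 0.20, 0.15, 0.30]

if __name__ == "__main__":
    print(f"prevalence in cohort: {prevalence:.4f}")
    print(f"toy ROC AUC: {roc_auc(toy_labels, toy_scores):.3f}")
    print(f"toy NPV at 0.33: {negative_predictive_value(toy_labels, toy_scores, 0.33):.2f}")
```

The prevalence is a useful reference point when reading the results: with roughly 5% of participants diagnosed, the precision-recall AUC of an uninformative classifier would be expected to sit near 0.05, and NPV is naturally high when the outcome is rare.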