Keywords: COPD; Chronic obstructive pulmonary disease; Early-onset; Electronic health records; Genetic data; Machine learning; Polygenic risk scores; Risk prediction; UK Biobank

MeSH: Adult; Humans; Electronic Health Records; Retrospective Studies; Pulmonary Disease, Chronic Obstructive / diagnosis; Machine Learning; Databases, Factual

Source: DOI:10.7717/peerj.16950

Abstract:
Chronic obstructive pulmonary disease (COPD) is a major public health concern, affecting an estimated 164 million people worldwide. Early detection and intervention strategies are essential to reduce the burden of COPD, but current screening approaches are limited in their ability to predict risk accurately. Machine learning (ML) models offer promise for improving the accuracy of COPD risk prediction by combining genetic and electronic medical record data. In this study, we developed and evaluated eight ML models for primary screening of COPD using routine screening data, polygenic risk scores (PRS), additional clinical data, or a combination of all three. To assess our models, we conducted a retrospective analysis of approximately 329,396 patients in the UK Biobank database. Incorporating personal information and blood biochemical test results significantly improved the model's accuracy for predicting COPD risk, with the best model achieving an AUC of 0.8505, a specificity of 0.8539, and a sensitivity of 0.7584. These results indicate that ML models can be used effectively to predict COPD risk in individuals aged 20 to 50 years, providing a valuable tool for early detection and intervention.
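For readers less familiar with the three metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from binary labels and predicted risk scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are unrelated to the UK Biobank data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given decision threshold.

    The 0.5 default is an illustrative assumption, not a value from the study.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1:
            if predicted == 1:
                tp += 1  # case correctly flagged
            else:
                fn += 1  # case missed
        else:
            if predicted == 0:
                tn += 1  # non-case correctly cleared
            else:
                fp += 1  # non-case wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win. This pairwise form is quadratic in sample size,
    so it is meant for small illustrations only.
    """
    pos = [s for l, s in zip(y_true, y_score) if l == 1]
    neg = [s for l, s in zip(y_true, y_score) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = COPD case, 0 = non-case.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this toy data the sketch gives a sensitivity of 0.75, a specificity of 0.75, and an AUC of 0.9375. Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why an abstract may report all three together.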