Keywords: Data management; Disease risk prediction; Electronic medical record; Machine learning; Quality control

MeSH: Electronic Health Records / standards; Humans; Machine Learning; Data Accuracy; Risk Assessment / standards; Delphi Technique

Source: DOI:10.1186/s12911-024-02533-z

Abstract:
OBJECTIVE: This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML).
METHODS: The index system was developed in four steps: (1) a preliminary index system was outlined based on a literature review; (2) the Delphi method was used to structure the indicators at all levels; (3) the weights of these indicators were determined using the Analytic Hierarchy Process (AHP); and (4) the developed index system was empirically validated using real-world EMR data in an ML-based disease risk prediction task.
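As an illustration of step (3), the sketch below shows one common way AHP weights are derived: take the principal eigenvector of a pairwise comparison matrix and check Saaty's consistency ratio. It is a minimal sketch and not the authors' implementation. The function name `ahp_weights` and the 4x4 example matrix are hypothetical; the matrix is built from made-up weights purely to show the mechanics and does not reproduce any judgment or result from the study.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix sizes 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix.

    Weights are the principal eigenvector normalized to sum to 1.
    Supports matrices of size 1 to 9 (the range covered by RANDOM_INDEX).
    """
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))        # index of the largest eigenvalue
    w = np.abs(eigvecs[:, k].real)          # eigenvector sign is arbitrary
    w = w / w.sum()
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1) if n > 2 else 0.0  # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0                    # consistency ratio
    return w, cr


# Hypothetical example: four first-level indicators whose pairwise
# judgments are perfectly consistent with made-up weights 0.4/0.3/0.2/0.1.
assumed = np.array([0.4, 0.3, 0.2, 0.1])
example = assumed[:, None] / assumed[None, :]   # a_ij = w_i / w_j

weights, cr = ahp_weights(example)
print(weights, cr)
```

In practice, expert judgments are rarely perfectly consistent; a consistency ratio below 0.1 is the conventional threshold for accepting a comparison matrix.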
RESULTS: The synthesis of review findings and expert consultations led to a three-level index system with four first-level, 11 second-level, and 33 third-level indicators. The weights of these indicators were obtained through the AHP method. The empirical analysis showed a positive relationship between the scores assigned by the proposed index system and the predictive performance of the datasets.
CONCLUSIONS: The proposed index system for evaluating EMR data quality is grounded in extensive literature analysis and expert consultation, and its high reliability and suitability have been affirmed through empirical validation. The novel index system offers a robust framework for assessing the quality and suitability of EMR data in ML-based disease risk predictions. It can serve as a guide in building EMR databases, improving EMR data quality control, and generating reliable real-world evidence.