BACKGROUND: Secondary investigations into digital health records, including electronic patient data from German medical data integration centers (DICs), pave the way for enhanced future patient care. However, only limited information is captured regarding the integrity, traceability, and quality of the (sensitive) data elements. This lack of detail diminishes trust in the validity of the collected data. From a technical standpoint, adhering to the widely accepted FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship necessitates enriching data with provenance-related metadata. Provenance offers insights into the readiness of a data element for reuse and supports data governance.
OBJECTIVE: The primary goal of this study is to augment the reusability of clinical routine data within a medical DIC for secondary utilization in clinical research. Our aim is to establish provenance traces that underpin the status of data integrity, reliability, and consequently, trust in electronic health records, thereby enhancing the accountability of the medical DIC. We present the implementation of a proof-of-concept provenance library integrating international standards as an initial step.
METHODS: We adhered to a customized road map for a provenance framework and examined the data integration steps across the ETL (extract, transform, and load) phases. Following a maturity model, we derived requirements for a provenance library. Using this research approach, we formulated a provenance model with associated metadata and implemented a proof-of-concept provenance class. Furthermore, we incorporated the internationally recognized World Wide Web Consortium (W3C) provenance standard, aligned the resultant provenance records with the interoperable health care standard Fast Healthcare Interoperability Resources (FHIR), and presented them in various representation formats. Finally, we conducted a thorough assessment of provenance trace measurements.
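To illustrate the kind of record such a provenance class might produce, the following is a minimal sketch and not the library described in this study. It assumes one trace per data element and ETL step. The class name `ProvenanceRecord`, the helper `trace_step`, and all identifiers in the example call are hypothetical. The first output borrows relation names from the W3C PROV data model; the second is a simplified, unvalidated approximation of a FHIR Provenance resource.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass
class ProvenanceRecord:
    """One trace for a single data element passing through one ETL step."""
    entity_id: str   # the data element after the step (PROV entity)
    source_id: str   # the element it was derived from (PROV entity)
    activity: str    # "extract", "transform", or "load" (PROV activity)
    agent: str       # software or person responsible (PROV agent)
    value_hash: str  # SHA-256 of the value, usable for integrity checks
    recorded: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_prov_style(self) -> dict:
        # Relation names follow the W3C PROV data model; the surrounding
        # dictionary layout is an assumption made for this sketch.
        return {
            "wasDerivedFrom": {
                "generatedEntity": self.entity_id,
                "usedEntity": self.source_id,
            },
            "wasGeneratedBy": {
                "entity": self.entity_id,
                "activity": self.activity,
                "time": self.recorded,
            },
            "wasAssociatedWith": {
                "activity": self.activity,
                "agent": self.agent,
            },
        }

    def to_fhir_like(self) -> dict:
        # Simplified shape modeled on the FHIR Provenance resource;
        # not validated against any FHIR profile.
        return {
            "resourceType": "Provenance",
            "target": [{"reference": self.entity_id}],
            "recorded": self.recorded,
            "activity": {"text": self.activity},
            "agent": [{"who": {"display": self.agent}}],
            "entity": [{"role": "source", "what": {"reference": self.source_id}}],
        }


def trace_step(value, entity_id, source_id, activity, agent) -> ProvenanceRecord:
    """Hypothetical helper: hash the element's value and build its trace."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return ProvenanceRecord(entity_id, source_id, activity, agent, digest)


# Example with made-up identifiers for one laboratory value.
record = trace_step(
    value="7.2",
    entity_id="Observation/hb-001",
    source_id="lab-export/row-17",
    activity="transform",
    agent="etl-job-v1",
)
print(json.dumps(record.to_fhir_like(), indent=2))
```

Keeping a single internal record and deriving each representation from it is one way to offer several output formats without duplicating the tracking logic.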
RESULTS: This study marks the inaugural implementation of integrated provenance traces at the data element level within a German medical DIC. We devised and executed a practical method that synergizes the robustness of quality- and health standard-guided (meta)data management practices. Our measurements indicate commendable pipeline execution times, attaining notable levels of accuracy and reliability in processing clinical routine data, thereby ensuring accountability in the medical DIC. These findings should inspire the development of additional tools aimed at providing evidence-based and reliable electronic health record services for secondary use.
CONCLUSIONS: The research method outlined for the proof-of-concept provenance class has been crafted to promote effective and reliable core data management practices. It aims to enhance biomedical data by imbuing it with meaningful provenance, thereby bolstering the benefits for both research and society. Additionally, it facilitates the streamlined reuse of biomedical data. As a result, the system mitigates risks, as data analysis without knowledge of the origin and quality of all data elements is rendered futile. While the approach was initially developed for the medical DIC use case, these principles can be universally applied throughout the scientific domain.