Keywords: CDM; Cohort studies; Data harmonization; ETL; Federated learning; OMOP

MeSH terms: Humans; Dementia; Netherlands; Cohort Studies; Algorithms; Information Dissemination / methods; Biomedical Research

Source: DOI:10.1016/j.jbi.2024.104661

Abstract:
BACKGROUND: Establishing collaborations between cohort studies has been fundamental for progress in health research. However, such collaborations are hampered by heterogeneous data representations across cohorts and by legal constraints on data sharing. The first problem arises from a lack of consensus on standards for data collection and representation across cohort studies and is usually tackled by applying data harmonization processes. The second is increasingly important due to heightened awareness of privacy protection and stricter regulations, such as the GDPR. Federated learning has emerged as a privacy-preserving alternative to transferring data between institutions: the data are analyzed in a decentralized manner and stay at the institution that holds them.
METHODS: In this study, we set up a federated learning infrastructure for a consortium of nine Dutch cohorts with data relevant to the etiology of dementia, including an extract, transform, and load (ETL) pipeline for data harmonization. Additionally, we assessed the challenges of transforming and standardizing cohort data using the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and evaluated our tool in one of the cohorts using federated algorithms.
RESULTS: We successfully applied our ETL tool and observed a complete coverage of the cohorts' data by the OMOP CDM. The OMOP CDM facilitated the data representation and standardization, but we identified limitations for cohort-specific data fields and in the scope of the vocabularies available. Specific challenges arise in a multi-cohort federated collaboration due to technical constraints in local environments, data heterogeneity, and lack of direct access to the data.
CONCLUSIONS: In this article, we describe the solutions to the challenges and limitations encountered in our study. Our study shows the potential of federated learning as a privacy-preserving solution for multi-cohort studies, one that enhances the reproducibility and reuse of both data and analyses.
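Illustrative sketch (not the authors' tool): the two ideas in the abstract, harmonizing local records to the OMOP CDM and then analyzing them without moving row-level data, can be shown in a few lines of Python. The input record layout (`id`, `sex`, `birth_year`) and the function names below are hypothetical, and only three columns of the OMOP `person` table are filled in. Each cohort maps its own records locally, computes a small aggregate (count and sum of ages), and only that aggregate is sent to the coordinator.

```python
from dataclasses import dataclass

# OMOP standard gender concept IDs; 0 is the OMOP convention for "no matching concept".
OMOP_GENDER = {"M": 8507, "F": 8532}


@dataclass
class LocalSummary:
    """Aggregate a cohort is willing to share: no row-level data."""
    n: int
    age_sum: float


def to_omop_person(record: dict) -> dict:
    """Map one (hypothetical) cohort record to a subset of OMOP `person` columns."""
    return {
        "person_id": record["id"],
        "gender_concept_id": OMOP_GENDER.get(record["sex"], 0),
        "year_of_birth": record["birth_year"],
    }


def local_summary(persons: list[dict], reference_year: int) -> LocalSummary:
    """Runs inside each cohort's own environment on harmonized rows."""
    ages = [reference_year - p["year_of_birth"] for p in persons]
    return LocalSummary(n=len(ages), age_sum=float(sum(ages)))


def federated_mean(summaries: list[LocalSummary]) -> float:
    """Runs at the coordinator; sees only per-cohort aggregates."""
    total_n = sum(s.n for s in summaries)
    if total_n == 0:
        raise ValueError("no participants in any cohort")
    return sum(s.age_sum for s in summaries) / total_n


# Toy data for two cohorts (made up purely for illustration).
cohort_a = [{"id": 1, "sex": "F", "birth_year": 1950},
            {"id": 2, "sex": "M", "birth_year": 1940}]
cohort_b = [{"id": 1, "sex": "X", "birth_year": 1960}]

summaries = [
    local_summary([to_omop_person(r) for r in cohort], reference_year=2020)
    for cohort in (cohort_a, cohort_b)
]
mean_age = federated_mean(summaries)
```

Because every cohort emits the same harmonized columns, the same local function can run unchanged at each site, which is the practical reason harmonization precedes federated analysis. A real deployment would also need safeguards such as minimum cell counts before an aggregate leaves a site.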