The Journey of Data Within a Global Data Sharing Initiative: A Federated 3-Layer Data Analysis Pipeline to Scale Up Multiple Sclerosis Research

Abstract:

BACKGROUND: Investigating low-prevalence diseases such as multiple sclerosis is challenging because relatively few individuals are affected and real-world data are scattered across numerous data sources. These obstacles impair data integration, standardization, and analysis, which in turn hampers the generation of meaningful clinical evidence.
OBJECTIVE: This study aims to present a comprehensive, research question-agnostic, multistakeholder-driven end-to-end data analysis pipeline that accommodates 3 prevalent data-sharing streams: individual data sharing, core data set sharing, and federated model sharing.
METHODS: A demand-driven methodology is employed for standardization, followed by 3 streams of data acquisition, a data quality enhancement process, a data integration procedure, and a concluding analysis stage to fulfill real-world data-sharing requirements. This pipeline's effectiveness was demonstrated through its successful implementation in the COVID-19 and multiple sclerosis global data sharing initiative.
RESULTS: The global data sharing initiative yielded multiple scientific publications and provided extensive worldwide guidance for the community with multiple sclerosis. The pipeline facilitated gathering pertinent data from various sources, accommodating distinct sharing streams and assimilating them into a unified data set for subsequent statistical analysis or secure data examination. This pipeline contributed to the assembly of the largest data set of people with multiple sclerosis infected with COVID-19.
CONCLUSIONS: The proposed data analysis pipeline exemplifies the potential of global stakeholder collaboration and underlines the significance of evidence-based decision-making. It serves as a paradigm for how data sharing initiatives can propel advancements in health care, and its adaptability gives it the capacity to address diverse research questions.
