Sharing sensitive data in life sciences: an overview of centralized and federated approaches

Abstract:

Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
