Keywords: France; big data; care trajectories; data linkage; deterministic approach; health data; health database; health products; integration; linkage; probabilistic approach; public health activity; usability

MeSH: Humans; Artificial Intelligence; Information Storage and Retrieval; Registries; Hospitals; Big Data

Source: DOI:10.2196/41048

Abstract:
Background: Disparities between European countries in the integration of data linkage (ie, the ability to match patient data between databases) into routine public health activities were recently highlighted. In France, the claims database covers almost the whole population from birth to death, offering great research potential for data linkage. Because the use of a common unique identifier to directly link personal data is often restricted, linkage based on a set of indirect key identifiers has been developed; this raises a linkage quality challenge, namely minimizing errors in the linked data.
Objective: This systematic review aims to analyze the type and quality of research publications on indirect data linkage concerning health product use and care trajectories in France.
Methods: A comprehensive search was conducted for all papers published in the PubMed/Medline and Embase databases up to December 31, 2022, that involved linked French databases and focused on health product use or care trajectories. Only studies based on indirect identifiers were included (ie, studies without a unique personal identifier available to easily link the databases). A descriptive analysis of the data linkage was also performed, covering quality indicators and adherence to the Bohensky framework for evaluating data linkage studies.
Results: In total, 16 papers were selected. Data linkage was performed at the national level in 7 (43.8%) studies and at the local level in 9 (56.2%) studies. The number of patients included in the source databases and the number resulting from data linkage varied greatly, from 713 to 75,000 patients and from 210 to 31,000 linked patients, respectively. The diseases studied were mainly chronic diseases and infections. The data linkage served several objectives: to estimate the risk of adverse drug reactions (ADRs; n=6, 37.5%), to reconstruct the patient's care trajectory (n=5, 31.3%), to describe therapeutic uses (n=2, 12.5%), to evaluate the benefits of treatments (n=2, 12.5%), and to evaluate treatment adherence (n=1, 6.3%). Registries were the databases most frequently linked with French claims data. No study examined linkage with a hospital data warehouse, a clinical trial database, or patient self-reported databases. The linkage approach was deterministic in 7 (43.8%) studies, probabilistic in 4 (25.0%) studies, and not specified in 5 (31.3%) studies. The linkage rate was mainly between 80% and 90% (reported in 11/15, 73.3%, studies). Assessment of adherence to the Bohensky framework for evaluating data linkage studies showed that the source databases for the linkage were always described, but that the completion rate and accuracy of the linkage variables were not systematically reported.
Conclusions: This review highlights the growing interest in health data linkage in France. Nevertheless, regulatory, technical, and human constraints remain major obstacles to its deployment. The volume, variety, and validity of the data represent a real challenge, and advanced expertise and skills in statistical analysis and artificial intelligence are required to process these big data.
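Illustration: the abstract contrasts deterministic and probabilistic linkage on indirect identifiers and reports linkage rates, but gives no implementation details. The following is a minimal sketch of how the two approaches differ, not the method of any reviewed study. All field names (`sex`, `birth_year`, `postcode`, `admission_date`), the toy records, and the `M`/`U` agreement probabilities are assumptions made for illustration. The linkage rate is assumed here to be the share of registry records that receive a link, and the scoring function is a simplified Fellegi-Sunter style weight.

```python
import math

# Hypothetical indirect identifiers; none of these names come from the reviewed studies.
KEYS = ("sex", "birth_year", "postcode", "admission_date")


def deterministic_link(registry, claims, keys=KEYS):
    """Link a registry record only when every key agrees exactly with one unique claims record."""
    index = {}
    for rec in claims:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    links = []
    for rec in registry:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # missing or ambiguous matches stay unlinked
            links.append((rec["id"], candidates[0]["id"]))
    return links


def match_weight(a, b, m, u, keys=KEYS):
    """Simplified Fellegi-Sunter score: sum of log2 agreement or disagreement weights.

    m[k]: assumed probability that key k agrees for a true match.
    u[k]: assumed probability that key k agrees for a non-match.
    """
    total = 0.0
    for k in keys:
        if a[k] == b[k]:
            total += math.log2(m[k] / u[k])
        else:
            total += math.log2((1 - m[k]) / (1 - u[k]))
    return total


def linkage_rate(links, registry):
    """Share of registry records that were linked (assumed definition)."""
    return len(links) / len(registry)


# Toy data: R2 and C2 differ only by postcode (e.g. a move or a typo).
registry = [
    {"id": "R1", "sex": "F", "birth_year": 1950, "postcode": "31000", "admission_date": "2019-03-02"},
    {"id": "R2", "sex": "M", "birth_year": 1962, "postcode": "33000", "admission_date": "2019-05-10"},
]
claims = [
    {"id": "C1", "sex": "F", "birth_year": 1950, "postcode": "31000", "admission_date": "2019-03-02"},
    {"id": "C2", "sex": "M", "birth_year": 1962, "postcode": "33100", "admission_date": "2019-05-10"},
]

# Illustrative probabilities only; in practice they are estimated from the data.
M = {k: 0.95 for k in KEYS}
U = {"sex": 0.5, "birth_year": 0.02, "postcode": 0.01, "admission_date": 0.005}

links = deterministic_link(registry, claims)
print(links, linkage_rate(links, registry))
print(match_weight(registry[1], claims[1], M, U))  # near-match pair R2/C2
print(match_weight(registry[1], claims[0], M, U))  # unrelated pair R2/C1
```

In this toy case the deterministic rule links only R1 to C1, because a single disagreeing postcode blocks R2. The probabilistic score for R2 and C2 remains positive while the score for R2 and C1 is negative, so a threshold on the score could recover the near match. This is the trade-off between the two approaches: exact rules avoid false links but miss records with errors, whereas weighted scores tolerate errors at the cost of choosing a threshold.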