Keywords: National Center for Health Statistics; National Death Index; National Hospital Care Survey

Source: DOI:10.3233/sji-210891

Abstract:
BACKGROUND: The National Center for Health Statistics (NCHS) links data from surveys to administrative data sources, but privacy concerns make accessing new data sources difficult. Privacy-preserving record linkage (PPRL) is an alternative to traditional linkage approaches that may overcome this barrier. However, prior to implementing PPRL techniques it is important to understand their effect on data quality.
METHODS: Results from PPRL were compared to results from an established linkage method, which uses unencrypted (plain text) identifiers and both deterministic and probabilistic techniques. The established method was used as the gold standard. Links performed with PPRL were evaluated for precision and recall. An initial assessment and a refined approach were implemented. The impact of PPRL on secondary data analysis, including match and mortality rates, was assessed.
RESULTS: The match rates for all approaches were similar: 5.1% for the gold standard, 5.4% for the initial PPRL approach, and 5.0% for the refined PPRL approach. Precision ranged from 93.8% to 98.9% and recall ranged from 98.7% to 97.8%, depending on the selection of tokens from PPRL. The impact of PPRL on secondary data analysis was minimal.
CONCLUSIONS: The findings suggest PPRL works well to link patient records to the National Death Index (NDI) since both sources have a high level of non-missing personally identifiable information, especially among adults 65 and older who may also have a higher likelihood of linking to the NDI. The results from this study are encouraging as a first step for a statistical agency implementing PPRL approaches; however, future research is still needed.
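The evaluation described in the methods, scoring PPRL links against the established linkage treated as the gold standard, can be illustrated with a minimal sketch. It assumes each link is represented as a pair of record identifiers; the function name precision_recall, the identifiers, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# A link pairs a patient record with a death record.
# The ID format is a hypothetical choice for illustration only.
Link = Tuple[str, str]


def precision_recall(candidate: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Score candidate links against gold-standard links.

    precision = candidate links also in the gold standard / all candidate links
    recall    = candidate links also in the gold standard / all gold-standard links
    """
    true_positives = len(candidate & gold)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (not from the study): the PPRL run finds three of the four
# gold-standard links and adds one link the gold standard does not contain.
gold_links = {("p1", "d1"), ("p2", "d2"), ("p3", "d3"), ("p4", "d4")}
pprl_links = {("p1", "d1"), ("p2", "d2"), ("p3", "d3"), ("p5", "d9")}

precision, recall = precision_recall(pprl_links, gold_links)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under this definition, a looser token selection would tend to add links (raising recall while risking lower precision), while a stricter one would tend to do the reverse.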