Keywords: Big Data; Data Linkage; HIV; Patient Confidentiality; Validation

Source: DOI:10.1101/2024.06.19.24309149

Abstract:
The use of big data and large language models in healthcare can play a key role in improving patient treatment and healthcare management, especially when applied to large-scale administrative data. A major challenge is ensuring that patient confidentiality and personal information are protected. One way to address this is to augment clinical data through linkage to administrative laboratory datasets, so that demographic information is not needed.
We explored an alternative method for examining patient files from a large administrative dataset in South Africa (the National Health Laboratory Services, or NHLS): linking external data to the NHLS database using the specimen barcodes associated with laboratory tests. This gives us a deterministic way of performing data linkage without accessing demographic information. In this paper, we quantify the performance of this approach.
Linking the large NHLS dataset to external hospital data using specimen barcodes achieved a 95% success rate. Of the 1200 records in the validation sample, 87% were exact matches and 9% were matches after typographic correction. The remaining 5% were either complete mismatches or were due to duplicates in the administrative data.
The high success rate indicates that barcodes are reliable for linking data without demographic identifiers. Specimen barcodes are an effective tool for deterministic linkage of health data, and may provide a way to create large linked datasets without compromising patient confidentiality.
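As an illustration of the idea, deterministic linkage on a barcode key could be sketched as follows. This is a minimal sketch, not the authors' pipeline: the function names, the sample barcodes, and in particular the `normalize` rule (trim, upper-case, drop non-alphanumeric characters) are assumptions standing in for the typographic correction, which the abstract does not describe.

```python
from collections import Counter


def normalize(barcode: str) -> str:
    """Hypothetical typographic cleanup: trim, upper-case, keep only letters and digits."""
    return "".join(ch for ch in barcode.strip().upper() if ch.isalnum())


def link_by_barcode(external, lab):
    """Classify each external barcode against the laboratory barcodes.

    Returns a dict mapping each external barcode to one of
    'exact', 'corrected', 'duplicate', or 'unmatched'.
    No demographic fields are consulted; the barcode is the only key.
    """
    raw_counts = Counter(lab)
    norm_counts = Counter(normalize(b) for b in lab)
    result = {}
    for code in external:
        key = normalize(code)
        if norm_counts[key] > 1:
            # More than one laboratory record shares this barcode: ambiguous link.
            result[code] = "duplicate"
        elif raw_counts[code] == 1:
            result[code] = "exact"
        elif norm_counts[key] == 1:
            # Matches only after the assumed cleanup step.
            result[code] = "corrected"
        else:
            result[code] = "unmatched"
    return result


def success_rate(result):
    """Share of external records linked exactly or after correction."""
    linked = sum(1 for status in result.values() if status in ("exact", "corrected"))
    return linked / len(result) if result else 0.0


# Made-up barcodes for illustration only.
lab_barcodes = ["AB12345", "CD67890", "EF11111", "EF11111"]
external_barcodes = ["AB12345", "cd-67890", "EF11111", "ZZ99999"]

statuses = link_by_barcode(external_barcodes, lab_barcodes)
rate = success_rate(statuses)
```

Here duplicated laboratory barcodes are counted as failures rather than linked arbitrarily, mirroring the abstract's grouping of duplicates with mismatches; a real study would need its own rule for resolving them.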