Introduction: Despite growing efforts to standardize the coding of social determinants of health (SDOH), they are infrequently captured in electronic health records (EHRs). Most SDOH variables are still captured in the unstructured (i.e., free-text) fields of EHRs. In this study, we evaluate a practical text mining approach (i.e., advanced pattern matching techniques) for identifying phrases that refer to housing issues, an important SDOH domain affecting value-based healthcare providers, using the EHRs of a large multispecialty medical group in the New England region of the United States. To show how this approach could help health systems address the SDOH challenges of their patients, we assess the demographic and clinical characteristics of patients with and without housing issues and briefly examine patterns of healthcare utilization in the study population overall and among those with and without housing challenges.

Methods: We identified five categories of housing issues [i.e., homelessness current (HC), homelessness history (HH), homelessness addressed (HA), housing instability (HI), and building quality (BQ)] and developed several phrases for each category by collaborating with SDOH experts, consulting the literature, and reviewing existing coding standards. We developed pattern-matching algorithms (i.e., advanced regular expressions) and applied them to the selected EHRs. We assessed the recall (sensitivity) and precision (positive predictive value) of the text mining approach by comparing the identified phrases with free text manually annotated for the different housing issues.

Results: The study dataset included structured EHR data for a total of 20,342 patients and 2,564,344 free-text clinical notes. The mean (SD) age of the study population was 75.96 (7.51) years, and 58.78% of the cohort were female. BQ and HI were the most frequent housing issues documented in EHR free-text notes, and HH was the least frequent. Compared with manual annotation, the regular expression methodology had high precision (positive predictive value) at the phrase, note, and patient levels (96.36%, 95.00%, and 94.44%, respectively) across the different categories of housing issues, but relatively low recall (sensitivity) (30.11%, 32.20%, and 41.46%, respectively).

Conclusion: The results of this study can be used to advance research in this domain, to assess the potential value of EHR free text in identifying patients at high risk of housing issues, to improve patient care and outcomes, and eventually to mitigate socioeconomic disparities across individuals and communities.
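
Illustrative sketch: The pattern-matching and evaluation steps described in the Methods can be illustrated with a minimal Python example. This is a sketch under stated assumptions, not the study's implementation: the regular expressions, the sample notes, the gold annotations, and the names `PATTERNS`, `find_housing_issues`, and `precision_recall` are all hypothetical, chosen only to show how note-level precision and recall could be computed against manual annotation.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical patterns, one per housing category named in the abstract.
# The study's actual phrase lists are not reproduced here.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "HC": re.compile(r"\b(?:is|currently)\s+homeless\b", re.IGNORECASE),
    "HH": re.compile(r"\b(?:history\s+of|previously|formerly)\s+homeless(?:ness)?\b", re.IGNORECASE),
    "HA": re.compile(r"\bno\s+longer\s+homeless\b", re.IGNORECASE),
    "HI": re.compile(r"\b(?:evict(?:ed|ion)|behind\s+on\s+rent)\b", re.IGNORECASE),
    "BQ": re.compile(r"\b(?:mold|lead\s+paint|no\s+heat)\b", re.IGNORECASE),
}


def find_housing_issues(note: str) -> Set[str]:
    """Return the set of category codes whose pattern matches the note."""
    return {code for code, pattern in PATTERNS.items() if pattern.search(note)}


def precision_recall(predicted: Set, gold: Set) -> Tuple[float, float]:
    """Precision (PPV) and recall (sensitivity) of predicted items against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up notes and made-up manual annotations, for illustration only.
notes = {
    "n1": "Patient is currently homeless and staying with friends.",
    "n2": "Reports mold in the apartment; landlord unresponsive.",
    "n3": "Pt was evicted last month.",
    "n4": "Couch surfing since losing lease.",  # annotated as HI, but no pattern covers it
}
gold = {("n1", "HC"), ("n2", "BQ"), ("n3", "HI"), ("n4", "HI")}

# Note-level predictions: one (note_id, category) pair per matched category.
predicted = {
    (note_id, code)
    for note_id, text in notes.items()
    for code in find_housing_issues(text)
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example every match is correct, but the phrasing in note `n4` is not covered by any pattern, so precision is high while recall is lower. This mirrors, in miniature, the kind of trade-off the Results report for a fixed phrase list.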