Keywords: activities of daily living; electronic health records; free-text notes; natural language processing; unstructured data

Source: DOI:10.1093/jamiaopen/ooae044

Abstract:
Natural language processing (NLP) can enhance research on activities of daily living (ADL) by extracting structured information from unstructured electronic health record (EHR) notes. This review aims to give insight into the state of the art, usability, and performance of NLP systems for extracting information on ADL from EHRs.
A systematic review was conducted based on searches in PubMed, Embase, CINAHL, Web of Science, and Scopus. Studies published between 2017 and 2022 were selected based on predefined eligibility criteria.
The review identified 22 studies. Most studies (65%) used NLP for classifying unstructured EHR data on 1 or 2 ADL. Deep learning, combined with a rule-based method or machine learning, was the approach most commonly used. NLP systems varied widely in their pre-processing and algorithms. Common performance evaluation methods were cross-validation and train/test datasets, with F1, precision, and sensitivity as the most frequently reported evaluation metrics. Most studies reported relatively high overall scores on the evaluation metrics.
NLP systems are valuable for the extraction of unstructured EHR data on ADL. However, comparing the performance of NLP systems is difficult due to the diversity of the studies and challenges related to the dataset, including restricted access to EHR data, inadequate documentation, lack of granularity, and small datasets.
This systematic review indicates that NLP is promising for deriving information on ADL from unstructured EHR notes. However, which NLP system performs best depends on the characteristics of the dataset, the research question, and the type of ADL.
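
For readers less familiar with the evaluation metrics named in the results (precision, sensitivity, and F1), the following is a minimal sketch of how they are computed for a binary classification task such as "does this note mention an ADL limitation?". It is an illustration only and is not taken from any of the reviewed studies; the function name `binary_metrics` and the label lists are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, sensitivity (recall), and F1 for binary labels (1 = positive)."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Hypothetical gold labels and system predictions for eight notes.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With small datasets, which the review lists among the challenges, a single false positive or false negative can shift these scores noticeably. This is one reason scores from different studies are hard to compare directly.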