MeSH terms: Humans; Natural Language Processing; Machine Learning; Language; Information Storage and Retrieval; PubMed

Source: DOI: 10.1016/j.ijmedinf.2023.105122

Abstract:
Natural Language Processing (NLP) applications have developed over the past years in various fields, including clinical free text, where they are used for named entity recognition and relation extraction. However, development has been so rapid in the last few years that there is currently no overview of it. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments.
We reviewed the literature from 2010 to date, searching PubMed, Scopus, and the Association for Computational Linguistics (ACL) and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks on unstructured clinical text (e.g., discharge summaries).
We included 94 studies in the review, 30 of which were published in the last three years. Machine learning methods were used in 68 studies, rule-based methods in 5, and both in 22. Sixty-three studies focused on Named Entity Recognition, 13 on Relation Extraction, and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". Seventy-two studies used public datasets, and 22 used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system, and just three reported the system's use outside the experimental setting. Only 7 studies shared a pre-trained model, and only 8 shared an available software tool.
Machine learning-based methods have dominated information extraction tasks in the NLP field. More recently, Transformer-based language models have been taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of the findings and their translation into practice, and it highlights the need for robust clinical evaluation.