Keywords: ACL; Clinical notes; Electronic health records; Natural language processing; Registry building

MeSH: Natural Language Processing; Registries; Humans; Electronic Health Records; Data Mining / methods

Source: DOI: 10.1016/j.artmed.2024.102847

Abstract:
Building clinical registries is an important step in clinical research and in improving the quality of patient care. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes differ substantially from the regular text on which state-of-the-art NLP models are trained and tested, and clinical notes pose their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) performed better (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) in out-of-sample validation, and it also showed the smallest performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU), and it provides interpretability. Our proposed approach, SE-K, can be used effectively to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance than more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement, and adverse event monitoring.
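The abstract does not describe the internals of SE-K, so the following is only a minimal sketch of what a keyword-based sentence extraction step might look like, under stated assumptions. The keyword list, the naive sentence splitter, and the function names `split_sentences` and `extract_keyword_sentences` are all hypothetical and are not taken from the paper. The point of the sketch is that filtering a note down to keyword-bearing sentences is cheap on a CPU and leaves a human-readable trace of which sentences drove a label.

```python
import re

# Hypothetical keyword list for a single registry variable (an assumption,
# not taken from the paper).
MENISCUS_KEYWORDS = ("meniscus", "meniscal")


def split_sentences(note):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace,
    or on newlines. Abbreviations such as 'Dr.' will be split incorrectly."""
    parts = re.split(r"(?<=[.?!])\s+|\n+", note)
    return [p.strip() for p in parts if p.strip()]


def extract_keyword_sentences(note, keywords):
    """Return the sentences that mention at least one keyword
    (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [
        s for s in split_sentences(note)
        if any(k in s.lower() for k in lowered)
    ]


# Made-up example note, for illustration only.
EXAMPLE_NOTE = (
    "Patient underwent ACL reconstruction. "
    "Medial meniscus tear was repaired. "
    "No complications."
)
```

Calling `extract_keyword_sentences(EXAMPLE_NOTE, MENISCUS_KEYWORDS)` keeps only the sentence about the meniscus tear. In a full pipeline the retained sentences would presumably be passed to a lightweight classifier, but the abstract does not specify which one, so that step is left out here.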