MeSH: Humans; Natural Language Processing; Electronic Health Records; Neoplasms / therapy; Response Evaluation Criteria in Solid Tumors; Machine Learning; Data Mining / methods; Algorithms; Deep Learning

Source: DOI: 10.1200/CCI.23.00166

Abstract:
OBJECTIVE: The Response Evaluation Criteria in Solid Tumors (RECIST) guidelines provide a standardized approach for evaluating the response of cancer to treatment, allowing consistent comparison of treatment efficacy across therapies and patients. However, manually collecting this information from electronic health records is labor-intensive and time-consuming because of the complexity and volume of clinical notes. This study aims to apply natural language processing (NLP) techniques to automate this process, minimizing manual data collection and improving the consistency and reliability of the results.
METHODS: We proposed a complex, hybrid NLP system that automates the extraction, linking, and summarization of anticancer therapies and associated RECIST-like responses from narrative clinical text. The system consists of multiple machine learning-based, deep learning-based, and rule-based modules for NLP tasks such as named entity recognition, assertion classification, relation extraction, and text normalization, each addressing a different challenge in extracting anticancer therapy and response information. We then evaluated the system's performance on two independent test sets from different institutions to demonstrate its effectiveness and generalizability.
RESULTS: The system used domain-specific language models, BioBERT and BioClinicalBERT, for high-performance identification of therapy mentions and for extraction and categorization of RECIST responses. The best-performing model achieved a score of 0.66 in linking therapy and RECIST response mentions, with end-to-end performance peaking at 0.74 after relation normalization, indicating substantial efficacy with room for improvement.
CONCLUSIONS: We developed, implemented, and tested a system that extracts cancer treatment and efficacy assessment information from clinical notes. We expect this system will support future cancer research, particularly oncologic studies that focus on efficiently assessing the effectiveness and reliability of cancer therapeutics.
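
ILLUSTRATION: The text normalization step named in the methods can be pictured with a minimal rule-based sketch that maps a free-text response mention to one of the four standard RECIST categories (CR, PR, SD, PD). This is not the authors' implementation: the abstract does not publish its rules, so the phrase patterns and the function name normalize_response below are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical phrase patterns (assumption): the abstract does not list
# the rules used by the actual system. Labels are the standard RECIST
# response categories.
_RESPONSE_PATTERNS = [
    ("CR", re.compile(r"\bcomplete (response|remission)\b", re.I)),
    ("PR", re.compile(r"\bpartial (response|remission)\b", re.I)),
    ("SD", re.compile(r"\bstable disease\b", re.I)),
    ("PD", re.compile(r"\b(progressive disease|disease progression)\b", re.I)),
]


def normalize_response(mention: str) -> Optional[str]:
    """Return a RECIST-like label for a response mention, or None.

    Patterns are tried in order and the first match wins.
    """
    for label, pattern in _RESPONSE_PATTERNS:
        if pattern.search(mention):
            return label
    return None


if __name__ == "__main__":
    for text in [
        "CT shows a partial response to pembrolizumab",
        "Imaging consistent with disease progression",
        "Patient tolerated the infusion well",
    ]:
        print(text, "->", normalize_response(text))
```

A real system would combine such rules with the learned named entity recognition, assertion classification, and relation extraction modules described above, since keyword matching alone cannot handle negation or link a response to the therapy that produced it.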