Keywords: Artificial intelligence; COVID-19; Data cohort; Named entity; Natural language processing; Relation extraction; Transfer learning

MeSH: Humans; Natural Language Processing; COVID-19; Publications

Source: DOI:10.1186/s12911-023-02117-3

Abstract:
Background: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data.
Objective: This study aims to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from case reports published in the literature.
Methods: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer for finding clinical and demographic named entities and the relations between them in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports.
Results: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% over benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1-8%. A thorough examination reveals the presence of the disease and the prevalence of symptoms in patients.
Conclusions: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.
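The three-layer framework described in the abstract (data layer, NLP layer, evaluation layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract indicates that the NLP layer relies on learned models and transfer learning, whereas the sketch below substitutes a toy dictionary lookup for named entity recognition and a simple co-occurrence rule for relation extraction. The lexicon, the `has_symptom` relation name, the function names, and the sample sentence are all hypothetical and chosen only to show how the layers might hand data to one another.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon (assumption): stands in for a trained NER model.
LEXICON = {
    "fever": "SYMPTOM",
    "cough": "SYMPTOM",
    "covid-19": "DISEASE",
    "male": "SEX",
    "female": "SEX",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int


def data_layer(raw_reports):
    """Data layer: normalise whitespace and drop empty reports to form a cohort."""
    cleaned = (" ".join(r.split()) for r in raw_reports)
    return [r for r in cleaned if r]


def ner_layer(report):
    """NLP layer, step 1: dictionary lookup standing in for a learned NER model."""
    entities = []
    for match in re.finditer(r"[A-Za-z0-9-]+", report):
        token = match.group().lower()
        label = LEXICON.get(token)
        if label:
            entities.append(Entity(token, label, match.start()))
    return entities


def relation_layer(entities):
    """NLP layer, step 2: link each disease to each symptom in the same report."""
    diseases = [e for e in entities if e.label == "DISEASE"]
    symptoms = [e for e in entities if e.label == "SYMPTOM"]
    return {(d.text, "has_symptom", s.text) for d in diseases for s in symptoms}


def evaluation_layer(predicted, gold):
    """Evaluation layer: precision, recall and F1 over sets of predictions."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example report and gold annotations, for illustration only.
cohort = data_layer(
    ["A 45-year-old male with COVID-19 presented with  fever and cough.", "   "]
)
entities = ner_layer(cohort[0])
relations = relation_layer(entities)
gold = {
    ("covid-19", "has_symptom", "fever"),
    ("covid-19", "has_symptom", "cough"),
}
scores = evaluation_layer(relations, gold)
```

In a setting closer to the one the abstract describes, `ner_layer` and `relation_layer` would presumably be replaced by fine-tuned pretrained models, while the data and evaluation layers could keep the same interfaces.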