Keywords: BERT; bidirectional encoder representations from transformers; fine-tuning BERT; food information extraction; information extraction; machine learning; named-entity recognition; natural language processing; semantic annotation

MeSH: Algorithms; Humans; Information Storage and Retrieval; Machine Learning; Natural Language Processing; Semantics

Source: DOI 10.2196/28229; PDF (PubMed)

Abstract:
Recently, food science has been garnering a lot of attention. There are many open research questions on how food, as one of the main environmental factors, interacts with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on external resources. However, in 2019, an annotated corpus of food entities, together with their normalization to several food semantic resources, was published.
In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.
We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.
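A minimal sketch of how such a model could be fine-tuned for token-level food named-entity recognition with the Hugging Face transformers library is shown below; the base checkpoint (bert-base-cased), the BIO label set, and the toy training sentence are illustrative assumptions, not the paper's actual corpus or configuration.

```python
# Sketch of fine-tuning a pretrained BERT model for token-level food NER,
# in the spirit of FoodNER. Model name, label set, and the example
# sentence are illustrative assumptions, not the paper's actual setup.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-FOOD", "I-FOOD"]          # food vs. nonfood BIO scheme
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

# One toy training example: words with word-level BIO tags.
words = ["Add", "two", "cups", "of", "brown", "rice", "."]
word_tags = ["O", "O", "O", "O", "B-FOOD", "I-FOOD", "O"]

# Tokenize and align word-level tags to subword tokens;
# special tokens get -100 so they are ignored by the loss.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned = [
    -100 if word_id is None else label2id[word_tags[word_id]]
    for word_id in enc.word_ids(batch_index=0)
]
labels_tensor = torch.tensor([aligned])

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):                          # a few steps on the toy example
    out = model(**enc, labels=labels_tensor)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(float(out.loss))
```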
All BERT models provided very promising results, with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entities, which represents the new state of the art in food information extraction. In the tasks where semantic tags are predicted, all BERT models again obtained very promising results, with macro F1 scores ranging from 73.39% to 78.96%.
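For reference, the macro F1 metric reported here averages per-class F1 scores with every class weighted equally; a toy illustration with made-up gold and predicted tags (not the paper's data):

```python
# Illustrative macro F1 computation; gold and pred are made-up toy tags.
from sklearn.metrics import f1_score

gold = ["FOOD", "FOOD", "O", "O", "FOOD", "O"]
pred = ["FOOD", "O",    "O", "O", "FOOD", "O"]

# Macro averaging computes F1 per class, then takes the unweighted mean.
print(f1_score(gold, pred, average="macro"))
```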
FoodNER can be used to extract and annotate food entities in 5 different tasks: distinguishing food versus nonfood entities, and classifying food entities at the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
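A hedged sketch of how a fine-tuned FoodNER-style checkpoint could be applied to annotate new text; the model path and example sentence are placeholders, not released FoodNER artifacts.

```python
# Sketch of annotating new text with a fine-tuned token-classification model.
# "path/to/foodner-model" is a placeholder for a locally saved checkpoint.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="path/to/foodner-model",
    aggregation_strategy="simple",          # merge subword pieces into spans
)

for entity in ner("Serve the grilled salmon with steamed broccoli."):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```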