MeSH: Natural Language Processing; Data Mining / methods; Humans

Source: DOI:10.1093/database/baae057; PDF (PubMed)

Abstract:
Biomedical relation extraction is an ongoing challenge within the natural language processing community. It is important for understanding the scientific biomedical literature and has many use cases, such as drug discovery, precision medicine, disease diagnosis, treatment optimization and biomedical knowledge graph construction. A tool that addresses this task effectively could therefore improve knowledge discovery by automating the extraction of relations from research manuscripts. The first track of the BioCreative VIII competition extended the scope of this challenge by introducing the detection of novel relations within the literature. This paper describes our participating system, which initially focused on jointly extracting relations between biomedical entities and classifying their novelty. We then describe our subsequent advancement to an end-to-end model. Specifically, we enhanced the initial system by incorporating it into a cascading pipeline that includes a tagger module and a linker module. This integration enables the comprehensive extraction of relations, and the classification of their novelty, directly from raw text. In our experiments, the tagger module attained state-of-the-art named entity recognition performance, with a micro F1-score of 90.24, while the end-to-end system achieved a competitive novelty F1-score of 24.59. The code to run our system is publicly available at https://github.com/ieeta-pt/BioNExt. Database URL: https://github.com/ieeta-pt/BioNExt.
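The cascading design described in the abstract (a tagger, then a linker, then a relation and novelty extractor) can be illustrated with a minimal sketch. This is not the BioNExt implementation: every name below (`Mention`, `LinkedEntity`, `Relation`, `run_pipeline`, the `toy_*` functions, the identifiers and the relation label) is hypothetical, and the toy stages are dictionary lookups standing in for the trained models. It only shows how each stage consumes the output of the previous one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int
    label: str  # entity type, e.g. "Chemical"


@dataclass(frozen=True)
class LinkedEntity:
    mention: Mention
    identifier: str  # normalized concept identifier


@dataclass(frozen=True)
class Relation:
    head: str
    tail: str
    relation_type: str
    novel: bool


Tagger = Callable[[str], List[Mention]]
Linker = Callable[[List[Mention]], List[LinkedEntity]]
Extractor = Callable[[str, List[LinkedEntity]], List[Relation]]


def run_pipeline(text: str, tagger: Tagger, linker: Linker,
                 extractor: Extractor) -> List[Relation]:
    """Cascade: raw text -> mentions -> linked entities -> relations."""
    mentions = tagger(text)
    entities = linker(mentions)
    return extractor(text, entities)


# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_tagger(text: str) -> List[Mention]:
    lexicon = {"aspirin": "Chemical", "headache": "Disease"}
    lowered = text.lower()
    mentions = []
    for term, label in lexicon.items():
        start = lowered.find(term)
        if start != -1:
            end = start + len(term)
            mentions.append(Mention(text[start:end], start, end, label))
    return sorted(mentions, key=lambda m: m.start)


def toy_linker(mentions: List[Mention]) -> List[LinkedEntity]:
    # Placeholder identifiers, not real database accessions.
    ids = {"aspirin": "ID:0001", "headache": "ID:0002"}
    return [LinkedEntity(m, ids[m.text.lower()]) for m in mentions]


def toy_extractor(text: str, entities: List[LinkedEntity]) -> List[Relation]:
    # Pair entities of different types; the relation type and novelty flag
    # are fixed placeholders where a real system would use a classifier.
    relations = []
    for i, head in enumerate(entities):
        for tail in entities[i + 1:]:
            if head.mention.label != tail.mention.label:
                relations.append(
                    Relation(head.identifier, tail.identifier,
                             "Association", novel=True))
    return relations


if __name__ == "__main__":
    for rel in run_pipeline("Aspirin relieves headache.",
                            toy_tagger, toy_linker, toy_extractor):
        print(rel)
```

Because the stages are passed in as plain callables, any one of them can be swapped for a stronger model without touching the others, which is the main practical appeal of a cascade. The trade-off is that errors made by the tagger or linker propagate to the relation stage.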