Keywords: Alzheimer; ChEBI; Dementia; Entity normalization; Ontology

MeSH: Alzheimer Disease / drug therapy, metabolism; Humans; Dementia; Translational Research, Biomedical; Natural Language Processing; Biological Ontologies

Source: DOI:10.1186/s13326-024-00314-1

Abstract:
BACKGROUND: Identifying chemical mentions within the Alzheimer's and dementia literature can provide a powerful tool to further therapeutic research. Leveraging the Chemical Entities of Biological Interest (ChEBI) ontology, which is rich in hierarchical and other relationship types, for entity normalization can provide an advantage for future downstream applications. We provide a reproducible hybrid approach that combines an ontology-enhanced PubMedBERT model for disambiguation with a dictionary-based method for candidate selection.
RESULTS: There were 56,553 chemical mentions in the titles of 44,812 unique PubMed article abstracts. Based on our gold standard, our method of disambiguation improved entity normalization by 25.3 percentage points compared to using only the dictionary-based approach with fuzzy-string matching for disambiguation. For the CRAFT corpus, our method outperformed baselines (maximum 78.4%) with 91.17% accuracy. For our Alzheimer's and dementia cohort, we were able to add 47.1% more potential mappings between MeSH and ChEBI when compared to BioPortal.
CONCLUSIONS: Use of natural language models like PubMedBERT and resources such as ChEBI and PubChem provides a beneficial way to link entity mentions to ontology terms, while further supporting downstream tasks like filtering ChEBI mentions based on roles and assertions to find beneficial therapies for Alzheimer's and dementia.
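ILLUSTRATION: The abstract describes a dictionary-based step for candidate selection and a baseline that disambiguates with fuzzy-string matching alone. The following is a minimal sketch of that general idea, not the authors' implementation. The lexicon, its `EX:` identifiers (placeholders, not real ChEBI IDs), the function names, and the 0.8 threshold are all assumptions made for illustration, and Python's standard `difflib` stands in for whatever string matcher the paper actually uses. The ontology-enhanced PubMedBERT disambiguation model is not shown.

```python
from difflib import SequenceMatcher

# Toy lexicon mapping a label or synonym to an identifier.
# The "EX:" identifiers are placeholders, NOT real ChEBI IDs.
TOY_LEXICON = {
    "donepezil": "EX:0001",
    "memantine": "EX:0002",
    "galantamine": "EX:0003",
    "amyloid beta": "EX:0004",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidates(mention, lexicon, threshold=0.8, top_k=3):
    """Return up to top_k (score, label, identifier) tuples, best first.

    Only labels whose similarity to the mention reaches the
    (assumed) threshold are kept as candidates.
    """
    scored = [
        (similarity(mention, label), label, ident)
        for label, ident in lexicon.items()
    ]
    kept = [entry for entry in scored if entry[0] >= threshold]
    kept.sort(key=lambda entry: entry[0], reverse=True)
    return kept[:top_k]


def normalize_fuzzy_only(mention, lexicon, threshold=0.8):
    """Baseline-style normalization: take the best fuzzy match, if any."""
    candidates = select_candidates(mention, lexicon, threshold, top_k=1)
    return candidates[0][2] if candidates else None


if __name__ == "__main__":
    for text in ["Donepezil", "donepezl", "xyz"]:
        print(text, "->", normalize_fuzzy_only(text, TOY_LEXICON))
```

In a hybrid setup like the one the abstract outlines, the list returned by `select_candidates` would be handed to a separate disambiguation model rather than resolved by taking the top string match.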