Keywords: Data mining; Disease attribute; Domain adaptation; Multi-task learning; Natural language processing

MeSH: Humans; Opioid-Related Disorders; Machine Learning; Semantics; Natural Language Processing

Source: DOI: 10.1186/s13326-024-00311-4

Abstract:
BACKGROUND: The semantics of entities extracted from clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining the modifiers of clinical entities rely on regular expressions or feature weights that are trained independently for each modifier.
METHODS: We develop and evaluate a multi-task transformer architecture in which modifiers are learned and predicted jointly, using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific to OUD. We evaluate the effectiveness of our multi-task learning approach against previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of the clinical modifiers are shared.
RESULTS: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores.
CONCLUSIONS: We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers.
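The abstract does not say how weights are carried over when only some modifiers are shared. One plausible reading, sketched below in plain Python, is that the shared encoder and the classification heads for shared modifiers are reused, while heads for novel modifiers start from a fresh initialisation. Everything in the sketch is an assumption made for illustration: the helper names `new_head` and `transfer_heads`, the hidden size, the binary label counts, the choice of which modifiers the target set shares, and the placeholder `oud_specific_modifier` (the abstract does not name the OUD-specific modifiers). Random weights stand in for trained ones.

```python
import random

# Modifiers listed in the abstract, treated here as the source task set.
SOURCE_MODIFIERS = ["negation", "uncertainty", "conditionality", "severity", "subject"]

# Hypothetical target task set: some shared modifiers plus a placeholder
# for a novel OUD-specific modifier (not named in the abstract).
TARGET_MODIFIERS = ["negation", "uncertainty", "subject", "oud_specific_modifier"]

HIDDEN_SIZE = 8  # assumed width of the shared encoder output


def new_head(num_labels, rng):
    """Randomly initialise one classification head as a num_labels x HIDDEN_SIZE matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_SIZE)] for _ in range(num_labels)]


def transfer_heads(source_heads, target_label_counts, rng):
    """Reuse source heads for shared modifiers; create fresh heads for the rest."""
    target_heads, reused, fresh = {}, [], []
    for name, num_labels in target_label_counts.items():
        src = source_heads.get(name)
        if src is not None and len(src) == num_labels:
            # Shared modifier with a matching label set: copy the learned weights.
            target_heads[name] = [row[:] for row in src]
            reused.append(name)
        else:
            # Novel modifier (or mismatched label set): start from scratch.
            target_heads[name] = new_head(num_labels, rng)
            fresh.append(name)
    return target_heads, reused, fresh


rng = random.Random(0)
# Stand-in for heads trained on the source corpus (assumed binary labels).
source_heads = {name: new_head(2, rng) for name in SOURCE_MODIFIERS}
target_label_counts = {name: 2 for name in TARGET_MODIFIERS}

target_heads, reused, fresh = transfer_heads(source_heads, target_label_counts, rng)
print("reused heads:", reused)
print("fresh heads:", fresh)
```

In a real system the heads would sit on top of a transformer encoder whose weights are transferred as well, and all heads would then be fine-tuned jointly on the target data.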