Using word evolution to predict drug repurposing.

Abstract:

BACKGROUND: Traditional literature-based discovery connects knowledge pairs extracted from separate publications via a common midpoint to derive previously unseen knowledge pairs. To avoid the overgeneration often associated with this approach, we explore an alternative method based on word evolution. Word evolution examines the changing contexts of a word to identify changes in its meaning or associations. We investigate the possibility of using changing word contexts to detect drugs suitable for repurposing.
RESULTS: Word embeddings, which represent a word's context, are constructed from chronologically ordered publications in MEDLINE at bi-monthly intervals, yielding a time series of embeddings for each word. Focusing on clinical drugs only, any drug repurposed in the final time segment of the time series is annotated as a positive example. The decision as to whether a drug has been repurposed is based either on the Unified Medical Language System (UMLS) or on semantic triples extracted from MEDLINE using SemRep.
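The idea of a per-word time series of context representations can be illustrated with a minimal sketch. It is not the authors' pipeline: plain co-occurrence counts stand in for learned word embeddings, and the function names (`context_time_series`, `cosine`), the window size, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def context_time_series(dated_docs, target, window=2):
    """Return one context-count vector per time slice for `target`.

    dated_docs: iterable of (slice_label, text) pairs with sortable labels.
    Co-occurrence counts are a stand-in for trained word embeddings.
    """
    by_slice = defaultdict(Counter)
    for label, text in dated_docs:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    by_slice[label][tokens[j]] += 1
    # Chronological order: one (label, vector) pair per slice mentioning target.
    return [(label, by_slice[label]) for label in sorted(by_slice)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy corpus, one document per two-month slice.
docs = [
    ("2000-01", "aspirin relieves pain and fever"),
    ("2000-03", "aspirin relieves pain in adults"),
    ("2000-05", "aspirin prevents stroke and infarction"),
]
series = context_time_series(docs, "aspirin")
# Similarity between consecutive slices; a drop suggests a context shift.
drift = [cosine(series[i][1], series[i + 1][1]) for i in range(len(series) - 1)]
```

In this toy corpus the contexts of "aspirin" are identical in the first two slices and disjoint in the third, so the similarity between consecutive slices falls sharply at the last step. A change of that kind in the final time segment is the sort of signal a classifier over the time series could learn from.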
CONCLUSIONS: The annotated data allows deep learning classification to be performed with 5-fold cross-validation, and multiple architectures to be explored. Performance of 65% using UMLS labels and 81% using SemRep labels is attained, indicating the technique's suitability for detecting candidate drugs for repurposing. The investigation also shows that the choice of architecture is linked to the quantity of training data available, and therefore that a separate model should be trained for each annotation approach.
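The 5-fold cross-validation protocol can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid classifier replaces the deep learning models, accuracy is used as the score (the abstract does not name its metric), and the helper names and toy vectors are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validate(X, y, k=5, seed=0):
    """Mean accuracy of a nearest-centroid classifier over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # One centroid per class, computed from the training folds only.
        cents = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in {y[i] for i in train}
        }
        correct = sum(
            min(cents, key=lambda c: sq_dist(X[i], cents[c])) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Toy data: 10 "not repurposed" vectors near (0, 0), 10 "repurposed" near (1, 1).
X = [[0.01 * i, 0.0] for i in range(10)] + [[1.0 + 0.01 * i, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
accuracy = cross_validate(X, y)
```

Each example is held out exactly once, and the reported figure is the mean over the five held-out folds. With real data, a stratified split may be preferable when positive examples are scarce.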
