RESULTS: Word embeddings, which represent a word's context, are constructed from chronologically ordered publications in MEDLINE at bi-monthly intervals, yielding a time series of embeddings for each word. Focusing on clinical drugs only, any drug repurposed in the final time segment of the time series is annotated as a positive example. The decision regarding a drug's repurposing is based either on the Unified Medical Language System (UMLS) or on semantic triples extracted from MEDLINE using SemRep.
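The data preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `build_series` and `label_drugs`, the toy two-dimensional vectors, and the zero-padding of words missing from a segment are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set

Vector = List[float]


def build_series(segments: List[Dict[str, Vector]], dim: int) -> Dict[str, List[Vector]]:
    """Stack per-segment embeddings into one chronological series per word.

    `segments` is ordered oldest to newest; each maps a word to its embedding
    for that time segment. Assumption: a word absent from a segment is
    represented by a zero vector.
    """
    vocab: Set[str] = set()
    for seg in segments:
        vocab.update(seg)
    return {
        w: [seg[w] if w in seg else [0.0] * dim for seg in segments]
        for w in vocab
    }


def label_drugs(series: Dict[str, List[Vector]], drugs: Set[str],
                repurposed_in_final: Set[str]) -> Dict[str, int]:
    """Label each clinical drug 1 if it was repurposed in the final segment, else 0."""
    return {d: int(d in repurposed_in_final) for d in drugs if d in series}


# Toy input: three time segments of 2-dimensional embeddings (made-up values).
segments = [
    {"aspirin": [0.1, 0.2], "patient": [0.5, 0.5]},
    {"aspirin": [0.2, 0.2], "metformin": [0.3, 0.1], "patient": [0.5, 0.4]},
    {"aspirin": [0.4, 0.1], "metformin": [0.3, 0.3], "patient": [0.6, 0.4]},
]
series = build_series(segments, dim=2)
# Non-drug words such as "patient" are excluded by restricting to `drugs`.
labels = label_drugs(series, drugs={"aspirin", "metformin"},
                     repurposed_in_final={"metformin"})
```

In practice the set of repurposed drugs would come from UMLS or from SemRep triples; here it is supplied by hand.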
CONCLUSIONS: The annotated data allows deep learning classification to be performed with 5-fold cross-validation and multiple architectures to be explored. A performance of 65% using UMLS labels and 81% using SemRep labels is attained, indicating the technique's suitability for detecting candidate drugs for repurposing. The investigation also shows that the choice of architecture is linked to the quantity of training data available, and therefore that a different model should be trained for each annotation approach.
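The 5-fold cross-validation protocol can be sketched as below. This is a hedged illustration only: the deep learning architectures are not reproduced, a majority-class baseline stands in for the classifier, and accuracy is assumed as the metric because the text does not name one. The helper names `k_fold_indices`, `cross_validate`, `fit_majority` and `predict_majority` are hypothetical.

```python
import random
from collections import Counter
from typing import Any, Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index splits.

    Assumes n >= k so that no test fold is empty.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    splits = []
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_validate(X: Sequence[Any], y: Sequence[int],
                   fit: Callable, predict: Callable,
                   k: int = 5, seed: int = 0) -> float:
    """Return mean accuracy over k folds (accuracy is an assumed metric)."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Placeholder classifier: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data: 10 "drugs", each a (made-up) embedding time series, with binary labels.
X = [[[float(i), 0.0]] * 3 for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
mean_accuracy = cross_validate(X, y, fit_majority, predict_majority, k=5)
```

Any model exposing a fit step and a predict step could be swapped in for the baseline. Running the same loop separately on the UMLS labels and on the SemRep labels matches the observation that each annotation approach needs its own model.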