Word2vec

  • Article type: Journal Article
    The process data in computer-based problem-solving evaluation is rich in valuable implicit information. However, its diverse and irregular structure poses challenges for effective feature extraction, leading to varying degrees of information loss in existing methods. Process-response behavior exhibits similarities to textual data in terms of the key units and contextual relationships. Despite the scarcity of relevant research, exploring text analysis methods for feature recognition in process data is significant. This study investigated the efficacy of Term Frequency-Inverse Document Frequency (TF-IDF) and Word to Vector (Word2vec) in extracting response behavior features and compared the predictive, analytical, and clustering effects of classical machine learning methods (supervised and unsupervised) on response behavior. An analysis of the PISA 2012 computer-based problem-solving dataset revealed that TF-IDF effectively extracted key response behaviors, whereas Word2vec captured effective features from sequenced response behaviors. In addition, in supervised machine learning using both methods, the random forest model based on TF-IDF performed the best, followed by the SVM model based on Word2vec. Word2vec-based models outperformed TF-IDF-based ones in the F1-score, accuracy, and recall (except for precision) across the logistic regression, k-nearest neighbor, and support vector machine algorithms. In unsupervised machine learning, the k-means algorithm effectively clustered different response behavior patterns extracted by these methods. The findings underscore the theoretical and methodological transferability of these text analysis methods in educational and psychological assessment contexts. This study offers valuable insights for research and practice in similar domains by yielding rich feature representations, supplementing fine-grained assessment evidence, fostering personalized learning, and introducing novel insights for educational assessment.
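The TF-IDF weighting described above treats each examinee's logged action sequence as a "document" of action tokens. A minimal sketch in plain Python, using hypothetical action names rather than the actual PISA 2012 log vocabulary:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for tokenized 'documents' (here, action sequences)."""
    n = len(docs)
    # document frequency: in how many sequences each action appears
    df = Counter(tok for doc in docs for tok in set(doc))
    return [{tok: (cnt / len(doc)) * math.log(n / df[tok])
             for tok, cnt in Counter(doc).items()}
            for doc in docs]

# Hypothetical response-behavior sequences, one per examinee
docs = [["click_A", "click_B", "submit"],
        ["click_A", "reset", "click_A", "submit"],
        ["click_B", "submit"]]
weights = tf_idf(docs)
# "submit" occurs in every sequence, so its weight is 0; rarer actions
# such as "reset" receive higher weight and stand out as key behaviors
```

A real pipeline would use a library implementation (e.g., scikit-learn's TfidfVectorizer) and feed the resulting vectors to the random forest and k-means models named in the abstract; this sketch only shows the weighting itself.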

  • Article type: Journal Article
    Osgood, Suci, and Tannenbaum were the first to attempt to identify the principal components of semantics using dimensional reduction of a high-dimensional model of semantics constructed from human judgments of word relatedness. Modern word-embedding models analyze patterns of words to construct higher-dimensional models of semantics that can be similarly subjected to dimensional reduction. Hollis and Westbury characterized the first eight principal components (PCs) of a word-embedding model by correlating them with several well-known lexical measures, such as logged word frequency, age of acquisition, valence, arousal, dominance, and concreteness. The results showed some clear differentiation of interpretation between the PCs. Here, we extend this work by analyzing a larger word-embedding matrix using semantic measures initially derived from subjective inspection of the PCs. We then use quantitative analysis to confirm the utility of these subjective measures for predicting PC values and cross-validate them on two word-embedding matrices developed on distinct corpora. Several semantic and word-class measures are strongly predictive of early PC values, including first-person and second-person verbs, personal relevance of abstract and concrete words, affect terms, and names of places and people. The predictors of the lowest-magnitude PCs generalized well to word-embedding matrices constructed from separate corpora, including matrices constructed using different word-embedding methods. The predictive categories we describe are consistent with Wittgenstein's argument that an autonomous level of social interaction grounds linguistic meaning.
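The dimensional-reduction step both studies rely on can be illustrated with a toy principal-component extraction. The sketch below finds the first PC of a small matrix by power iteration over a plain-Python covariance matrix; actual word-embedding work would apply SVD/PCA routines (e.g., from numpy or scikit-learn) to matrices with hundreds of dimensions:

```python
def first_pc(X, iters=200):
    """First principal component of the rows of X via power iteration
    on the covariance matrix (scale is irrelevant for the direction)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # covariance (up to a constant factor): C = Xc^T Xc
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) for b in range(d)]
         for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy "embedding" rows whose variance is dominated by the first axis
X = [[2, 0.1], [-2, -0.1], [1, 0.0], [-1, 0.0]]
v = first_pc(X)  # direction close to the first coordinate axis
```

Correlating component scores (projections of word vectors onto such directions) with lexical measures is then an ordinary regression/correlation exercise.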

  • Article type: Journal Article
    Mnemonic discrimination of highly similar memory traces is affected in healthy aging via changes in hippocampal pattern separation, i.e., the ability of the hippocampus to orthogonalize highly similar neural inputs. The decline of this process leads to a loss of episodic specificity. Because previous studies have almost exclusively tested mnemonic discrimination of visuospatial stimuli (e.g., objects or scenes), less is known about age-related effects on the episodic specificity of semantically similar traces. To address this gap, we designed a task to assess mnemonic discrimination of verbal stimuli as a function of semantic similarity based on word embeddings. Forty young adults (mean age = 21.7 years) and 40 older adults (mean age = 69.8 years) first incidentally encoded adjective-noun phrases, then performed a surprise recognition test involving exactly repeated and highly similar lure phrases. We found that increasing semantic similarity negatively affected mnemonic discrimination in both age groups, and that compared to young adults, older adults showed worse discrimination at medium levels of semantic similarity. These results indicate that episodic specificity of semantically similar memory traces is affected in aging via less efficient mnemonic operations and strengthen the notion that mnemonic discrimination is a modality-independent process supporting memory specificity across representational domains.
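Operationalizing "semantic similarity based on word embeddings" typically means embedding each phrase (for example, by averaging its word vectors) and comparing phrases by cosine similarity. A sketch with tiny invented 3-d vectors; the study would have used a full pretrained embedding model:

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def phrase_vec(phrase, emb):
    """Embed an adjective-noun phrase by averaging its word vectors."""
    vecs = [emb[w] for w in phrase.split()]
    d = len(vecs[0])
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(d)]

# Invented 3-d word vectors; a real study would load a pretrained model
emb = {"red": [1.0, 0.0, 0.0], "crimson": [0.9, 0.1, 0.0],
       "green": [0.0, 1.0, 0.0],
       "car": [0.0, 0.0, 1.0], "truck": [0.0, 0.1, 0.9]}
target = phrase_vec("red car", emb)
lure_hi = phrase_vec("crimson truck", emb)  # high-similarity lure
lure_lo = phrase_vec("green car", emb)      # lower-similarity lure
```

Lures can then be binned into similarity levels (low/medium/high) by their cosine similarity to the studied phrase, which is presumably how the "medium levels of semantic similarity" condition was defined.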

  • Article type: Journal Article
    Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model, PK-word2vec (where PK stands for prior knowledge), for small-scale messages. PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset was composed of 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words. In over 90% of the tasks, both reviewers indicated that PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of the tasks' choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.
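Each evaluation task above compares the five nearest neighbors a model produces for a probe word. Nearest-neighbor retrieval by cosine similarity can be sketched as follows (the toy vocabulary and 2-d vectors are invented, not drawn from the portal-message corpus):

```python
def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def top_k_similar(word, emb, k=5):
    """k nearest neighbors of `word` by cosine similarity over the vocabulary."""
    target = emb[word]
    scored = sorted(((cos_sim(target, v), w) for w, v in emb.items() if w != word),
                    reverse=True)
    return [w for _, w in scored[:k]]

# Invented toy vocabulary and 2-d vectors
emb = {"pain": [1.0, 0.0], "ache": [0.9, 0.1], "soreness": [0.8, 0.2],
       "refill": [0.5, 0.5], "parking": [0.1, 0.9], "invoice": [0.0, 1.0]}
neighbors = top_k_similar("pain", emb, k=3)
```

In the study, reviewers judged which model's top-5 list was more relevant to the probe word; a library such as gensim exposes the same operation as `most_similar` on a trained model.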

  • Article type: Journal Article
    The availability of extensive MHC-peptide binding data has boosted machine learning-based approaches for predicting binding affinity and identifying binding motifs. These computational tools leverage the wealth of binding data to extract essential features and generate a multitude of potential peptides, thereby significantly reducing the cost and time required for experimental procedures. MAM is one such tool for predicting the MHC-I-peptide binding affinity, extracting binding motifs, and generating new peptides with high affinity. This manuscript provides step-by-step guidance on installing, configuring, and executing MAM while also discussing the best practices when using this tool.

  • Article type: Journal Article
    BACKGROUND: The utilization of artificial intelligence (AI) technologies in the biomedical field has attracted increasing attention in recent decades. Studying how past AI technologies have found their way into medicine over time can help to predict which current (and future) AI technologies have the potential to be utilized in medicine in the coming years, thereby providing a helpful reference for future research directions.
    OBJECTIVE: The aim of this study was to predict the future trend of AI technologies used in different biomedical domains based on past trends of related technologies and biomedical domains.
    METHODS: We collected a large corpus of articles from the PubMed database pertaining to the intersection of AI and biomedicine. Initially, we attempted to use regression on the extracted keywords alone; however, we found that this approach did not provide sufficient information. Therefore, we propose a method called "background-enhanced prediction" to expand the knowledge utilized by the regression algorithm by incorporating both the keywords and their surrounding context. This method of data construction resulted in improved performance across the six regression models evaluated. Our findings were confirmed through experiments on recurrent prediction and forecasting.
    RESULTS: In our analysis using background information for prediction, we found that a window size of 3 yielded the best results, outperforming the use of keywords alone. Furthermore, utilizing data only prior to 2017, our regression projections for the period 2017-2021 exhibited a high coefficient of determination (R²), reaching up to 0.78, demonstrating the effectiveness of our method in predicting long-term trends. Based on the prediction, studies related to proteins and tumors will be pushed out of the top 20 and replaced by early diagnostics, tomography, and other detection technologies; these are areas well suited to incorporating AI technology. Deep learning, machine learning, and neural networks continue to be the dominant AI technologies in biomedical applications. Generative adversarial networks represent an emerging technology with a strong growth trend.
    CONCLUSIONS: In this study, we explored AI trends in the biomedical field and developed a predictive model to forecast future trends. Our findings were confirmed through experiments on current trends.
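The baseline the authors started from, regression on extracted keyword trends alone, can be sketched as an ordinary least-squares fit to yearly keyword counts followed by extrapolation (the counts below are made up; the background-enhanced method in the paper is considerably richer):

```python
def fit_trend(years, counts):
    """Ordinary least squares for count = a + b * year."""
    n = len(years)
    mx = sum(years) / n
    my = sum(counts) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(years, counts))
         / sum((x - mx) ** 2 for x in years))
    a = my - b * mx
    return a, b

def forecast(years, counts, year):
    """Extrapolate the fitted line to a future year."""
    a, b = fit_trend(years, counts)
    return a + b * year

# Made-up yearly article counts for one keyword
years = [2013, 2014, 2015, 2016]
counts = [10, 20, 30, 40]
pred_2017 = forecast(years, counts, 2017)
```

The paper's contribution is to enrich the regression inputs with each keyword's surrounding context (a window of neighboring terms), which is what the window-size-3 result refers to.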

  • Article type: Journal Article
    Sentiment analysis aims to classify text based on the opinion or mentality expressed in a situation, which can be positive, negative, or neutral. A great many opinions are posted on various social media sites, and these must be gathered and analyzed to assess the general public's opinion. Finding and monitoring comments, as well as manually extracting the information contained in them, is a difficult task due to the vast diversity of ideas on YouTube. Identifying public opinion on war topics is crucial for offering insights to opposing sides based on popular opinion and emotions about an ongoing war. To address this gap, we built a model for YouTube comment sentiment analysis of the Hamas-Israel war to determine public opinion. In this study, we developed a deep learning-based approach for sentiment analysis. We collected 24,360 comments about the Hamas-Israel war from popular YouTube news channels, including BBC, WION, and Al Jazeera, using the YouTube API and a Google spreadsheet, and had linguistic experts label them into three classes: positive, negative, and neutral. The textual comments were then preprocessed using natural language processing (NLP) techniques, and features were extracted with Word2vec, FastText, and GloVe. We also applied the SMOTE data-balancing technique and tried different data splits, of which the 80/20 train-test split ratio gave the highest accuracy. For classification model building, the commonly used algorithms LSTM, Bi-LSTM, GRU, and a hybrid of CNN and Bi-LSTM were applied and their performance compared. The hybrid of CNN and Bi-LSTM with Word2vec achieved the highest performance, with 95.73% accuracy for comment classification.
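The Word2vec feature-extraction step turns each variable-length comment into a fixed-length vector, commonly by averaging its word vectors; those vectors then feed a classifier. The sketch below substitutes a trivial nearest-centroid classifier for the CNN/Bi-LSTM hybrid in the study and uses invented 2-d embeddings:

```python
from collections import defaultdict

def comment_vector(tokens, emb):
    """Average word vectors so comments become fixed-length features."""
    vecs = [emb[t] for t in tokens if t in emb]
    d = len(next(iter(emb.values())))
    if not vecs:
        return [0.0] * d
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(d)]

def nearest_centroid(train, labels, x):
    """Tiny stand-in for the deep classifier used in the study."""
    groups = defaultdict(list)
    for vec, lab in zip(train, labels):
        groups[lab].append(vec)
    centroids = {lab: [sum(v[j] for v in vs) / len(vs) for j in range(len(vs[0]))]
                 for lab, vs in groups.items()}
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

# Invented 2-d embeddings and a toy labeled set
emb = {"good": [1.0, 0.0], "great": [0.9, 0.1],
       "bad": [0.0, 1.0], "awful": [0.1, 0.9]}
train = [comment_vector([w], emb) for w in ["good", "great", "bad", "awful"]]
labels = ["positive", "positive", "negative", "negative"]
pred = nearest_centroid(train, labels, comment_vector(["good", "great"], emb))
```

The real pipeline additionally tokenizes and cleans comments, rebalances classes with SMOTE, and trains the neural models on the 80/20 split reported in the abstract.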

  • Article type: Journal Article
    Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. 
    In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes. Scientific contributions: The methodology presented in this work not only enables comparatively accurate prediction of compound-protein interactions but also, for the first time, simultaneously takes into account sample imbalance, which is very common in real-world data, and computational efficiency, accelerating the target identification and drug discovery process.
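The simplified-GCN step described above aggregates K-order neighbor information without per-layer nonlinearities, i.e., it repeatedly averages features over each node's neighborhood (with self-loops added). A plain-Python sketch on a three-node chain graph; the actual model operates on a large compound-protein graph with SPVec features:

```python
def sgcn_features(adj, X, K=2):
    """Simplified GCN propagation: K rounds of neighbor averaging
    (S^K X) with self-loops and no per-layer nonlinearity."""
    n = len(adj)
    d = len(X[0])
    # add self-loops and row-normalize the adjacency matrix
    S = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    S = [[v / sum(row) for v in row] for row in S]
    H = [row[:] for row in X]
    for _ in range(K):
        H = [[sum(S[i][k] * H[k][j] for k in range(n)) for j in range(d)]
             for i in range(n)]
    return H

adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]            # chain graph 0-1-2
X = [[1.0], [0.0], [0.0]]    # a one-hot feature on node 0
H1 = sgcn_features(adj, X, K=1)  # node 2 not yet reached after one hop
H2 = sgcn_features(adj, X, K=2)  # after two hops, node 2 sees node 0's feature
```

Collapsing the propagation into a single precomputed linear operator is what lets SGCN-style models avoid neighbor explosion and train quickly, as the abstract notes.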

  • Article type: Journal Article
    The complexity and the criticality of automotive electronic implanted systems are steadily advancing and that is especially the case for automotive software development. ISO 26262 describes requirements for the development process to confirm the safety of such complex systems. Among these requirements, fault injection is a reliable technique to assess the effectiveness of safety mechanisms and verify the correct implementation of the safety requirements. However, the method of injecting the fault in the system under test in many cases is still manual and depends on an expert, requiring a high level of knowledge of the system. In complex systems, it consumes time, is difficult to execute, and takes effort, because the testers limit the fault injection experiments and inject the minimum number of possible test cases. Fault injection enables testers to identify and address potential issues with a system under test before they become actual problems. In the automotive industry, failures can have serious hazards. In these systems, it is essential to ensure that the system can operate safely even in the presence of faults. We propose an approach using natural language processing (NLP) technologies to automatically derive the fault test cases from the functional safety requirements (FSRs) and execute them automatically by hardware-in-the-loop (HIL) in real time according to the black-box concept and the ISO 26262 standard. The approach demonstrates effectiveness in automatically identifying fault injection locations and conditions, simplifying the testing process, and providing a scalable solution for various safety-critical systems.
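Deriving a fault test case from a functional safety requirement amounts to mapping sentence elements (signal, fault type, timing constraint) to an injection specification that the HIL rig can execute. A deliberately simple regex-based sketch; the sentence template and field names are hypothetical, and the paper's NLP pipeline is certainly richer than pattern matching:

```python
import re

def derive_fault_case(requirement):
    """Rough sketch: pull a signal name, fault type, and reaction time
    out of a templated requirement sentence. The pattern and the output
    field names are invented for illustration, not taken from ISO 26262."""
    m = re.search(
        r"if signal (\w+) is (stuck|lost|corrupted), .* within (\d+) ms",
        requirement, re.IGNORECASE)
    if not m:
        return None  # sentence does not match the expected template
    return {"inject_on": m.group(1),
            "fault_type": m.group(2).lower(),
            "max_reaction_ms": int(m.group(3))}

req = ("The system shall enter safe state if signal BrakeTorque is lost, "
       "with a warning raised within 50 ms.")
case = derive_fault_case(req)
```

Each derived specification would then drive an automated HIL run that injects the named fault and checks that the safety reaction occurs within the extracted deadline.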