NLP

  • Article type: Journal Article
    BACKGROUND: Suicide is a leading cause of death worldwide. Journalistic reporting guidelines were created to curb the impact of unsafe reporting; however, how suicide is framed in news reports may differ by important characteristics such as the circumstances and the decedent's gender.
    OBJECTIVE: This study aimed to examine the degree to which news media reports of suicides are framed using stigmatized or glorified language and differences in such framing by gender and circumstance of suicide.
    METHODS: We analyzed 200 news articles regarding suicides and applied the validated Stigma of Suicide Scale to identify stigmatized and glorified language. We assessed linguistic similarity with 2 widely used metrics, cosine similarity and mutual information scores, using a machine learning-based large language model.
    RESULTS: News reports of male suicides were framed more similarly to stigmatizing (P<.001) and glorifying (P=.005) language than reports of female suicides. Considering the circumstances of suicide, mutual information scores indicated that differences in the use of stigmatizing or glorifying language by gender were most pronounced for articles attributing legal (0.155), relationship (0.268), or mental health problems (0.251) as the cause.
    CONCLUSIONS: Linguistic differences, by gender, in stigmatizing or glorifying language when reporting suicide may exacerbate suicide disparities.
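    The two similarity metrics named in the Methods can be illustrated with a minimal Python sketch. It assumes that text embeddings have already been produced by some language model (the abstract does not say which, and that step is not reproduced here), and it reads the mutual information score as mutual information between two paired sequences of discrete labels, such as decedent gender and a hypothetical framing label. This is an illustrative interpretation, not the authors' pipeline.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mutual_information(xs, ys):
    """Mutual information, in nats, between two paired sequences of discrete labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), rewritten with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy usage with hypothetical labels (not study data):
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(mutual_information(["M", "M", "F", "F"], ["stigma", "stigma", "neutral", "neutral"]))
```

    In the toy call, gender fully determines the hypothetical label, so the mutual information is ln 2 (about 0.693), the largest value possible for two balanced binary variables; statistically independent labels would give 0.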

  • Article type: Journal Article
    Large language model (LLM)-powered services are gaining popularity in various applications due to their exceptional performance in many tasks, such as sentiment analysis and question answering. Recently, research has been exploring their potential use in digital health contexts, particularly in the mental health domain. However, implementing LLM-enhanced conversational artificial intelligence (CAI) presents significant ethical, technical, and clinical challenges. In this viewpoint paper, we discuss 2 challenges that affect the use of LLM-enhanced CAI for individuals with mental health issues, focusing on the use case of patients with depression: the tendency to humanize LLM-enhanced CAI and their lack of contextualized robustness. Our approach is interdisciplinary, relying on considerations from philosophy, psychology, and computer science. We argue that the humanization of LLM-enhanced CAI hinges on reflecting on what it means to simulate "human-like" features with LLMs and what role these systems should play in interactions with humans. Further, contextualizing the robustness of LLMs requires considering the specificities of language production in individuals with depression, as well as its evolution over time. Finally, we provide a series of recommendations to foster the responsible design and deployment of LLM-enhanced CAI for the therapeutic support of individuals with depression.

  • Article type: Editorial
    The generative artificial intelligence (AI) model ChatGPT holds transformative prospects in medicine. The development of such models has signaled the beginning of a new era where complex biological data can be made more accessible and interpretable. ChatGPT is a natural language processing tool that can process, interpret, and summarize vast data sets. It can serve as a digital assistant for physicians and researchers, aiding in integrating medical imaging data with other multiomics data and facilitating the understanding of complex biological systems. The physician's and AI's viewpoints emphasize the value of such AI models in medicine, providing tangible examples of how this could enhance patient care. The editorial also discusses the rise of generative AI, highlighting its substantial impact in democratizing AI applications for modern medicine. While AI may not supersede health care professionals, practitioners incorporating AI into their practices could potentially have a competitive edge.

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.
    OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, to illustrate their potential role in health care decision-making, and to compare ratings by senior physicians and residents and across question types.
    METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.
    RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete than residents did (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and rated GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions.
    CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.

  • Article type: Journal Article
    Psoriasis is an immune-mediated skin disease affecting approximately 3% of the global population. Proper management of this condition requires assessment of body surface area (BSA) and of nail and joint involvement. Recently, the integration of natural language processing (NLP) with electronic medical records (EMRs) has shown promise in advancing disease classification and research. This study evaluates the performance of ChatGPT-4, a commercial AI platform, in analyzing unstructured EMR data of patients with psoriasis, particularly in identifying affected body areas.

  • Article type: Journal Article
    BACKGROUND: Vaccines serve as a crucial public health tool, although vaccine hesitancy continues to pose a significant threat to full vaccine uptake and, consequently, community health. Understanding and tracking vaccine hesitancy is essential for effective public health interventions; however, traditional survey methods present various limitations.
    OBJECTIVE: This study aimed to create a real-time, natural language processing (NLP)-based tool to assess vaccine sentiment and hesitancy across 3 prominent social media platforms.
    METHODS: We mined and curated discussions in English from Twitter (subsequently rebranded as X), Reddit, and YouTube social media platforms posted between January 1, 2011, and October 31, 2021, concerning human papillomavirus; measles, mumps, and rubella; and unspecified vaccines. We tested multiple NLP algorithms to classify vaccine sentiment into positive, neutral, or negative and to classify vaccine hesitancy using the World Health Organization's (WHO) 3Cs (confidence, complacency, and convenience) hesitancy model, conceptualizing an online dashboard to illustrate and contextualize trends.
    RESULTS: We compiled over 86 million discussions. Our top-performing NLP models achieved accuracies ranging from 0.51 to 0.78 for sentiment classification and from 0.69 to 0.91 for hesitancy classification. Exploratory analysis on our platform highlighted variations in online activity about vaccine sentiment and hesitancy, suggesting distinct patterns for different vaccines.
    CONCLUSIONS: Our innovative system performs real-time analysis of sentiment and hesitancy on 3 vaccine topics across major social networks, providing crucial trend insights to assist campaigns aimed at enhancing vaccine uptake and public health.

  • Article type: Journal Article
    The mechanisms that maintain ER-to-Golgi vesicle formation and transport are complex. As one such adapter, Ninein-like protein (Nlp) participated in the assembly and transport of a subset of ER-to-Golgi vesicles that contained specific proteins, such as β-catenin and STING. Nlp acted as a platform to sustain the specificity and continuity of cargoes during COPII- and COPI-coated vesicle transition and transport by binding directly to both SEC31A and Rab1B. Thus, we proposed an integrated transport model in which particular adapters participate in specific cargo selection or transport by cooperating with different membrane-associated proteins, ensuring the continuity of cargo trafficking. Nlp deficiency led to vesicle budding failure and accumulation of unprocessed proteins in the ER, which further caused ER stress and Golgi fragmentation, and the PERK-eIF2α pathway of the UPR was activated to reduce global protein synthesis. In contrast, upregulation of Nlp resulted in Golgi fragmentation, which enhanced cargo transport efficiency between the ER and Golgi. Moreover, Nlp-deficient mice were prone to spontaneous B cell lymphoma, since lymphocyte development and function depend significantly on secretory proteins trafficked through ER-to-Golgi vesicles, including IL-13, IL-17, and IL-21. Thus, perturbation of Nlp altered ER-to-Golgi communication and cellular homeostasis and might contribute to the pathogenesis of B cell lymphoma.

  • Article type: Journal Article
    Evidence-informed decision making is based on the premise that the entirety of information on a topic is collected and analyzed. Systematic reviews allow data from different studies to be rigorously assessed according to PICO principles (population, intervention, control, outcomes). However, conducting a systematic review is generally a slow process that is a significant drain on resources. The fundamental problem is that the current approach to creating a systematic review cannot scale to meet the challenges resulting from the massive body of unstructured evidence. For this reason, the Public Health Agency of Canada has been examining the automation of different stages of evidence synthesis to increase efficiency. In this article, we present an overview of an initial version of a novel machine learning-based system that is powered by recent advances in natural language processing (NLP), such as BioBERT, with further optimizations completed using a new immunization-specific document database. The resulting optimized NLP model at the core of this system is able to identify and extract PICO-related fields from publications on immunization with an average accuracy of 88% across five classes of text. Functionality is provided through a straightforward web interface.

  • Article type: Journal Article
    BACKGROUND: Surveillance of hospital-acquired pressure injuries (HAPIs) is often suboptimal when it relies on administrative health data, as International Classification of Diseases (ICD) codes are known to suffer from long delays and undercoding. We applied natural language processing (NLP) to free-text notes from electronic medical records (EMRs), particularly inpatient nursing notes, to identify HAPIs more accurately and in a more timely manner.
    OBJECTIVE: This study aimed to show that EMR-based phenotyping algorithms are better suited to detecting HAPIs than ICD-10-CA algorithms alone, and that NLP applied to nursing notes captures these clinical events with higher accuracy.
    METHODS: Patients with HAPIs were identified from head-to-toe skin assessments in a local tertiary acute care hospital during a clinical trial conducted from 2015 to 2018 in Calgary, Alberta, Canada. Clinical notes documented during the trial were extracted from the EMR database after linkage with the Discharge Abstract Database. During model development, combinations of several types of clinical notes were chosen by sequential forward selection. Text classification algorithms for HAPI detection were developed using random forest (RF), extreme gradient boosting (XGBoost), and deep learning models. The classification threshold was tuned so that each model achieved specificity similar to that of an ICD-based phenotyping study. Each model's performance was assessed, and the models were compared on sensitivity, positive predictive value, negative predictive value, and F1-score.
    RESULTS: Data from 280 eligible patients were used in this study, 97 of whom had HAPIs during the trial. RF was the best-performing model, with a sensitivity of 0.464 (95% CI 0.365-0.563), specificity of 0.984 (95% CI 0.965-1.000), and F1-score of 0.612 (95% CI 0.473-0.751). Compared with the previously reported performance of ICD-based algorithms, the machine learning (ML) model reached higher sensitivity without sacrificing much specificity.
    CONCLUSIONS: The EMR-based NLP phenotyping algorithms demonstrated improved performance in HAPI case detection over ICD-10-CA codes alone. Daily generated nursing notes in EMRs are a valuable data resource for ML models to accurately detect adverse events. The study contributes to enhancing automated health care quality and safety surveillance.
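    The threshold-tuning step in the Methods (choosing a cutoff so that specificity matches an ICD-based benchmark, then reporting sensitivity, PPV, NPV, and F1-score) can be sketched as follows. It assumes the classifier, for example the random forest, outputs one probability-like score per patient; the function names and toy data are hypothetical, and the target specificity is left as a parameter because the abstract does not state the benchmark value. This is not the authors' code.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        # 2TP / (2TP + FP + FN) equals the harmonic mean of PPV and sensitivity
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def threshold_for_specificity(labels, scores, target_specificity):
    """Lowest observed score whose cutoff reaches the target specificity.

    Specificity never decreases as the threshold rises, while sensitivity never
    increases, so the lowest qualifying cutoff keeps as much sensitivity as possible.
    """
    for t in sorted(set(scores)):
        _, fp, tn, _ = confusion_counts(labels, scores, t)
        if tn + fp and tn / (tn + fp) >= target_specificity:
            return t
    return float("inf")  # no observed score qualifies: predict no positives


# Toy usage with hypothetical data (1 = HAPI, 0 = no HAPI):
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.05]
t = threshold_for_specificity(labels, scores, 0.75)
print(t, detection_metrics(*confusion_counts(labels, scores, t)))
```

    Scanning candidate cutoffs from low to high and stopping at the first that meets the target is the simplest choice; it is quadratic in the number of patients, which is acceptable for a cohort of a few hundred.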

  • Article type: Journal Article
    BACKGROUND: Natural language processing (NLP) is an emerging technology in health care that leverages the large amount of free-text data in electronic health records to improve patient care, support clinical decisions, and facilitate clinical and translational science research. Recently, deep learning has achieved state-of-the-art performance in many clinical NLP tasks. However, training deep learning models often requires large, annotated data sets, which are normally not publicly available and can be time-consuming to build in clinical domains. Working with smaller annotated data sets is typical in clinical NLP; therefore, ensuring that deep learning models perform well with limited annotations is crucial for real-world clinical NLP applications. A widely adopted approach is fine-tuning existing pretrained language models, but these attempts fall short when the training data set contains only a few annotated samples. Few-shot learning (FSL) has recently been investigated to tackle this problem. Siamese neural networks (SNNs) have been widely used as an FSL approach in computer vision but have not been well studied in NLP. Furthermore, the literature on their applications in clinical domains is scarce.
    OBJECTIVE: The aim of our study is to propose and evaluate SNN-based approaches for few-shot clinical NLP tasks.
    METHODS: We propose 2 SNN-based FSL approaches, including pretrained SNN and SNN with second-order embeddings. We evaluate the proposed approaches on the clinical sentence classification task. We experiment with 3 few-shot settings, including 4-shot, 8-shot, and 16-shot learning. The clinical NLP task is benchmarked using the following 4 pretrained language models: bidirectional encoder representations from transformers (BERT), BERT for biomedical text mining (BioBERT), BioBERT trained on clinical notes (BioClinicalBERT), and generative pretrained transformer 2 (GPT-2). We also present a performance comparison between SNN-based approaches and the prompt-based GPT-2 approach.
    RESULTS: In 4-shot sentence classification tasks, GPT-2 had the highest precision (0.63), but its recall (0.38) and F score (0.42) were lower than those of BioBERT-based pretrained SNN (0.45 and 0.46, respectively). In both 8-shot and 16-shot settings, SNN-based approaches outperformed GPT-2 in all 3 metrics of precision, recall, and F score.
    CONCLUSIONS: The experimental results verified the effectiveness of the proposed SNN approaches for few-shot clinical NLP tasks.
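    To make the Siamese idea concrete, here is a minimal inference-only sketch, not the authors' method: one shared encoder embeds both the query sentence and each labeled support sentence, and the query receives the label whose support examples are most similar on average. The bag-of-words `toy_encoder` is a hypothetical stand-in for a pretrained model such as BioBERT, and all function names and example sentences are illustrative assumptions.

```python
import math
from collections import Counter


def toy_encoder(text):
    """Hypothetical stand-in for a pretrained encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def siamese_score(x1, x2, encoder=toy_encoder):
    # Both inputs pass through the SAME encoder (shared weights),
    # which is the defining trait of a Siamese network.
    return cosine(encoder(x1), encoder(x2))


def classify_few_shot(query, support, encoder=toy_encoder):
    """support: list of (text, label) pairs, i.e. the k labeled shots per class.

    Returns the label whose support examples have the highest mean similarity.
    """
    totals, counts = {}, {}
    for text, label in support:
        totals[label] = totals.get(label, 0.0) + siamese_score(query, text, encoder)
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda lab: totals[lab] / counts[lab])


# Toy 2-shot usage with hypothetical sentences and labels:
support = [
    ("patient denies chest pain", "absent"),
    ("no fever reported", "absent"),
    ("patient reports chest pain", "present"),
    ("fever and cough present", "present"),
]
print(classify_few_shot("patient reports severe chest pain", support))
```

    A real SNN would first train the shared encoder on labeled sentence pairs so that same-class pairs score high; that training step, and the pretrained-SNN and second-order-embedding variants described above, are not shown.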