in-context learning

  • Article type: Journal Article
    BACKGROUND: Electronic health record textual sources such as medication signatures (sigs) contain valuable information that is not always available in structured form. Commonly processed through manual annotation, this repetitive and time-consuming task could be fully automated using large language models (LLMs). While most sigs include simple instructions, some include complex patterns.
    OBJECTIVE: We aimed to compare the performance of GPT-3.5 and GPT-4 with smaller fine-tuned models (ClinicalBERT, BlueBERT) in extracting the average daily dose of 2 immunomodulating medications with frequent complex sigs: hydroxychloroquine and prednisone.
    METHODS: Using manually annotated sigs as the gold standard, we compared the performance of these models on 702 hydroxychloroquine and 22 104 prednisone prescriptions.
    RESULTS: GPT-4 vastly outperformed all other models for this task at any level of in-context learning. With 100 in-context examples, the model correctly annotated 94% of hydroxychloroquine and 95% of prednisone sigs to within 1 significant digit. Error analysis conducted by 2 additional manual annotators on annotator-model disagreements suggests that the vast majority of disagreements were model errors. Many model errors related to ambiguous sigs on which there was also frequent annotator disagreement.
    DISCUSSION: Paired with minimal manual annotation, GPT-4 achieved excellent performance for language regression of complex medication sigs and vastly outperformed GPT-3.5, ClinicalBERT, and BlueBERT. However, the number of in-context examples needed to reach maximum performance was similar to that of GPT-3.5.
    CONCLUSIONS: LLMs show great potential to rapidly extract structured data from sigs in a no-code fashion for clinical and research applications.
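    The few-shot setup described here lends itself to a simple prompt-construction loop. Below is a minimal sketch of that idea, assuming the OpenAI chat API; the prompt wording and the example sigs are hypothetical stand-ins, not the study's actual annotation protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical in-context examples: (sig text, average daily dose in mg).
EXAMPLES = [
    ("take 1 tablet (200 mg) by mouth twice daily", "400"),
    ("take 2 tablets (5 mg) every other day", "5"),
]

def build_prompt(sig: str) -> str:
    """Assemble a few-shot prompt ending with the sig to annotate."""
    parts = ["Extract the average daily dose in mg from each medication sig."]
    for text, dose in EXAMPLES:
        parts.append(f"Sig: {text}\nDose: {dose}")
    parts.append(f"Sig: {sig}\nDose:")
    return "\n\n".join(parts)

reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": build_prompt("take 1 tablet daily, 2 tablets on weekends")}],
)
print(reply.choices[0].message.content)  # model's dose estimate, e.g. "257"
```

    Scaling the example list up to 100 annotated sigs, as in the study, only changes the contents of EXAMPLES; the surrounding code stays the same, which is what makes this a near no-code workflow.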

  • Article type: Journal Article
    In recent years, with the rise of various machine learning methods, ultraviolet and near-infrared (UV-NIR) spectral analysis has achieved impressive results in the determination of intricate systems. However, UV-NIR spectral analysis based on traditional machine learning requires independent training, with tedious parameter tuning, for different samples or tasks. As a result, training a high-quality model is often complicated and time-consuming. Large language models (LLMs) are among the cutting-edge achievements of deep learning, with parameter counts on the order of billions. An LLM can extract abstract information from input and use it effectively. Even without any additional training, using only simple natural language prompts, an LLM can accomplish tasks it has never seen before in completely new domains. We aim to exploit this capability in spectral analysis to reduce time consumption and operational difficulty. In this study, we used UV-NIR spectral analysis to predict the concentration of chemical oxygen demand (COD) in three different water samples, including a complex wastewater. After extracting the characteristic bands in the spectrum, we input them into the LLM for concentration prediction. We compared the COD prediction results of different models on the water samples and discussed the effects of different experimental settings on the LLM. The results show that even with brief prompts, the LLM's predictions on wastewater achieved the best performance, with R² and RMSE of 0.931 and 10.966, exceeding the best results of traditional models, whose R² and RMSE were 0.920 and 11.854. This result indicates that the LLM, with simpler operation and less time required, can approach or even surpass traditional machine learning models in UV-NIR spectral analysis. In conclusion, our study proposes a new method for UV-NIR spectral analysis based on LLMs and preliminarily demonstrates the potential of LLMs for this application.
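    To make the pipeline concrete, here is an illustrative sketch only: formatting extracted characteristic bands as a natural-language regression prompt, then scoring predictions with the same metrics the paper reports (R², RMSE). The wavelengths, absorbance values, and prompt wording are assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def cod_prompt(bands: dict[float, float]) -> str:
    """Turn band absorbances into a brief natural-language prompt."""
    readings = ", ".join(f"{nm} nm: {a:.3f} AU" for nm, a in bands.items())
    return ("Given UV-NIR absorbances at characteristic bands, "
            f"predict the COD concentration in mg/L.\nBands: {readings}\nCOD:")

print(cod_prompt({254.0: 0.812, 365.0: 0.143}))  # hypothetical sample

# Scoring step; in practice y_pred would be parsed from the LLM's replies.
y_true = np.array([95.0, 120.0, 60.0])
y_pred = np.array([101.0, 112.0, 66.0])
print("R2:", r2_score(y_true, y_pred))
print("RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)
```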

  • Article type: Preprint
    Amblyopia is a neurodevelopmental visual disorder that affects approximately 3-5% of children globally and can lead to vision loss if it is not diagnosed and treated early. Traditional diagnostic methods, which rely on subjective assessments and expert interpretation of eye movement recordings, present challenges in resource-limited eye care centers. This study introduces a new approach that integrates the Gemini large language model (LLM) with eye-tracking data to develop a classification tool for diagnosing patients with amblyopia. The study demonstrates that: (1) LLMs can be successfully applied to the analysis of fixation eye movement data to diagnose patients with amblyopia; and (2) input of medical subject matter expertise, introduced in this study in the form of medical expert augmented generation (MEAG), is an effective adaptation of the generic retrieval augmented generation (RAG) approach for medical applications of LLMs. The study introduces a new multi-view prompting framework for ophthalmology applications that incorporates fine-grained feedback from pediatric ophthalmologists together with in-context learning, reporting an accuracy of 80% in diagnosing patients with amblyopia. Beyond the binary classification task, the classification tool generalizes to specific subpopulations of amblyopic patients based on severity of amblyopia, type of amblyopia, and presence or absence of nystagmus. The model reports an accuracy of: (1) 83% in classifying patients with moderate or severe amblyopia; (2) 81% in classifying patients with mild or treated amblyopia; and (3) 85% in classifying patients with nystagmus. To the best of our knowledge, this is the first study to define a multi-view prompting framework with MEAG for analyzing eye-tracking data to diagnose amblyopic patients.
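    A minimal sketch of the multi-view prompting idea follows: each "view" contributes one slice of context (eye-tracking features, expert rules standing in for MEAG output, and in-context examples). The feature names, rule text, and label format are hypothetical, not the paper's actual framework.

```python
def build_multiview_prompt(features: dict, expert_rules: list[str],
                           examples: list[tuple[str, str]]) -> str:
    """Concatenate feature, expert-knowledge, and example views into one prompt."""
    views = []
    views.append("Fixation features: " +
                 ", ".join(f"{k}={v}" for k, v in features.items()))
    views.append("Expert guidance:\n" +
                 "\n".join(f"- {rule}" for rule in expert_rules))
    views.append("Labeled examples:\n" +
                 "\n".join(f"Case: {c} -> Diagnosis: {d}" for c, d in examples))
    views.append("Classify the new case as amblyopia or control. Diagnosis:")
    return "\n\n".join(views)

prompt = build_multiview_prompt(
    {"fixation_stability_deg": 1.8, "saccade_rate_hz": 2.4},  # hypothetical features
    ["High fixation instability in one eye supports amblyopia."],
    [("fixation_stability_deg=2.1, saccade_rate_hz=2.7", "amblyopia")],
)
print(prompt)  # sent to the LLM as a single classification query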

  • Article type: Journal Article
    In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing systematic reviews. Existing approaches to PICO frame extraction involve supervised methods that rely on manually annotated data points in the form of BIO label tagging. Recent approaches, such as in-context learning (ICL), which has been shown to be effective for a number of downstream NLP tasks, still require labeled examples. In this work, we adopt an ICL strategy that employs the pretrained knowledge of large language models (LLMs), gathered during the pretraining phase, to automatically extract PICO-related terminologies from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated data instances. Additionally, to showcase the highest effectiveness of an LLM in the oracle scenario where a large number of annotated samples is available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a gigantic model in a low-resource environment for the PICO frame extraction task. More specifically, both proposed frameworks utilize AlpaCare as the base LLM, employing few-shot in-context learning and instruction-tuning techniques, respectively, to extract PICO-related terms from clinical trial reports. We applied these approaches to widely used coarse-grained datasets such as EBM-NLP and EBM-COMET, and to fine-grained datasets such as EBM-NLPrev and EBM-NLPh. Our empirical results show that our proposed ICL-based framework produces comparable results on all versions of the EBM-NLP datasets, and the instruction-tuned version of our framework produces state-of-the-art results on all the different EBM-NLP datasets. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
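    For the instruction-tuning route, a rough sketch using the Hugging Face peft library is shown below. The base checkpoint id and the hyperparameters are placeholders; the paper builds on AlpaCare, whose exact hub id and training configuration may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "xz97/AlpaCare-llama2-7b"  # hypothetical hub id for the AlpaCare base LLM
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

lora = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices need gradients
```

    Because only the adapter weights are trained, this is how a "gigantic" model can be instruction-tuned for PICO extraction in a low-resource environment.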

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches.
    OBJECTIVE: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types (heuristic and ensemble prompts), for zero-shot and few-shot clinical information extraction using pretrained language models.
    METHODS: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches.
    RESULTS: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types.
    CONCLUSIONS: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.
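    The prompt types compared above differ mainly in how the task is framed. The templates below are stand-in illustrations (the authors' exact wording is not reproduced), using clinical sense disambiguation of an abbreviation as the example task; the ensemble approach is sketched as a simple majority vote.

```python
note = "Pt c/o CP radiating to left arm."  # hypothetical clinical note snippet

templates = {
    "simple_prefix": f"Disambiguate the abbreviation 'CP' in: {note}\nMeaning:",
    "simple_cloze": f"{note}\nIn this note, 'CP' stands for ___.",
    "chain_of_thought": (f"{note}\nThink step by step about the clinical "
                         "context, then state what 'CP' means here."),
}

def ensemble_vote(answers: list[str]) -> str:
    """Ensemble prompting: run several prompt variants, majority-vote the answers."""
    return max(set(answers), key=answers.count)

# Answers from the three templates would come from the LLM; toy values here.
print(ensemble_vote(["chest pain", "chest pain", "cerebral palsy"]))
```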

  • Article type: Preprint
    Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with "in-context learning" (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks "in context" - without weight changes - via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.
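    The two curricula contrasted here are easy to make concrete: the same in-context examples are either blocked by task or interleaved across tasks. The sketch below uses toy stand-ins for the rule-like stimuli used in the experiments.

```python
from itertools import zip_longest

task_a = [("A1", "rule-A"), ("A2", "rule-A"), ("A3", "rule-A")]
task_b = [("B1", "rule-B"), ("B2", "rule-B"), ("B3", "rule-B")]

blocked = task_a + task_b  # all of task A, then all of task B
interleaved = [x for pair in zip_longest(task_a, task_b)
               for x in pair if x is not None]  # A, B, A, B, ...

def to_prompt(examples):
    """Render an example sequence as an in-context prompt."""
    return "\n".join(f"input: {x} -> label: {y}" for x, y in examples)

print("--- blocked ---\n" + to_prompt(blocked))
print("--- interleaved ---\n" + to_prompt(interleaved))
```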

  • Article type: Journal Article
    This paper introduces the paradigm of "in-context operator learning" and the corresponding "In-Context Operator Networks" model, which simultaneously learns operators from prompted data and applies them to new questions during the inference stage, without any weight update. Existing methods are limited to using a neural network to approximate a specific equation solution or a specific operator, requiring retraining when switching to a new problem with different equations. By training a single neural network as an operator learner, rather than a solution/operator approximator, we can not only avoid retraining (or even fine-tuning) the neural network for new problems but also leverage the commonalities shared across operators, so that only a few examples in the prompt are needed when learning a new operator. Our numerical results show the capability of a single neural network as a few-shot operator learner for a diversified set of differential equation problems, including forward and inverse problems of ordinary differential equations, partial differential equations, and mean-field control problems, and also show that it can generalize its learning capability to operators beyond the training distribution.
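    A toy sketch of what such a prompt looks like: a few (condition, quantity-of-interest) demos generated by one operator, followed by a query condition. The hidden operator here is the ODE du/dt = 0.5 u, and all numbers are synthetic; the actual model consumes numeric tokens rather than text, so this is only a schematic.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 5)

def demo(u0: float) -> str:
    """One demo pair: initial condition and its trajectory under the hidden operator."""
    u = u0 * np.exp(0.5 * t)  # exact solution of du/dt = 0.5 u
    vals = ", ".join(f"u({ti:.2f})={ui:.3f}" for ti, ui in zip(t, u))
    return f"condition: u(0)={u0} -> QoI: {vals}"

prompt = "\n".join([demo(1.0), demo(2.0),
                    "condition: u(0)=1.5 -> QoI:"])
print(prompt)  # the learner must infer the operator from the demos alone
```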