BACKGROUND: Prompt engineering, which focuses on crafting effective prompts for large language models (LLMs), has garnered attention for its ability to harness the potential of LLMs. This is even more crucial in the medical domain because of its specialized terminology and highly technical language. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts that guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored.
OBJECTIVE: This study aims to review research efforts and technical approaches in prompt engineering for medical applications and to provide an overview of opportunities and challenges for clinical practice.
METHODS: Databases indexing the fields of medicine, computer science, and medical informatics were queried to identify relevant published papers. Since prompt engineering is an emerging field, preprint databases were also considered. Several data items were extracted from each study, including the prompt paradigm, the LLMs involved, the languages studied, the topic domain, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We included studies that applied prompt engineering-based methods to the medical domain, were published between 2022 and 2024, and covered one or more prompt paradigms, such as prompt learning (PL), prompt tuning (PT), and prompt design (PD).
RESULTS: We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, PD was the most prevalent (78 papers). In 12 papers, the terms PD, PL, and PT were used interchangeably. While ChatGPT was the most commonly used LLM, we identified 7 studies that used this LLM on a sensitive clinical data set. Chain-of-thought, present in 17 studies, emerged as the most frequent PD technique. While PL and PT papers typically provided a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies did not report any nonprompt-related baseline. Finally, we examined how each key item of prompt engineering-specific information was reported across papers and found that many studies did not state these items explicitly, which poses a challenge for advancing prompt engineering research.
CONCLUSIONS: In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. We also make available tables and figures summarizing the existing medical prompt engineering papers, and we hope that future contributions will build on this body of work to further advance the field.