Large language models

  • Article type: Journal Article
    Prompt engineering, the process of arranging input or prompts given to a large language model to guide it in producing desired outputs, is an emerging field of research that shapes how these models understand tasks, process information, and generate responses in a wide range of natural language processing (NLP) applications. Digital mental health, on the other hand, is becoming increasingly important for several reasons including early detection and intervention, and to mitigate limited availability of highly skilled medical staff for clinical diagnosis. This short review outlines the latest advances in prompt engineering in the field of NLP for digital mental health. To our knowledge, this review is the first attempt to discuss the latest prompt engineering types, methods, and tasks that are used in digital mental health applications. We discuss three types of digital mental health tasks: classification, generation, and question answering. To conclude, we discuss the challenges, limitations, ethical considerations, and future directions in prompt engineering for digital mental health. We believe that this short review contributes a useful point of departure for future research in prompt engineering for digital mental health.
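    The classification task surveyed above can be illustrated with a minimal zero-shot prompt template. The label set, wording, and function name below are illustrative assumptions, not prompts taken from the reviewed studies.

```python
# A minimal sketch of a zero-shot classification prompt of the kind this
# review surveys for digital mental health NLP. Labels and wording are
# illustrative assumptions, not the reviewed studies' actual prompts.
def build_classification_prompt(post: str, labels: list[str]) -> str:
    """Assemble a zero-shot classification prompt for an LLM."""
    label_list = ", ".join(labels)
    return (
        f"Classify the following social-media post into exactly one of "
        f"these categories: {label_list}.\n\n"
        f"Post: {post}\n"
        f"Category:"
    )

prompt = build_classification_prompt(
    "I haven't slept well in weeks and everything feels hopeless.",
    ["depression-related", "anxiety-related", "neutral"],
)
print(prompt)
```

    The same template pattern extends to the generation and question-answering tasks by swapping the instruction line.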
  • Article type: Journal Article
    With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks that has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. We also include articles that used the transformer architecture to generate surgical instructions and to predict adverse outcomes after surgery under the umbrella of critical care. Across diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
  • Article type: Journal Article
    OBJECTIVE: The study developed a framework that leverages an open-source Large Language Model (LLM) to enable clinicians to ask plain-language questions about a patient's entire echocardiogram report history. This approach is intended to streamline the extraction of clinical insights from multiple echocardiogram reports, particularly in patients with complex cardiac diseases, thereby enhancing both patient care and research efficiency.
    METHODS: Data from over 10 years were collected, comprising echocardiogram reports from patients with more than 10 echocardiograms on file at the Mount Sinai Health System. These reports were converted into a single document per patient for analysis and broken down into snippets, and relevant snippets were retrieved using text similarity measures. The LLaMA-2 70B model was employed to analyze the text using a specially crafted prompt. The model's performance was evaluated against ground-truth answers created by faculty cardiologists.
    RESULTS: The study analyzed 432 reports from 37 patients for a total of 100 question-answer pairs. The LLM correctly answered 90% of the questions, with accuracies of 83% for temporality, 93% for severity assessment, 84% for intervention identification, and 100% for diagnosis retrieval. Errors mainly stemmed from the LLM's inherent limitations, such as misinterpreting numbers or hallucinations.
    CONCLUSIONS: The study demonstrates the feasibility and effectiveness of using a local, open-source LLM for querying and interpreting echocardiogram report data. This approach offers a significant improvement over traditional keyword-based searches, enabling more contextually relevant and semantically accurate responses; in turn, it shows promise for enhancing clinical decision-making and research by facilitating more efficient access to complex patient data.
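    The retrieve-then-prompt pipeline described in METHODS (split each patient's report history into snippets, retrieve the snippets most similar to the question, then pass them to the LLM) can be sketched as follows. The token-overlap cosine similarity and all names here are illustrative assumptions; the abstract does not specify the paper's exact similarity measure.

```python
# Sketch of snippet retrieval over a patient's echocardiogram history.
# The similarity measure (bag-of-words cosine) is an assumption; the
# paper only states that "text similarity measures" were used.
import math
import re
from collections import Counter

def _tokens(text: str) -> Counter:
    """Bag-of-words token counts, lowercased and stripped of punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts over token-count vectors."""
    va, vb = _tokens(a), _tokens(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def retrieve_snippets(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the clinician's question."""
    ranked = sorted(snippets, key=lambda s: cosine_similarity(question, s),
                    reverse=True)
    return ranked[:k]

# One (fictional) patient's report history, broken into per-study snippets.
reports = [
    "2015 echo: ejection fraction 55 percent, normal valves",
    "2018 echo: moderate mitral regurgitation noted",
    "2021 echo: severe mitral regurgitation, referred for intervention",
]
top = retrieve_snippets("when did mitral regurgitation become severe", reports, k=1)
print(top[0])  # the 2021 snippet ranks highest
```

    In the paper's pipeline, the retrieved snippets would then be embedded in the specially crafted prompt for LLaMA-2 70B rather than returned directly.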
  • Article type: Journal Article
    Recent artificial intelligence (AI) advancements in cardiovascular care offer potential enhancements in diagnosis, treatment, and outcomes. Innovations to date focus on automating measurements, enhancing image quality, and detecting diseases using novel methods. Applications span wearables, electrocardiograms, echocardiography, angiography, genetics, and more. AI models detect diseases from electrocardiograms at accuracy not previously achieved by technology or human experts, including reduced ejection fraction, valvular heart disease, and other cardiomyopathies. However, AI's unique characteristics necessitate rigorous validation by addressing training methods, real-world efficacy, equity concerns, and long-term reliability. Despite an exponentially growing number of studies in cardiovascular AI, trials showing improvement in outcomes remain lacking. A number are currently underway. Embracing this rapidly evolving technology while setting a high evaluation benchmark will be crucial for cardiology to leverage AI to enhance patient care and the provider experience.
  • Article type: Journal Article
    Recent artificial intelligence (AI) advancements in cardiovascular care offer potential enhancements in effective diagnosis, treatment, and outcomes. More than 600 U.S. Food and Drug Administration-approved clinical AI algorithms now exist, with 10% focusing on cardiovascular applications, highlighting the growing opportunities for AI to augment care. This review discusses the latest advancements in the field of AI, with a particular focus on the utilization of multimodal inputs and the field of generative AI. Further discussions in this review involve an approach to understanding the larger context in which AI-augmented care may exist, and include a discussion of the need for rigorous evaluation, appropriate infrastructure for deployment, ethics and equity assessments, regulatory oversight, and viable business cases for deployment. Embracing this rapidly evolving technology while setting an appropriately high evaluation benchmark with careful and patient-centered implementation will be crucial for cardiology to leverage AI to enhance patient care and the provider experience.
  • Article type: Journal Article
    BACKGROUND: The application of large language models across commercial and consumer contexts has grown exponentially in recent years. However, a gap exists in the literature on how large language models can support nursing practice, education, and research. This study aimed to synthesize the existing literature on current and potential uses of large language models across the nursing profession.
    METHODS: A rapid review of the literature, guided by Cochrane rapid review methodology and PRISMA reporting standards, was conducted. An expert health librarian assisted in developing broad inclusion criteria to account for the emerging nature of literature related to large language models. Three electronic databases (i.e., PubMed, CINAHL, and Embase) were searched to identify relevant literature in August 2023. Articles that discussed the development, use, and application of large language models within nursing were included for analysis.
    RESULTS: The literature search identified a total of 2028 articles that met the inclusion criteria. After systematically reviewing abstracts, titles, and full texts, 30 articles were included in the final analysis. Nearly all (93%; n = 28) of the included articles used ChatGPT as an example, and subsequently discussed the use and value of large language models in nursing education (47%; n = 14), clinical practice (40%; n = 12), and research (10%; n = 3). While the most common assessment of large language models was conducted by human evaluation (26.7%; n = 8), this analysis also identified common limitations of large language models in nursing, including lack of systematic evaluation, as well as other ethical and legal considerations.
    CONCLUSIONS: This is the first review to summarize contemporary literature on current and potential uses of large language models in nursing practice, education, and research. Although there are significant opportunities to apply large language models, the use and adoption of these models within nursing have elicited a series of challenges, such as ethical issues related to bias, misuse, and plagiarism.
    CONCLUSIONS: Given the relative novelty of large language models, ongoing efforts to develop and implement meaningful assessments, evaluations, standards, and guidelines for applying large language models in nursing are recommended to ensure appropriate, accurate, and safe use. Future research along with clinical and educational partnerships is needed to enhance understanding and application of large language models in nursing and healthcare.
  • Article type: Systematic Review
    BACKGROUND: Writing multiple choice questions (MCQs) for the purpose of medical exams is challenging. It requires extensive medical knowledge, time and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs.
    METHODS: The authors searched for studies published up to November 2023. Search terms focused on LLM-generated MCQs for medical examinations. Non-English studies, studies outside the year range, and studies not focusing on AI-generated multiple-choice questions were excluded. MEDLINE was used as the search database. Risk of bias was evaluated using a tailored QUADAS-2 tool.
    RESULTS: Overall, eight studies published between April 2023 and October 2023 were included. Six studies used ChatGPT 3.5, while two employed GPT-4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. Another study compared LLM-generated questions with those written by humans. All studies reported faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify.
    CONCLUSIONS: LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential, and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations. Two studies were at high risk of bias. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
  • Article type: Systematic Review
    OBJECTIVE: Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field.
    METHODS: We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included: "large language models", "LLM", "GPT", "ChatGPT", "OpenAI", and "breast". The risk of bias was evaluated using the QUADAS-2 tool.
    RESULTS: Six studies evaluating either ChatGPT-3.5 or GPT-4 met our inclusion criteria. They explored clinical notes analysis, guideline-based question answering, and patient management recommendations. Accuracy varied between studies, ranging from 50% to 98%. Higher accuracy was seen in structured tasks like information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on the way questions are posed (prompt dependency), and, in some cases, missing critical clinical information.
    CONCLUSIONS: LLMs hold potential in breast cancer care, especially in textual information extraction and guideline-driven clinical question-answering. Yet, their inconsistent accuracy underscores the need for careful validation of these models, and the importance of ongoing supervision.
  • Article type: Journal Article
    Shared decision-making (SDM) is crucial in neuro-oncology, fostering collaborations between patients and healthcare professionals to navigate treatment options. However, the complexity of neuro-oncological conditions and the cognitive and emotional burdens on patients present significant barriers to achieving effective SDM. This discussion explores the potential of large language models (LLMs) such as OpenAI's ChatGPT and Google's Bard to overcome these barriers, offering a means to enhance patient understanding and engagement in their care. LLMs, by providing accessible, personalized information, could support but not supplant the critical insights of healthcare professionals. The hypothesis suggests that patients, better informed through LLMs, may participate more actively in their treatment choices. Integrating LLMs into neuro-oncology requires navigating ethical considerations, including safeguarding patient data and ensuring informed consent, alongside the judicious use of AI technologies. Future efforts should focus on establishing ethical guidelines, adapting healthcare workflows, promoting patient-oriented research, and developing training programs for clinicians on the use of LLMs. Continuous evaluation of LLM applications will be vital to maintain their effectiveness and alignment with patient needs. Ultimately, this exploration contends that the thoughtful integration of LLMs into SDM processes could significantly enhance patient involvement and strengthen the patient-physician relationship in neuro-oncology care.
  • Article type: Review
    OBJECTIVE: Large language models (LLMs) like OpenAI's ChatGPT are powerful generative systems that rapidly synthesize natural language responses. Research on LLMs has revealed their potential and pitfalls, especially in clinical settings. However, the evolving landscape of LLM research in medicine has left several gaps regarding their evaluation, application, and evidence base.
    OBJECTIVE: This scoping review aims to (1) summarize current research evidence on the accuracy and efficacy of LLMs in medical applications, (2) discuss the ethical, legal, logistical, and socioeconomic implications of LLM use in clinical settings, (3) explore barriers and facilitators to LLM implementation in healthcare, (4) propose a standardized evaluation framework for assessing LLMs' clinical utility, and (5) identify evidence gaps and propose future research directions for LLMs in clinical applications.
    METHODS: We screened 4,036 records from MEDLINE, EMBASE, CINAHL, medRxiv, bioRxiv, and arXiv from January 2023 (inception of the search) to June 26, 2023 for English-language papers and analyzed findings from 55 worldwide studies. Quality of evidence was reported based on the Oxford Centre for Evidence-based Medicine recommendations.
    RESULTS: Our results demonstrate that LLMs show promise in compiling patient notes, assisting patients in navigating the healthcare system, and to some extent, supporting clinical decision-making when combined with human oversight. However, their utilization is limited by biases in training data that may harm patients, the generation of inaccurate but convincing information, and ethical, legal, socioeconomic, and privacy concerns. We also identified a lack of standardized methods for evaluating LLMs' effectiveness and feasibility.
    CONCLUSIONS: This review thus highlights potential future directions and questions to address these limitations and to further explore LLMs\' potential in enhancing healthcare delivery.
    CONCLUSIONS: Question What is the current state of Large Language Models’ (LLMs) application in clinical settings, and what are the primary challenges and opportunities associated with their integration? Findings This scoping review, analyzing 55 studies, indicates that while LLMs, including OpenAI’s ChatGPT, show potential in compiling patient notes, aiding in healthcare navigation, and supporting clinical decision-making, their use is constrained by data biases, the generation of plausible but incorrect information, and various ethical and privacy concerns. A significant variability in the rigor of studies, especially in evaluating LLM responses, calls for standardized evaluation methods, including established metrics like ROUGE, METEOR, G-Eval, and MultiMedQA. Meaning The findings suggest a need for enhanced methodologies in LLM research, stressing the importance of integrating real patient data and considering social determinants of health, to improve the applicability and safety of LLMs in clinical environments.
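    ROUGE, one of the standardized metrics named in the Findings above, reduces to n-gram overlap; a minimal sketch of ROUGE-1 recall (the fraction of reference unigrams recovered by a model's answer) follows. Real evaluations would use a maintained implementation such as the rouge-score package; the whitespace tokenizer here is a simplification.

```python
# Minimal ROUGE-1 recall: share of reference unigrams that also appear
# in the candidate answer. Whitespace tokenization is a simplification
# of what standard ROUGE implementations do.
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall of a candidate answer against a reference answer."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(ref[t], cand[t]) for t in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

score = rouge1_recall(
    "the patient has severe aortic stenosis",  # reference answer
    "severe aortic stenosis is present",       # model answer
)
print(score)  # 3 of 6 reference unigrams matched -> 0.5
```

    METEOR, G-Eval, and MultiMedQA operate on different principles (synonym-aware alignment, LLM-as-judge scoring, and benchmark question sets, respectively), so a standardized protocol would typically report several of these together.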