Large language model

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    BACKGROUND: ChatGPT is the most advanced large language model to date; prior iterations have passed medical licensing examinations, provided clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that the artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical.
    OBJECTIVE: This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest.
    METHODS: We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines, with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored as correct (1 point) or incorrect (0 points). Each simulation was conducted 20 times (a sketch of this scoring scheme follows the abstract).
    RESULTS: ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts was 69% (IQR 67%-74%) for cardiac arrest and 42% (IQR 33%-50%) for bradycardia. We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented.
    CONCLUSIONS: This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice.
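
    The scoring scheme above lends itself to a compact reimplementation. Below is a minimal sketch of the three outcome measures in Python, assuming hypothetical binary step scores: the "runs" array is randomly generated for illustration and is not the study's data.

```python
import numpy as np

# Hypothetical data: 20 simulation attempts x 10 algorithm steps, where
# 1 = step performed per the 2020 AHA ACLS guideline and 0 = step missed.
rng = np.random.default_rng(seed=0)
runs = rng.integers(0, 2, size=(20, 10))  # illustrative scores only

def median_iqr(values):
    """Median and interquartile range of a set of accuracies, in percent."""
    q25, med, q75 = np.percentile(values, [25, 50, 75])
    return med * 100, q25 * 100, q75 * 100

step_accuracy = runs.mean(axis=0)     # outcome 1: accuracy of each individual step
attempt_accuracy = runs.mean(axis=1)  # outcome 2: accuracy of each simulation attempt
algorithm_accuracy = runs.mean()      # outcome 3: overall accuracy of the algorithm

print("per-step median (IQR): %.0f%% (%.0f%%-%.0f%%)" % median_iqr(step_accuracy))
print("per-attempt median (IQR): %.0f%% (%.0f%%-%.0f%%)" % median_iqr(attempt_accuracy))
print("whole-algorithm accuracy: %.0f%%" % (algorithm_accuracy * 100))
```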

  • Article type: Journal Article
    Background: The Japanese Circulation Society (JCS) 2022 Guideline on Perioperative Cardiovascular Assessment and Management for Non-Cardiac Surgery standardizes preoperative cardiovascular assessments. The present study investigated the efficacy of a large language model (LLM) in providing accurate responses meeting the JCS 2022 Guideline. Methods and Results: Data on consultation requests, physicians' cardiovascular records, and patients' response content were analyzed. Virtual scenarios were created using real-world clinical data, and an LLM was then consulted for such scenarios. Conclusions: Google BARD could accurately provide responses in accordance with the JCS 2022 Guideline in low-risk cases. Google Gemini has significantly improved accuracy in intermediate- and high-risk cases.

  • Article type: Journal Article
    This study explored the application of generative pre-trained transformer (GPT) agents based on medical guidelines, built with large language model (LLM) technology, to traumatic brain injury (TBI) rehabilitation-related questions. To assess the effectiveness of multiple agents (GPT-agents) created using GPT-4, a comparison was conducted using direct GPT-4 as the control group (GPT-4). The GPT-agents comprised multiple agents with distinct functions, including "Medical Guideline Classification", "Question Retrieval", "Matching Evaluation", "Intelligent Question Answering (QA)", and "Results Evaluation and Source Citation". Brain rehabilitation questions were selected from a doctor-patient Q&A database for assessment. The primary endpoint was a better answer; the secondary endpoints were accuracy, completeness, explainability, and empathy. Thirty questions were answered. Overall, the GPT-agents took substantially longer and used more words to respond than GPT-4 (time: 54.05 vs. 9.66 s; words: 371 vs. 57), but provided superior answers in more cases (66.7% vs. 33.3%). The GPT-agents surpassed GPT-4 in the accuracy evaluation (3.8 ± 1.02 vs. 3.2 ± 0.96, p = 0.0234). No difference in completeness was found (2 ± 0.87 vs. 1.7 ± 0.79, p = 0.213). However, in the explainability (2.79 ± 0.45 vs. 07 ± 0.52, p < 0.001) and empathy (2.63 ± 0.57 vs. 1.08 ± 0.51, p < 0.001) evaluations, the GPT-agents performed notably better. Grounded in medical guidelines, the GPT-agents enhanced the accuracy and empathy of responses to TBI rehabilitation questions. This study provides guideline references and demonstrates improved clinical explainability, although further validation through multicenter clinical trials is necessary. It offers practical insights and lays groundwork for the potential integration of LLM agents into medicine.
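
    The abstract names the five agent roles but not how they were wired together. The sketch below shows one plausible chaining of such roles; the ask_llm helper, the prompts, and the control flow are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(prompt: str) -> str:
    """Single-turn GPT-4 call (hypothetical helper, not from the paper)."""
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

@dataclass
class Answer:
    text: str
    sources: list[str]
    self_score: str

def answer_question(question: str) -> Answer:
    # "Medical Guideline Classification": route the question to a guideline topic.
    topic = ask_llm(f"Name the TBI rehabilitation guideline topic for: {question}")
    # "Question Retrieval": fetch candidate guideline passages (stubbed as an LLM call).
    passages = [ask_llm(f"Quote guideline text on '{topic}' relevant to: {question}")]
    # "Matching Evaluation": keep only passages judged relevant to the question.
    passages = [p for p in passages
                if ask_llm(f"Does this passage address '{question}'? Reply yes or no:\n{p}")
                .strip().lower().startswith("yes")]
    # "Intelligent Question Answering (QA)": answer grounded in the retained passages.
    text = ask_llm(f"Using only these passages, answer: {question}\n" + "\n\n".join(passages))
    # "Results Evaluation and Source Citation": self-grade and attach the sources.
    score = ask_llm(f"Rate this answer 0-5 for guideline fidelity, with a brief reason:\n{text}")
    return Answer(text=text, sources=passages, self_score=score)
```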

  • Article type: Journal Article
    CONCLUSIONS: We created a LangChain/OpenAI API-powered chatbot based solely on the International Consensus Statement on Allergy and Rhinology: Rhinosinusitis (ICAR-RS). The ICAR-RS chatbot is able to provide direct and actionable recommendations. Utilization of consensus statements provides an opportunity for AI applications in healthcare.
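
    The abstract does not describe the build, but a document-grounded chatbot of this kind is typically retrieval-augmented generation over the source text. Below is a minimal sketch assuming the classic LangChain RetrievalQA chain and a hypothetical local "icar_rs.pdf" copy of the statement; neither is the authors' actual code.

```python
# pip install langchain langchain-openai langchain-community faiss-cpu pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA

# Load and chunk the consensus statement (file name is hypothetical).
pages = PyPDFLoader("icar_rs.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)

# Embed and index the chunks so answers are drawn solely from ICAR-RS text.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4", temperature=0),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,  # keep the cited passages for transparency
)

result = qa.invoke({"query": "When is endoscopic sinus surgery recommended for chronic rhinosinusitis?"})
print(result["result"])
```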

  • Article type: Journal Article
    BACKGROUND: Innovative large language model (LLM)-powered chatbots, which are now widely popular, represent potential sources of information on resuscitation for the general public. For instance, chatbot-generated advice could be used for community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency.
    OBJECTIVE: This study assessed the performance of two prominent LLM-based chatbots, particularly the quality of chatbot-generated advice on how to help a non-breathing victim.
    METHODS: In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were each asked 20 times: "What to do if someone is not breathing?" The content of the chatbots' responses was evaluated for compliance with the 2021 Resuscitation Council United Kingdom guidelines using a pre-developed checklist (a sketch of such checklist scoring follows the abstract).
    RESULTS: Both chatbots provided context-dependent textual responses to the query. However, coverage of the guideline-consistent instructions on helping a non-breathing victim was poor: the mean percentage of responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (P > .05). Essential elements of bystander action, including the early start and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as the request for and use of an automated external defibrillator (AED), were routinely missing. Moreover, 55.0% of Bard's responses contained plausible-sounding but nonsensical guidance, known as artificial hallucinations, which creates a risk of inadequate care and harm to a victim.
    CONCLUSIONS: The LLM-powered chatbots' advice on helping a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate the risks related to chatbot-generated misinformation of the public on resuscitation.
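
    A crude sketch of checklist-based scoring like that described in the methods; the items and keywords below are paraphrased from the abstract, not the study's actual instrument, and simple keyword matching stands in for the human rating the study used.

```python
import statistics

# Checklist criteria distilled from the abstract (illustrative only; the study's
# instrument followed the 2021 Resuscitation Council United Kingdom guidelines).
CHECKLIST = {
    "call emergency services": ("999", "112", "emergency"),
    "start chest compressions early": ("chest compression",),
    "adequate compression depth": ("depth", "5-6 cm"),
    "adequate compression rate": ("100", "rate"),
    "allow full chest recoil": ("recoil",),
    "request and use an AED": ("aed", "defibrillator"),
}

def checklist_coverage(response: str) -> float:
    """Fraction of checklist criteria a chatbot response mentions."""
    text = response.lower()
    met = sum(any(kw in text for kw in keywords) for keywords in CHECKLIST.values())
    return met / len(CHECKLIST)

# Hypothetical chatbot outputs (the study collected n = 20 per chatbot).
responses = [
    "Call 999 immediately, then start chest compressions at a rate of 100-120 per minute...",
]
scores = [checklist_coverage(r) for r in responses]
print(f"mean checklist coverage: {statistics.mean(scores):.1%}")
```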

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    Introduction This case study aimed to enhance the traceability and retrieval accuracy of ChatGPT-4 in medical text by employing a step-by-step systematic approach. The focus was on retrieving clinical answers from three international guidelines on diabetic ketoacidosis (DKA). Methods A systematic methodology was developed to guide the retrieval process. One question was asked per guideline to ensure accuracy and maintain referencing. ChatGPT-4 was used to retrieve answers, and the 'Link Reader' plug-in was integrated to facilitate direct access to the webpages containing the guidelines. Subsequently, ChatGPT-4 was employed to compile the answers while providing citations to the sources. This process was repeated 30 times per question to ensure consistency. In this report, we present our observations regarding retrieval accuracy, consistency of responses, and the challenges encountered during the process. Results Integrating ChatGPT-4 with the 'Link Reader' plug-in demonstrated notable traceability and retrieval-accuracy benefits. The AI model successfully provided relevant and accurate clinical answers based on the analyzed guidelines. Despite occasional challenges with webpage access and minor memory drift, the overall performance of the integrated system was promising. The compilation of the answers was also impressive and held significant promise for further trials. Conclusion The findings of this case study support the use of AI text-generation models as valuable tools for medical professionals and researchers. The systematic approach employed here and the integration of the 'Link Reader' plug-in offer a framework for automating medical text synthesis, asking one question at a time before compiling from different sources, which improved the model's traceability and retrieval accuracy. Further advancement and refinement of AI models, and integration with other software utilities, hold promise for enhancing the utility and applicability of AI-generated recommendations in medicine and scientific academia. These advances have the potential to drive significant improvements in everyday medical practice.
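
    The repetition protocol (30 identical queries per question, then a consistency check) is easy to reproduce with the plain OpenAI API. The sketch below omits the 'Link Reader' retrieval step, which was a ChatGPT web plug-in, and the question text is illustrative rather than one of the study's queries.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = (  # illustrative DKA question, not one of the study's queries
    "Per the guideline text provided, what is the recommended initial fluid "
    "for adult DKA resuscitation? Answer in one sentence and cite the source."
)

def run_trials(question: str, n: int = 30) -> list[str]:
    """Ask the identical question n times, mirroring the study's 30 repetitions."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

answers = run_trials(QUESTION)
# Crude consistency check: how often the single most frequent answer recurs.
modal_answer, freq = Counter(answers).most_common(1)[0]
print(f"modal answer appeared in {freq}/{len(answers)} trials:\n{modal_answer}")
```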

  • Article type: Journal Article
    BACKGROUND: There is currently no clear consensus on standards for the use of large language models (LLMs) such as ChatGPT in academic medicine. Hence, we performed a scoping review of the available literature to understand the current state of LLM use in medicine and to provide a guideline for future utilization in academia.
    METHODS: A scoping review of the literature was performed through a Medline search on February 16, 2023, using a combination of keywords including artificial intelligence, machine learning, natural language processing, generative pre-trained transformer, ChatGPT, and large language model. There were no restrictions on language or date of publication. Records not pertaining to LLMs were excluded. Records pertaining to LLM chatbots and ChatGPT were identified and evaluated separately. Among these, the records suggesting recommendations for ChatGPT use in academia were used to create guideline statements for ChatGPT and LLM use in academic medicine.
    RESULTS: A total of 87 records were identified; 30 did not pertain to large language models and were excluded. Fifty-four records underwent full-text review, of which 33 related to LLM chatbots or ChatGPT.
    CONCLUSIONS: From assessing these texts, five guideline statements for LLM use were developed: (1) ChatGPT/LLMs cannot be cited as authors in scientific manuscripts; (2) if the use of ChatGPT/LLMs is considered in academic work, the author(s) should have at least a basic understanding of what ChatGPT/LLMs are; (3) ChatGPT/LLMs should not be used to produce the entirety of the text in a manuscript; humans must be held accountable for the use of ChatGPT/LLMs, and content created by ChatGPT/LLMs should be meticulously verified by humans; (4) ChatGPT/LLMs may be used for editing and refining text; (5) any use of ChatGPT/LLMs should be transparent, clearly outlined in the scientific manuscript, and acknowledged.
    Future authors should remain mindful of the potential impact their academic work may have on healthcare and continue to uphold the highest ethical standards and integrity when utilizing ChatGPT/LLMs.
