Keywords: artificial hallucination; artificial intelligence; cardiac arrest; cardiopulmonary resuscitation; chatbot; large language model

MeSH: Humans; Resuscitation; Language; Communication; Respiration; Defibrillators

Source: DOI: 10.1017/S1049023X23006568

Abstract:
BACKGROUND: Innovative large language model (LLM)-powered chatbots, which are extremely popular nowadays, represent potential sources of information on resuscitation for the general public. For instance, the chatbot-generated advice could be used for purposes of community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency.
OBJECTIVE: This study assessed the performance of two prominent LLM-based chatbots, particularly the quality of the chatbot-generated advice on how to help a non-breathing victim.
METHODS: In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were queried (n = 20 each): "What to do if someone is not breathing?" The content of the chatbots' responses was evaluated for compliance with the 2021 Resuscitation Council United Kingdom guidelines using a pre-developed checklist.
RESULTS: Both chatbots provided context-dependent textual responses to the query. However, coverage of the guideline-consistent instructions on helping a non-breathing victim within the responses was poor: the mean percentage of responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (P > .05). Essential elements of bystander action, including early start and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as the request for and use of an automated external defibrillator (AED), were, as a rule, missing. Moreover, 55.0% of Bard's responses contained plausible-sounding but nonsensical guidance, termed artificial hallucinations, which creates a risk of inadequate care and harm to a victim.
CONCLUSIONS: The LLM-powered chatbots' advice on helping a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate the risks of chatbot-generated misinformation about resuscitation reaching the public.
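As an illustration of the evaluation described in the METHODS and RESULTS sections, the Python sketch below shows how checklist-based scoring of chatbot responses and a comparison between the two chatbots could be set up. The checklist items, the coded response sets, and the Mann-Whitney U test are assumptions made for illustration only; the abstract does not specify the authors' actual checklist wording, coding procedure, or statistical test.

# Minimal sketch, assuming a hypothetical checklist and manually coded responses;
# this is not the authors' actual scoring instrument or analysis code.
from statistics import mean
from scipy.stats import mannwhitneyu

# Hypothetical, abbreviated checklist items based on 2021 Resuscitation Council UK guidance
CHECKLIST = {
    "check_breathing",
    "call_ems",
    "start_compressions_early",
    "depth_5_6_cm",
    "rate_100_120",
    "full_recoil",
    "minimise_interruptions",
    "request_and_use_aed",
}

def coverage(satisfied: set[str]) -> float:
    """Percentage of checklist items fully satisfied by one chatbot response."""
    return 100.0 * len(satisfied & CHECKLIST) / len(CHECKLIST)

# Illustrative codings for a handful of responses (the study used n = 20 per chatbot)
bing_codings = [{"check_breathing", "call_ems"}, {"call_ems"}, set()]
bard_codings = [{"check_breathing"}, {"call_ems", "start_compressions_early"}, set()]

bing_scores = [coverage(s) for s in bing_codings]
bard_scores = [coverage(s) for s in bard_codings]

print(f"Bing mean coverage: {mean(bing_scores):.1f}%")
print(f"Bard mean coverage: {mean(bard_scores):.1f}%")

# Non-parametric comparison of the two coverage distributions (illustrative test choice)
stat, p = mannwhitneyu(bing_scores, bard_scores, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, P = {p:.3f}")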