Keywords: AI in health care; artificial intelligence; chatbots; chronic hepatitis B; cross-linguistic study; large language models; medical consultation

Source: DOI:10.2196/56426   PDF (PubMed)

Abstract:
BACKGROUND: Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. Given its notable capabilities in medical education and practice, ChatGPT-3.5's role in managing CHB is examined, particularly in regions with distinct health care landscapes.
OBJECTIVE: This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts.
METHODS: Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy for closed questions between ChatGPT-3.5 and ChatGPT-4.0.
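As an aside for readers interested in reproducing the query step, presenting each compiled inquiry to ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues can be approximated programmatically. The sketch below assumes the OpenAI Python SDK with the gpt-3.5-turbo and gpt-4 model names and placeholder questions; the abstract does not state whether the authors used the web interface or the API, so this is only an illustration of the design, not their actual procedure.

```python
# Minimal sketch (assumptions: OpenAI Python SDK, API key in OPENAI_API_KEY).
# Each question is sent in a fresh, single-turn dialogue so that no
# conversational context carries over between inquiries or models.
from openai import OpenAI

client = OpenAI()

QUESTIONS = [  # placeholders for the study's 96 compiled inquiries
    "What does a positive HBsAg result mean?",
    "How often should liver function be monitored in chronic hepatitis B?",
]
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for ChatGPT-3.5 and ChatGPT-4.0


def ask_independently(question: str, model: str) -> str:
    """Open an independent dialogue for one question and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for question in QUESTIONS:
        for model in MODELS:
            print(f"[{model}] {question}")
            print(ask_independently(question, model), "\n")
```

Because every call carries only the single user message, each question-model pair is answered without shared history, mirroring the independent-dialogue design described above.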
RESULTS: Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in terms of informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, which appeared in only 3.2% (6/186) of ChatGPT-3.5 responses and 8.1% (15/154) of ChatGPT-4.0 responses (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 did so in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180; P<.001).
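To make the reported comparisons concrete, the difference in comprehensive-response rates (228/370 for ChatGPT-3.5 vs 172/222 for ChatGPT-4.0) can be checked with a chi-square test on a 2x2 contingency table. The abstract does not name the statistical test the authors used, so the following scipy sketch is a plausible reconstruction rather than their analysis.

```python
# Hedged sketch: chi-square test of independence on the counts reported
# in the abstract (comprehensive vs not comprehensive, per model).
from scipy.stats import chi2_contingency

table = [
    [228, 370 - 228],  # ChatGPT-3.5: comprehensive, not comprehensive
    [172, 222 - 172],  # ChatGPT-4.0: comprehensive, not comprehensive
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p_value:.2g}")
# A P value well below .001 here would be consistent with the reported P<.001.
```

The same table construction applies to the true-or-false accuracy comparison (168/180 vs 117/180).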
CONCLUSIONS: In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The working language chosen for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in its use of terminology and colloquial language, which potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the impact of language on information accuracy. This suggests that the implications of model advancement for applications need to be considered when selecting large language models as medical consultation assistants. Given that both models performed inadequately in emotional management guidance, this study highlights the importance of providing specific language training and emotional management strategies when deploying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patients' experiences in practical applications.