Keywords: AI; generative; LLM; Latarjet; artificial intelligence; chatbot; large language model; shoulder instability

Source: DOI: 10.1016/j.arthro.2024.05.025

Abstract:
OBJECTIVE: To assess the ability of ChatGPT-4, an automated chatbot powered by artificial intelligence, to answer common patient questions concerning the Latarjet procedure for patients with anterior shoulder instability, and to compare this performance with that of Google Search Engine.
METHODS: Using previously validated methods, a Google search was first performed using the query "Latarjet." Subsequently, the top 10 frequently asked questions (FAQs) and associated sources were extracted. ChatGPT-4 was then prompted to provide the top 10 FAQs and answers concerning the procedure. This process was repeated to identify additional FAQs requiring discrete numeric answers to allow for a comparison between ChatGPT-4 and Google. Discrete numeric answers were subsequently assessed for accuracy on the basis of the clinical judgment of 2 fellowship-trained sports medicine surgeons who were blinded to search platform.
RESULTS: Mean (± standard deviation) accuracy of numeric-based answers was 2.9 ± 0.9 for ChatGPT-4 versus 2.5 ± 1.4 for Google (P = .65). ChatGPT-4 derived information for answers only from academic sources, which was significantly different from Google Search Engine (P = .003), which drew only 30% of sources from academic sites, with the remainder from individual surgeons' websites (50%) and larger medical practices (20%). For general FAQs, 40% of FAQs were found to be identical when comparing ChatGPT-4 and Google Search Engine. In terms of sources used to answer these questions, ChatGPT-4 again used 100% academic resources, whereas Google Search Engine used 60% academic resources, 20% surgeon personal websites, and 20% medical practices (P = .087).
CONCLUSIONS: ChatGPT-4 demonstrated the ability to provide accurate and reliable information about the Latarjet procedure in response to patient queries, using multiple academic sources in all cases. This was in contrast to Google Search Engine, which more frequently used single-surgeon and large medical practice websites. Despite differences in the resources accessed to perform information retrieval tasks, the clinical relevance and accuracy of information provided did not significantly differ between ChatGPT-4 and Google Search Engine.
CLINICAL RELEVANCE: Commercially available large language models (LLMs), such as ChatGPT-4, can perform diverse information retrieval tasks on demand. An important medical information retrieval application for LLMs is the ability to provide comprehensive, relevant, and accurate information for various use cases, such as investigating a recently diagnosed medical condition or procedure. Understanding the performance and abilities of LLMs for these use cases has important implications for deployment within health care settings.