%0 Journal Article %T Evaluating the Potential of AI Chatbots in Treatment Decision-making for Acquired Bilateral Vocal Fold Paralysis in Adults. %A Dronkers EAC %A Geneid A %A Al Yaghchi C %A Lechien JR %J J Voice %V 0 %N 0 %D 2024 Apr 6 %M 38584026 %F 2.3 %R 10.1016/j.jvoice.2024.02.020 %X OBJECTIVE: The development of artificial intelligence-powered language models, such as Chatbot Generative Pre-trained Transformer (ChatGPT) or Large Language Model Meta AI (Llama), is emerging in medicine. Patients and practitioners have full access to chatbots that may provide medical information. The aim of this study was to explore the performance and accuracy of ChatGPT and Llama in treatment decision-making for bilateral vocal fold paralysis (BVFP).
METHODS: Data of 20 clinical cases, treated between 2018 and 2023, were retrospectively collected from four tertiary laryngology centers in Europe. The cases were defined as the most common or most challenging scenarios regarding BVFP treatment. The treatment proposals were discussed in their local multidisciplinary teams (MDT). Each case was presented to ChatGPT-4.0 and Llama Chat-2.0, and potential treatment strategies were requested. The Artificial Intelligence Performance Instrument (AIPI) treatment subscore was used to compare both Chatbots' performances to MDT treatment proposal.
RESULTS: Most common etiology of BVFP was thyroid surgery. A form of partial arytenoidectomy with or without posterior transverse cordotomy was the MDT proposal for most cases. The accuracy of both Chatbots was very low regarding their treatment proposals, with a maximum AIPI treatment score in 5% of the cases. In most cases even harmful assertions were made, including the suggestion of vocal fold medialisation to treat patients with stridor and dyspnea. ChatGPT-4.0 performed significantly better in suggesting the correct treatment as part of the treatment proposal (50%) compared to Llama Chat-2.0 (15%).
CONCLUSIONS: ChatGPT and Llama are judged as inaccurate in proposing correct treatment for BVFP. ChatGPT significantly outperformed Llama. Treatment decision-making for a complex condition such as BVFP is clearly beyond the Chatbot's knowledge expertise. This study highlights the complexity and heterogeneity of BVFP treatment, and the need for further guidelines dedicated to the management of BVFP.