%0 Journal Article
%T Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data
%A Borna S
%A Gomez-Cabello CA
%A Pressman SM
%A Haider SA
%A Forte AJ
%J J Pers Med
%V 14
%N 6
%D 2024 Jun 8
%M 38929832
%F 3.508
%R 10.3390/jpm14060612
%X In the U.S., diagnostic errors are common across healthcare settings owing to factors such as complex procedures and multiple healthcare providers, and are often exacerbated by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery by evaluating their effectiveness both with and without physical examination data. Thirty medical vignettes covering emergency conditions such as fractures and nerve injuries were used to assess the models' diagnostic and management responses. These responses were evaluated by medical professionals against established clinical guidelines, using statistical analyses including the Wilcoxon rank-sum test. Results showed that ChatGPT-4 consistently outperformed Gemini in both diagnosis and management, irrespective of the presence of physical examination data, though neither model's performance differed significantly across data scenarios. In conclusion, while ChatGPT-4 demonstrated superior accuracy and management capabilities, the addition of physical examination data enhanced response detail but did not enable the models to significantly surpass traditional medical resources. These findings underscore the utility of AI in supporting clinical decision-making, particularly in scenarios with limited data, and suggest its role as a complement to, rather than a replacement for, comprehensive clinical evaluation and expertise.