Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data

Abstract:

In the U.S., diagnostic errors are common across healthcare settings. They are driven by factors such as complex procedures and the involvement of multiple healthcare providers, and they are often made worse by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery. It evaluates the models' performance both with and without physical examination data. Thirty medical vignettes covering emergency conditions such as fractures and nerve injuries were used to assess the models' diagnostic and management responses. Medical professionals evaluated these responses against established clinical guidelines, and the evaluations were analyzed statistically, including with the Wilcoxon rank-sum test. ChatGPT-4 consistently outperformed Gemini in both diagnosis and management, regardless of whether physical examination data were provided. However, neither model's performance differed significantly between the scenarios with and without physical examination data. In conclusion, ChatGPT-4 demonstrated superior accuracy and management capabilities. Adding physical examination data enhanced the detail of responses but did not significantly surpass traditional medical resources. These findings underscore the utility of AI in supporting clinical decision-making, particularly when data are limited. They also suggest that AI should complement comprehensive clinical evaluation and expertise, not replace it.
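The abstract names the Wilcoxon rank-sum test but does not say how responses were scored or which software was used. The following is a minimal sketch that assumes each model response received an ordinal score. The 1-5 scale and the scores in the usage example are hypothetical illustrations, not study data. The test pools two independent groups of scores and ranks them together, giving tied scores their average rank. It then checks whether one group's rank sum is larger than chance would predict, using the large-sample normal approximation with a tie correction. In practice a statistics library would normally be used instead, for example SciPy's `scipy.stats.mannwhitneyu`, which tests the equivalent Mann-Whitney U statistic.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Return (W, z, two-sided p), where W is the rank sum of sample x.

    Uses the normal approximation with a tie-corrected variance and
    no continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    w = sum(ranks[:n1])
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(combined.count(v) ** 3 - combined.count(v) for v in set(combined))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all scores are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for two models, for illustration only.
    model_a = [5, 4, 5, 4, 5, 3]
    model_b = [3, 4, 2, 3, 4, 3]
    w, z, p = wilcoxon_rank_sum(model_a, model_b)
    print(f"W={w}, z={z:.3f}, p={p:.4f}")
```

A positive z means the first group tends to receive higher ranks. The small, heavily tied samples typical of rating scales are also the reason this sketch includes the tie correction.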