Keywords: Artificial Intelligence; ChatGPT; Chatbot; Comparison; Diagnosis; GPT; Head neck; Instrument; Internet; Medicine; Otolaryngology; Performance; Surgery; Tool; Treatment

Source: DOI:10.1007/s12070-024-04729-1   PDF (PubMed)

Abstract:
To evaluate the performance of ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) on a public healthcare system otolaryngology job competition examination, using the real scores of otolaryngology specialists as the control group. In September 2023, 135 questions, divided into theoretical and practical parts, were input into ChatGPT 3.5 and the internet-connected GPT-4. The accuracy of the AI responses was compared with the official results of the otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5, scoring 88.5 points to ChatGPT's 60. The two AIs' incorrect answers did not fully overlap. Despite ChatGPT's proficiency, Copilot displayed superior performance, achieving the second-best score among the 108 otolaryngologists who took the exam, while ChatGPT placed 83rd. A chat powered by GPT-4 with internet access (Copilot) demonstrates superior performance in responding to multiple-choice medical questions compared to ChatGPT 3.5.
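The abstract reports that accuracy was compared statistically (in Stata 14.2) but does not state the test used or the raw correct-answer counts. As a minimal sketch of one plausible comparison, the snippet below applies a two-proportion z-test to the two models' correct-answer rates over the same 135 questions; the counts (119 vs. 81 correct) are hypothetical illustrations, not figures from the paper.

```python
from math import sqrt, erf

def two_proportion_z(correct_a: int, correct_b: int, n: int):
    """Two-sided two-proportion z-test for correct-answer rates on n questions."""
    p1, p2 = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)      # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (2 / n))      # pooled standard error
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts for illustration only: the abstract gives exam scores
# (88.5 vs. 60 points), not numbers of correct answers.
z, p = two_proportion_z(119, 81, 135)
```

With these illustrative counts the difference is large (z ≈ 5.3), so the gap the abstract describes would be statistically significant under this test; the paper itself may have used a different procedure.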