Keywords: ChatGPT-4; Taiwan's medical licensing exams; chain of thought; healthcare system

Source: DOI:10.1177/20552076241237678   PDF (PubMed)

Abstract:
Background: Taiwan is well known for its high-quality healthcare system. The country's medical licensing exams offer a way to evaluate ChatGPT-4's medical proficiency.
Methods: We analyzed exam data from February 2022, July 2022, February 2023, and July 2023. Each exam comprised four papers of 80 single-choice questions each, grouped as descriptive or picture-based. We evaluated the questions with ChatGPT-4; for incorrectly answered questions, we re-prompted using a "chain of thought" (CoT) approach. Accuracy rates were calculated as percentages.
Results: ChatGPT-4's accuracy on the medical exams ranged from 63.75% to 93.75% (February 2022 to July 2023). The highest accuracy (93.75%) was on February 2022's Medicine Exam (3). The subjects with the highest error rates were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). With "chain of thought" prompting, CoT accuracy ranged from 0.00% to 88.89%, and the final overall accuracy rate ranged from 90% to 98%.
Conclusions: ChatGPT-4 passed Taiwan's medical licensing exams, and with the "chain of thought" prompt its accuracy improved to over 90%.
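The scoring flow described in the Methods (per-paper accuracy as a percentage, plus a CoT retry pass on initially incorrect answers) can be sketched as follows. This is a minimal illustration with hypothetical data, not the authors' actual evaluation code; the function names and the example numbers are assumptions.

```python
def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimal places."""
    return round(100.0 * correct / total, 2)

def score_paper(first_pass: list[bool], cot_pass: dict[int, bool]) -> dict:
    """Score one exam paper.

    first_pass[i] is True if question i was answered correctly on the
    first attempt; cot_pass maps the indices of initially wrong questions
    to the result of the chain-of-thought (CoT) retry.
    """
    total = len(first_pass)
    initial_correct = sum(first_pass)
    # Retry results for every question missed on the first pass.
    retried = [cot_pass.get(i, False) for i, ok in enumerate(first_pass) if not ok]
    cot_correct = sum(retried)
    return {
        "initial_accuracy": accuracy(initial_correct, total),
        "cot_accuracy": accuracy(cot_correct, len(retried)) if retried else 0.0,
        "final_accuracy": accuracy(initial_correct + cot_correct, total),
    }

# Hypothetical 80-question paper: 68 correct initially, 10 of the 12
# misses recovered by the CoT retry.
first = [True] * 68 + [False] * 12
cot = {i: i < 78 for i in range(68, 80)}
result = score_paper(first, cot)
print(result)  # initial 85.0%, CoT 83.33%, final 97.5%
```

The final accuracy counts a question as correct if either the first pass or the CoT retry answered it correctly, matching how the abstract reports an overall rate above the initial one.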