Keywords: Artificial intelligence; Communication; Dental education; Digital technology; Examination questions; Specialties, Dental

MeSH: Artificial Intelligence; Humans; Licensure, Dental; Educational Measurement / methods; United States; United Kingdom

Source: DOI:10.1016/j.identj.2023.12.007   PDF (PubMed)

Abstract:
OBJECTIVE: Generative artificial intelligence (GenAI), including large language models (LLMs), has vast potential applications in health care and education. However, it is unclear how proficient LLMs are in interpreting written input and providing accurate answers in dentistry. This study aims to investigate the accuracy of GenAI in answering questions from dental licensing examinations.
METHODS: A total of 1461 multiple-choice questions from question books for the US and the UK dental licensing examinations were input into 2 versions of ChatGPT: 3.5 and 4.0. The passing rates of the US and UK dental examinations were 75.0% and 50.0%, respectively. The performance of the 2 versions of GenAI in individual examinations and dental subjects was analysed and compared.
RESULTS: ChatGPT 3.5 correctly answered 68.3% (n = 509) and 43.3% (n = 296) of questions from the US and UK dental licensing examinations, respectively. The scores for ChatGPT 4.0 were 80.7% (n = 601) and 62.7% (n = 429), respectively. ChatGPT 4.0 passed both written dental licensing examinations, whilst ChatGPT 3.5 failed both. Comparing the 2 versions question by question, ChatGPT 4.0 correctly answered 327 questions that ChatGPT 3.5 missed and incorrectly answered 102 that ChatGPT 3.5 got right.
CONCLUSIONS: The newer version of GenAI has shown good proficiency in answering multiple-choice questions from dental licensing examinations. Whilst the more recent version of GenAI generally performed better, this observation may not hold true in all scenarios, and further improvements are necessary. The use of GenAI in dentistry will have significant implications for dentist-patient communication and the training of dental professionals.
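
A minimal sketch (not the authors' code) of the pass/fail arithmetic described in the METHODS and RESULTS, in Python. The per-examination question totals are not stated in the abstract; the values below (about 745 scored US questions and 684 UK questions) are back-calculated from the reported correct counts and percentages, so they are assumptions rather than figures from the paper.

# Reproduce the per-examination accuracy and pass/fail verdicts reported in
# RESULTS. Totals are inferred from the abstract (e.g., 509 correct answers
# at 68.3% implies ~745 scored US questions), not taken from the paper.
PASS_THRESHOLD = {"US": 75.0, "UK": 50.0}  # passing rates given in METHODS

# (model, examination) -> (questions answered correctly, inferred total)
results = {
    ("ChatGPT 3.5", "US"): (509, 745),
    ("ChatGPT 3.5", "UK"): (296, 684),
    ("ChatGPT 4.0", "US"): (601, 745),
    ("ChatGPT 4.0", "UK"): (429, 684),
}

for (model, exam), (correct, total) in results.items():
    accuracy = 100.0 * correct / total
    verdict = "pass" if accuracy >= PASS_THRESHOLD[exam] else "fail"
    print(f"{model}, {exam} examination: {accuracy:.1f}% -> {verdict}")

Running this reproduces the reported figures (68.3%/43.3% for ChatGPT 3.5, 80.7%/62.7% for ChatGPT 4.0), with only ChatGPT 4.0 clearing both passing thresholds; the net gain of 225 correct answers also matches the 327 newly correct minus 102 newly incorrect questions.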