Keywords: Artificial intelligence; ChatGPT; China; large language model; medical education; residency training

Source: DOI: 10.1080/0142159X.2024.2377808

Abstract:
Purpose: The purpose of this study was to assess the utility of information generated by ChatGPT for residency education in China.
Methods: We designed a three-step survey to evaluate the performance of ChatGPT in China's residency training education, covering residency final examination questions, patient cases, and resident satisfaction scores. First, 204 questions from the residency final examination were entered into ChatGPT's interface to obtain the percentage of correct answers. Next, ChatGPT was asked to generate 20 clinical cases, which three instructors then evaluated on a pre-designed 5-point Likert scale; case quality was assessed on clarity, relevance, logicality, credibility, and comprehensiveness. Finally, interaction sessions were conducted between 31 third-year residents and ChatGPT, and residents' perceptions of ChatGPT's feedback were assessed on a Likert scale covering ease of use, accuracy and completeness of responses, and effectiveness in enhancing understanding of medical knowledge.
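For readers who want to reproduce the first step at scale, the following is a minimal sketch of a batch exam-question evaluation. Note that the study itself entered questions manually into ChatGPT's web interface, so the OpenAI API call, the gpt-3.5-turbo model name, the residency_final_exam.json file layout, and the letter-matching scorer here are all illustrative assumptions, not the authors' procedure.

```python
# Sketch: batch-query a GPT-3.5 model with multiple-choice exam questions
# and tally percent correct. File name, record layout, and scoring rule
# are hypothetical; only the OpenAI SDK calls are real API.
import json

from openai import OpenAI  # official OpenAI Python SDK (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_model(question: str, options: str) -> str:
    """Send one exam question and return the model's raw answer text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # closest API analogue of ChatGPT-3.5
        messages=[
            {"role": "system",
             "content": "Answer with the letter of the single best option."},
            {"role": "user", "content": f"{question}\n{options}"},
        ],
    )
    return response.choices[0].message.content.strip()


def main() -> None:
    # Hypothetical file: a list of {"question", "options", "answer"} records.
    with open("residency_final_exam.json", encoding="utf-8") as f:
        items = json.load(f)

    correct = 0
    for item in items:
        reply = ask_model(item["question"], item["options"])
        # Naive scoring: check whether the keyed letter appears in the reply;
        # a real study would score answers more carefully.
        if item["answer"].upper() in reply.upper():
            correct += 1

    print(f"Accuracy: {correct / len(items):.1%} of {len(items)} questions")


if __name__ == "__main__":
    main()
```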
Results: ChatGPT-3.5 correctly answered 45.1% of the examination questions. For the virtual patient cases, clinical instructors gave ChatGPT mean ratings of 4.57 ± 0.50, 4.68 ± 0.47, 4.77 ± 0.46, 4.60 ± 0.53, and 3.95 ± 0.59 points for clarity, relevance, logicality, credibility, and comprehensiveness, respectively. Among the residents, ChatGPT scored 4.48 ± 0.70, 4.00 ± 0.82, and 4.61 ± 0.50 points for ease of use, accuracy and completeness, and usefulness, respectively.
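The "mean ± SD" figures above are simple descriptive statistics over the individual 5-point ratings. As a minimal sketch, the snippet below shows how such a figure is computed; the rating values are invented placeholders for illustration, not the study's data.

```python
# Compute a "mean ± SD" summary from raw 5-point Likert ratings.
from statistics import mean, stdev

# Hypothetical: one instructor rating (1-5) per case for a single criterion.
clarity_ratings = [5, 4, 5, 5, 4, 4, 5, 5, 4, 5]

# stdev() is the sample standard deviation, the usual choice for such reports.
print(f"clarity: {mean(clarity_ratings):.2f} ± {stdev(clarity_ratings):.2f}")
```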
Conclusions: Our findings demonstrate ChatGPT's immense potential for personalized Chinese medical education.