Keywords: AI; ChatGPT-4; Japan; Japanese; artificial intelligence; clinical vignettes; generative AI; medical case generation; medical education

MeSH: Humans; Japan; Male; Female; Surveys and Questionnaires; Artificial Intelligence; Adult; Education, Medical / methods; Clinical Competence / standards; East Asian People

Source:   DOI: 10.2196/59133   PDF (PubMed)

Abstract:
BACKGROUND: Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.
OBJECTIVE: This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings.
METHODS: Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility: information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.
RESULTS: Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.
CONCLUSIONS: ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
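Note on the statistical analyses: the METHODS and RESULTS above name chi-square tests, Mann-Whitney U tests, linear regression, Bonferroni correction, and 95% CIs but do not include analysis code. The following is a minimal illustrative sketch, not the authors' analysis: it applies the same kinds of tests to a synthetic ratings table in Python with pandas and SciPy. The table layout, variable names (case_id, info_quality, edu_usefulness, start_year), and the pairwise comparison scheme are assumptions made for illustration only.

# Illustrative sketch only; synthetic data stands in for the actual survey responses.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n_cases, n_raters = 18, 71  # 18 generated cases, 71 respondents (as in the abstract)

# Hypothetical long-format table: one row per (respondent, case) rating.
df = pd.DataFrame({
    "case_id": np.repeat(np.arange(n_cases), n_raters),
    "rater_id": np.tile(np.arange(n_raters), n_cases),
    "info_quality": rng.integers(0, 2, n_cases * n_raters),    # binary item (0/1)
    "edu_usefulness": rng.integers(1, 6, n_cases * n_raters),  # 5-point Likert item
    # hypothetical first year of practice per respondent (1976-2017, as in the abstract)
    "start_year": np.tile(rng.integers(1976, 2018, n_raters), n_cases),
})

# Proportion rated satisfactory on a binary item, with a normal-approximation 95% CI.
p = df["info_quality"].mean()
se = np.sqrt(p * (1 - p) / len(df))
print(f"information quality: {p:.2f} (95% CI {p - 1.96 * se:.2f}-{p + 1.96 * se:.2f})")

# Mean and 95% CI for a 5-point Likert item.
m, sem = df["edu_usefulness"].mean(), stats.sem(df["edu_usefulness"])
print(f"educational usefulness: {m:.2f} (95% CI {m - 1.96 * sem:.2f}-{m + 1.96 * sem:.2f})")

# Chi-square test: does the share of satisfactory ratings differ across the 18 cases?
chi2, p_chi, dof, _ = stats.chi2_contingency(pd.crosstab(df["case_id"], df["info_quality"]))
print(f"chi-square across cases: chi2={chi2:.1f}, df={dof}, p={p_chi:.3g}")

# Mann-Whitney U test comparing Likert ratings between two example cases, with an
# illustrative Bonferroni-adjusted threshold for all pairwise case comparisons.
a = df.loc[df["case_id"] == 0, "edu_usefulness"]
b = df.loc[df["case_id"] == 1, "edu_usefulness"]
u, p_mw = stats.mannwhitneyu(a, b, alternative="two-sided")
n_comparisons = n_cases * (n_cases - 1) // 2
print(f"Mann-Whitney U (case 0 vs 1): U={u:.0f}, p={p_mw:.3f}, "
      f"Bonferroni threshold={0.05 / n_comparisons:.2e}")

# Linear regression: trend of ratings against first year of practice (experience proxy).
slope, intercept, r, p_lr, stderr = stats.linregress(df["start_year"], df["edu_usefulness"])
print(f"trend vs first year of practice: slope={slope:.4f}, p={p_lr:.3f}")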