Keywords: Computerized adaptive testing; Medical student; Psychometrics; Republic of Korea; Statistical model

MeSH: Humans; Educational Measurement / methods, standards; Republic of Korea; Psychometrics / methods; Students, Medical; Computer Simulation; Data Analysis; Education, Medical, Undergraduate / methods; Male; Female

Source: DOI:10.3352/jeehp.2024.21.18

Abstract:
OBJECTIVE: This study aimed to compare the efficiency and accuracy of computerized adaptive testing (CAT) under two stopping rules (standard error of measurement [SEM] of 0.30 and 0.25) using both real and simulated data from medical examinations in Korea.
METHODS: This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by comparing pass/fail decisions based on a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under both stopping rules.
RESULTS: Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r = 0.99) between ability estimates. The simulation results confirmed these findings, with similar average numbers of items administered in the real and simulated data.
CONCLUSIONS: The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
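
ILLUSTRATION: The SEM stopping rule described above can be sketched in code. The following is a minimal, hypothetical simulation in Python (not the authors' R code): it assumes maximum-information item selection, Newton-Raphson maximum-likelihood scoring, and an invented 300-item bank with evenly spaced difficulties; the names `run_cat`, `compare_rules`, and `BANK` are illustrative only. Under the Rasch model each item contributes at most 0.25 units of information, so SEM = 1/sqrt(information) cannot fall to 0.30 before 45 items or to 0.25 before 64 items, which is the efficiency cost the two rules trade against precision.

```python
import math
import random


def rasch_prob(theta, b):
    """Rasch model: probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, theta=0.0, iters=30):
    """Maximum-likelihood ability estimate (Newton-Raphson) and its SEM.

    Steps are capped and theta is clamped to [-4, 4] so that all-correct or
    all-incorrect response patterns do not diverge (an assumption of this sketch).
    """
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        score = sum(u - p for u, p in zip(responses, probs))
        step = max(-1.0, min(1.0, score / info))
        theta = max(-4.0, min(4.0, theta + step))
    probs = [rasch_prob(theta, b) for b in difficulties]
    info = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(info)


def run_cat(true_theta, bank, sem_stop, rng):
    """Administer items until SEM <= sem_stop or the bank is exhausted."""
    remaining = list(bank)
    used, responses = [], []
    theta, sem = 0.0, float("inf")
    while remaining and sem > sem_stop:
        # Under the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        u = 1 if rng.random() < rasch_prob(true_theta, b) else 0
        used.append(b)
        responses.append(u)
        theta, sem = estimate_theta(responses, used, theta)
    return theta, sem, len(used)


def compare_rules(true_theta, bank, seed, rules=(0.30, 0.25), cut=0.0):
    """Run the same simulated examinee under each stopping rule."""
    results = {}
    for sem_stop in rules:
        rng = random.Random(seed)  # identical response stream for every rule
        theta, sem, n_items = run_cat(true_theta, bank, sem_stop, rng)
        results[sem_stop] = {
            "theta": theta,
            "sem": sem,
            "items": n_items,
            "pass": theta >= cut,  # pass/fail against the cut score
        }
    return results


# Hypothetical item bank: 300 difficulties evenly spaced on [-3, 3].
BANK = [-3.0 + 6.0 * i / 299 for i in range(300)]

if __name__ == "__main__":
    for rule, outcome in compare_rules(0.5, BANK, seed=1).items():
        print(rule, outcome)
```

Because both rules see the same random response stream, the SEM 0.25 run retraces the SEM 0.30 run and then continues, so any difference in the pass/fail decision comes only from the additional items.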