Keywords: ChatGPT; KIDMAP; RaschOnline; Wright map; application; artificial intelligence; college; differential item functioning; evaluation tool; multiple choice questions; scoring; students; testing tool; website tool

Source: DOI:10.2196/46800 PDF (PubMed)

Abstract:
BACKGROUND: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT's competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a website-based tool for evaluating performance on MCQs.
OBJECTIVE: This study aims to (1) demonstrate the utility of the website-based tool (Rasch analysis, specifically RaschOnline) and (2) determine the grade achieved by ChatGPT relative to a normal sample.
METHODS: ChatGPT's capability was evaluated on 10 items from the English test of the 2023 Taiwan college entrance examinations. Under a Rasch model, 300 students with normally distributed abilities were simulated to compete with ChatGPT's responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, the item characteristic curve, the Wright map, and the KIDMAP, to address the research objectives.
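The simulation described above can be sketched in a few lines. The item difficulties are the 10 logit values reported in the Results; the examinee abilities, the Newton-Raphson estimator, and all function names are illustrative assumptions, not the authors' code (the study used RaschOnline):

```python
import numpy as np

rng = np.random.default_rng(0)

# The 10 item difficulties (in logits) reported in the Results.
difficulties = np.array([-2.43, -1.78, -1.48, -0.64, -0.1,
                          0.33,  0.59,  1.34,  1.7,  2.47])

# 300 examinees with abilities drawn from a standard normal distribution,
# mirroring the study's simulated comparison sample (assumed mean 0, SD 1).
abilities = rng.normal(loc=0.0, scale=1.0, size=300)

def p_correct(theta, b):
    """Rasch model: P(X = 1) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# 300 x 10 matrix of dichotomous (0/1) responses.
probs = p_correct(abilities[:, None], difficulties[None, :])
responses = (rng.random(probs.shape) < probs).astype(int)

def estimate_ability(resp, b, iters=25):
    """Maximum-likelihood ability estimate for one response vector via
    Newton-Raphson (note: diverges for all-0 or all-1 score patterns)."""
    theta = 0.0
    for _ in range(iters):
        p = p_correct(theta, b)
        score = np.sum(resp - p)        # gradient of the log-likelihood
        info = np.sum(p * (1.0 - p))    # Fisher information
        theta += score / info
    return theta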
RESULTS: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easier to harder, represented in logits (-2.43, -1.78, -1.48, -0.64, -0.1, 0.33, 0.59, 1.34, 1.7, and 2.47); (2) evidence of differential item functioning between gender groups was observed for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square errors below the threshold of 1.5; (5) no significant difference was found in the measures obtained between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT's capability was graded A, surpassing grades B to E.
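The abstract does not state the cutoffs used to map a Rasch ability measure onto grades A through E. One common convention, assumed here purely for illustration, is to grade by percentile rank within the reference sample using quintile cutoffs; a minimal sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
# Ability measures (logits) of the simulated normal comparison sample.
sample_measures = rng.normal(loc=0.0, scale=1.0, size=300)

def grade(measure, sample):
    """Map a measure to A-E by its percentile rank in the reference
    sample, using hypothetical quintile cutoffs (80/60/40/20)."""
    pct = 100.0 * np.mean(sample < measure)
    for cutoff, letter in [(80, "A"), (60, "B"), (40, "C"), (20, "D")]:
        if pct >= cutoff:
            return letter
    return "E"
```

Under this convention, a measure above the sample's 80th percentile earns an A, which is consistent with the grade reported for ChatGPT.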
CONCLUSIONS: Using RaschOnline, this study provides evidence that ChatGPT achieves a grade of A when compared to a normal sample, exhibiting excellent proficiency in answering MCQs from the English test of the 2023 Taiwan college entrance examinations.