Keywords: ChatGPT; KIDMAP; RaschOnline; Wright map; application; artificial intelligence; college; differential item functioning; evaluation tool; multiple choice questions; scoring; students; testing tool; website tool

Source: DOI:10.2196/46800 PDF (PubMed)

Abstract:
BACKGROUND: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT's competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a website-based tool for evaluating performance on MCQs.
OBJECTIVE: This study aims to (1) demonstrate the utility of the website-based tool (Rasch analysis, specifically RaschOnline) and (2) determine the grade achieved by ChatGPT relative to a normal sample.
METHODS: ChatGPT's capability was evaluated on 10 items from the English test of the 2023 Taiwan college entrance examinations. Under a Rasch model, 300 students with normally distributed abilities were simulated to compete with ChatGPT's responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, the item characteristic curve, the Wright map, and the KIDMAP, to address the research objectives.
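The simulation described above can be sketched in a few lines. The item difficulties are the 10 logit values reported in the Results; the examinee abilities, the Newton-Raphson estimator, and all function names are illustrative assumptions, not the authors' code (the study used RaschOnline):

```python
import numpy as np

rng = np.random.default_rng(0)

# The 10 item difficulties (in logits) reported in the Results.
difficulties = np.array([-2.43, -1.78, -1.48, -0.64, -0.1,
                          0.33,  0.59,  1.34,  1.7,  2.47])

# 300 examinees with abilities drawn from a standard normal distribution,
# mirroring the study's simulated comparison sample (assumed mean 0, SD 1).
abilities = rng.normal(loc=0.0, scale=1.0, size=300)

def p_correct(theta, b):
    """Rasch model: P(X = 1) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# 300 x 10 matrix of dichotomous (0/1) responses.
probs = p_correct(abilities[:, None], difficulties[None, :])
responses = (rng.random(probs.shape) < probs).astype(int)

def estimate_ability(resp, b, iters=25):
    """Maximum-likelihood ability estimate for one response vector via
    Newton-Raphson (note: diverges for all-0 or all-1 score patterns)."""
    theta = 0.0
    for _ in range(iters):
        p = p_correct(theta, b)
        score = np.sum(resp - p)        # gradient of the log-likelihood
        info = np.sum(p * (1.0 - p))    # Fisher information
        theta += score / info
    return theta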
RESULTS: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easier to harder, represented in logits (-2.43, -1.78, -1.48, -0.64, -0.1, 0.33, 0.59, 1.34, 1.7, and 2.47); (2) evidence of differential item functioning between gender groups was observed for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square errors below the threshold of 1.5; (5) no significant difference was found in the measures obtained between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT's capability was graded A, surpassing grades B to E.
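The abstract does not state the cutoffs used to map a Rasch ability measure onto grades A through E. One common convention, assumed here purely for illustration, is to grade by percentile rank within the reference sample using quintile cutoffs; a minimal sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
# Ability measures (logits) of the simulated normal comparison sample.
sample_measures = rng.normal(loc=0.0, scale=1.0, size=300)

def grade(measure, sample):
    """Map a measure to A-E by its percentile rank in the reference
    sample, using hypothetical quintile cutoffs (80/60/40/20)."""
    pct = 100.0 * np.mean(sample < measure)
    for cutoff, letter in [(80, "A"), (60, "B"), (40, "C"), (20, "D")]:
        if pct >= cutoff:
            return letter
    return "E"
```

Under this convention, a measure above the sample's 80th percentile earns an A, which is consistent with the grade reported for ChatGPT.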
CONCLUSIONS: Using RaschOnline, this study provides evidence that ChatGPT achieves a grade of A when compared to a normal sample, exhibiting excellent proficiency in answering MCQs from the English test of the 2023 Taiwan college entrance examinations.