BACKGROUND: The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single-best-answer items and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short answer (SA) questions. Continued validation of the L3 examination, especially under the new design, is essential for the appropriate interpretation and use of the test scores.
OBJECTIVE: The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design utilizing sources of evidence based on Kane's validity framework.
METHODS: Kane's validity framework contains four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination is validly measuring what it is supposed to measure. These analyses include reviewing content coverage of the L3 examination, documenting scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard setting process. The analysis data set contains first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment utilized as a criterion measure in this study is the COMLEX-USA Level 2 Performance Evaluation (L2-PE).
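As a rough illustration of two of the analyses listed above, the following minimal sketch computes an association between two component scores and a chance-corrected agreement between two raters. The abstract does not name the statistics that were used, so Pearson's r and Cohen's kappa are assumptions chosen here as common options; the function names and all data values are hypothetical and are not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists (assumed statistic)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same responses (assumed statistic)."""
    n = len(rater1)
    if n != len(rater2) or n == 0:
        raise ValueError("need two equal-length, non-empty rating lists")
    categories = set(rater1) | set(rater2)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    p_expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    if p_expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical MCQ and CDM component scores for six candidates.
    mcq = [520, 610, 480, 700, 555, 630]
    cdm = [500, 640, 470, 690, 580, 600]
    print("MCQ-CDM correlation:", round(pearson_r(mcq, cdm), 3))

    # Hypothetical 0/1 ratings of eight short-answer responses by two raters.
    rater_a = [1, 1, 0, 1, 0, 0, 1, 1]
    rater_b = [1, 1, 0, 1, 0, 1, 1, 1]
    print("SA inter-rater kappa:", round(cohen_kappa(rater_a, rater_b), 3))
```

With operational data, the same calculations would typically be run per test form and reported alongside confidence intervals; this sketch omits both for brevity.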
RESULTS: All assessment forms were built through the automated test assembly (ATA) procedure to maximize parallelism in terms of content coverage and statistical properties across the forms. Scoring and reporting follow industry-standard quality-control procedures. The inter-rater reliability of SA ratings, decision accuracy, and decision consistency for pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and the CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with L2-PE domain scores, fit with what is being measured. The subgroup comparisons by gender, race, and first language showed expected small differences in mean scores between the subgroups within each category and yielded findings that are consistent with those described in the literature. The L3 pass/fail standard was established through implementation of a defensible criterion-referenced procedure.
CONCLUSIONS: This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.