Standard Setting

  • Article Type: Journal Article
    OBJECTIVE: In transitioning to competency-based surgical training, the need to clearly define competency is paramount. The purpose of this study is to define the well-prepared foundational resident using the ACGME General Surgery Milestones as our conceptual framework.
    METHODS: Participants reflected on their expectations of a well-prepared resident at the end of PGY1, then assigned milestone levels reflecting this level of competence for General Surgery Milestones 1.0 and 2.0. Subcompetency scores were averaged among residents and faculty. The level of the well-prepared foundational resident was determined based on the highest level within one standard deviation of faculty, resident, and total group averages.
    SETTING: This took place during a dedicated education retreat at a single, large academic general surgery residency program.
    PARTICIPANTS: Key faculty stakeholders and a representative sample of residents (PGY 1-5) within our institution participated.
    RESULTS: Eight faculty and five residents completed Milestones 1.0 and 2.0 scoring. Mean scores between faculty and residents were compared. For Milestones 1.0, mean scores for Practice-Based Learning and Improvement 3 (PBLI 3) and Interpersonal and Communication Skills 3 (ICS 3) were discernibly lower for residents than for faculty (PBLI 3: 1.3 (0.3) vs. 0.9 (0.2), p = 0.01; ICS 3: 1.6 (0.6) vs. 1.1 (1), p = 0.01). Scores for Milestones 2.0 were comparable across all subcompetency domains. With this broad agreement, Milestone-based competency standards were determined. Descriptive narratives of the knowledge, skills, and attitudes (KSAs) were created for each subcompetency, combining the determined Milestones 1.0 and 2.0 levels.
    CONCLUSIONS: We were able to clearly define the competent foundational resident using the ACGME Milestones as a conceptual framework. These Milestone levels reflect the culture and expectations in our department, providing a foundation upon which to build a program of assessment. This methodology can be readily replicated in other programs to reflect specific expectations of the program within the larger ACGME frameworks of competency.
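    The consensus rule in the methods above reduces to simple arithmetic over the two groups' ratings. A minimal sketch in Python, under one reading of "highest level within one standard deviation of faculty, resident, and total group averages"; the ratings below are hypothetical, not the study's data:

```python
import statistics

def consensus_level(faculty, residents, levels=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Highest candidate milestone level lying within one standard
    deviation of the faculty, resident, and total-group averages
    (one reading of the rule described in the abstract)."""
    groups = [faculty, residents, faculty + residents]
    summaries = [(statistics.mean(g), statistics.stdev(g)) for g in groups]
    eligible = [lv for lv in levels
                if all(abs(lv - m) <= s for m, s in summaries)]
    return max(eligible) if eligible else None

# Hypothetical ratings for one subcompetency (8 faculty, 5 residents)
faculty_scores = [1.5, 2.0, 1.5, 2.0, 1.5, 2.0, 1.5, 2.0]
resident_scores = [1.5, 1.5, 2.0, 1.5, 1.5]
print(consensus_level(faculty_scores, resident_scores))  # -> 1.5
```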

  • Article Type: Journal Article
    OBJECTIVE: To describe one institution's approach to transforming high-stakes objective structured clinical examinations (OSCEs) from norm-referenced to criterion-referenced standard setting and to evaluate the impact of these changes on OSCE performance and pass rates.
    METHODS: The OSCE writing team at the college selected a modified Angoff method appropriate for high-stakes assessments to replace the two-standard-deviation method previously used. Each member of the OSCE writing team independently reviewed the analytical checklist and calculated a passing score for active stations on OSCEs. The group then met to determine a final pass score for each station. The team also determined critical cut points for each station, when indicated. After administration of the OSCEs, scores, pass rates, and need for remediation were compared with the previous norm-referenced method. Descriptive statistics were used to summarize the data.
    RESULTS: OSCE scores remained relatively unchanged after the switch to a criterion-referenced method, but the number of remediators increased up to 2.6-fold. In the first year, the average score increased from 86.8% to 91.7% while the remediation rate increased from 2.8% to 7.4%. In the third year, the average increased from 90.9% to 92% while the remediation rate increased from 6% to 15.6%. Likewise, the fourth-year average increased from 84.9% to 87.5% while the remediation rate increased from 4.4% to 9%.
    CONCLUSIONS: Transition to a modified Angoff method did not impact the average OSCE score but did increase the number of remediations.
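    The methodological shift described above amounts to swapping one cut-score formula for another: a norm-referenced cut moves with each cohort's score distribution, while a criterion-referenced (Angoff-style) cut is fixed by judge consensus. A minimal sketch with hypothetical numbers, not the college's data:

```python
import statistics

def two_sd_cut(cohort_scores):
    """Norm-referenced cut: two standard deviations below the cohort mean."""
    return statistics.mean(cohort_scores) - 2 * statistics.stdev(cohort_scores)

def angoff_cut(judge_passing_scores):
    """Criterion-referenced cut: consensus (here, the mean) of the passing
    scores the judges derived independently from the station checklist."""
    return statistics.mean(judge_passing_scores)

station_scores = [91.7, 88.2, 95.0, 76.3, 84.1, 90.5, 93.8, 81.0]  # hypothetical
judge_estimates = [82.0, 85.0, 80.0, 84.0, 83.0]                   # hypothetical

for label, cut in [("norm-referenced", two_sd_cut(station_scores)),
                   ("criterion-referenced", angoff_cut(judge_estimates))]:
    remediators = sum(score < cut for score in station_scores)
    print(f"{label}: cut={cut:.1f}, remediators={remediators}")
```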

  • Article Type: Journal Article
    Context: We challenge the philosophical acceptability of the Angoff method and propose an alternative method of standard setting based on how important it is for candidates to know the material each test item assesses, not how difficult it is for a subgroup of candidates to answer each item. Methods: The practicalities of an alternative method of standard setting are evaluated here, for the first time, with direct comparison to an Angoff method. To negate bias due to any leading effects, a prospective cross-over design was adopted involving two groups of judges (n=7 and n=8), both of which set the standards for the same two 100-item multiple-choice question tests by the two different methods. Results: Overall, we found that the two methods took a similar amount of time to complete. The alternative method produced a higher cut-score (by 12-14%) and a higher degree of variability between judges' cut-scores (by 5%). When using the alternative method, judges reported a small, but statistically significant, increase in their confidence to decide the standard accurately (by 3%). Conclusion: This is a new approach to standard setting where the quantitative differences are slight, but there are clear qualitative advantages associated with use of the alternative method.
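    The quantitative comparisons reported above (mean cut-score and between-judge variability for each method) are plain summary statistics over per-judge cut-scores. A sketch with hypothetical judge data:

```python
import statistics

# Hypothetical per-judge cut scores (%) for one 100-item test, not study data
cut_scores = {
    "Angoff":      [58, 61, 57, 60, 59, 62, 58],
    "alternative": [71, 74, 69, 75, 72, 70, 73],
}
for method, judge_cuts in cut_scores.items():
    print(f"{method}: mean cut = {statistics.mean(judge_cuts):.1f}%, "
          f"between-judge SD = {statistics.stdev(judge_cuts):.1f}")
```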

  • Article Type: Journal Article
    BACKGROUND: The Upper Digestive Disease (UDD) Tool™ is used to monitor symptom frequency, intensity, and interference across nine symptom domains and includes two Patient-Reported Outcome Measurement Information System (PROMIS) domains assessing physical and mental health. This study aimed to establish cut scores for updated symptom domains through standard setting exercises and evaluate the effectiveness and acceptability of virtual standard setting.
    METHODS: The extended Angoff method was employed to determine cut scores. Subject matter experts refined performance descriptions for symptom control categories and achieved consensus. Domains were categorized into good, moderate, and poor symptom control. Two cut scores were established, differentiating good vs. moderate and moderate vs. poor. Panelists estimated average scores for 100 borderline patients per item. Cut scores were computed as the sum of the average ratings for individual questions, converted to a 0-100 scale.
    RESULTS: Performance descriptions were refined. Panelists discussed how interpretation of the scores should take into account the timing of symptoms after surgery and the patient population, as well as the importance of items asking about symptom frequency, severity, and interference with daily life. The good/moderate cut scores ranged from 21.3 to 35.0 (mean 28.6, SD 3.6) across domains, and moderate/poor cut scores ranged from 47.5 to 71.3 (mean 54.5, SD 7.0).
    CONCLUSIONS: Panelists were confident in the virtual standard setting process, expecting valid cut scores. Future studies can further validate the cut scores using patient perspectives and collect patient and physician preferences for displaying contextual items on patient- and physician-facing dashboards.
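    A sketch of the extended-Angoff arithmetic described in the methods above: panelists' per-item estimates for borderline patients are averaged across the panel, summed over items, and rescaled to 0-100. The panel size, item count, and response scales below are hypothetical:

```python
def extended_angoff_cut(panel_ratings, item_max_scores):
    """Cut score: average each item's estimated borderline-patient rating
    across panelists, sum over items, rescale to a 0-100 scale."""
    item_means = [sum(r) / len(r) for r in zip(*panel_ratings)]
    return 100 * sum(item_means) / sum(item_max_scores)

# Three hypothetical panelists rating four items scored 0-4
ratings = [
    [1.2, 0.8, 1.5, 1.0],
    [1.0, 1.1, 1.4, 0.9],
    [1.3, 0.9, 1.6, 1.2],
]
print(f"{extended_angoff_cut(ratings, [4, 4, 4, 4]):.1f}")  # -> 29.0
```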

  • Article Type: Journal Article
    The Angoff standard setting method depends fundamentally on the conceptualisation of an anchor statement. The precise wording and consequent interpretation of anchor statements varies in practice. Emphasis is often placed on standard setting judges' perceptions of difficulty for a candidate subgroup. The current review focusses on the meaning of anchor statements and argues that when determining the required standard of performance it is more appropriate to consider: (1) what it is important to achieve, and not how difficult it is to achieve it; (2) what all candidates should achieve, and not what a subgroup of candidates would achieve. In summary, current practice should be refined by using an anchor statement which refers to estimating the 'minimum acceptable performance by every candidate' for each item being tested, and then requiring each judge to score the relevant aspects of importance, which could then be combined to derive a cut-score.

  • Article Type: Systematic Review
    McMaster University introduced Problem-Based Learning (PBL) into medical education over half a century ago. Since then, hundreds of reviews and study reports have identified many critical issues affecting the success of the PBL unit. Nonetheless, we are still debating the efficacy and success of the PBL program. Over half of all medical schools globally have introduced various versions of PBL pedagogy into their medical education programs, achieving assorted modifications of outcomes. In this paper, I have reviewed the publications of many scholars and identified eight important Fundamental Roots of a successful PBL program. The success of any PBL program must be evaluated as a whole, from the perspective of Standardization, Reliability and Validity working synchronously across these fundamental roots, rather than at the level of the individual PBL unit. Standardization must take into account all the critical issues identified in many reviews and study reports. These issues are among the many factors that, when incorporated into the fundamental framework of a PBL program, will regulate all PBL units toward a unified outcome. The educational objective principles will guide the reliability of a PBL program by meeting the institutional mission and students' career success goals. "Assessment as Learning" should incorporate a "Holistic and Divergent Approach" and longitudinal "Progress Testing". These are the principal methods of evaluation for achieving a reliable and valid outcome assessment in a successful PBL program.

  • Article Type: Journal Article
    Introduction: Medical educators need to demonstrate that their trainees meet expected competency levels as they progress through medical education. This study aimed to develop competency-based pass/fail cut-scores for a graduation-required Objective Structured Clinical Examination (OSCE) and to examine validity evidence for the new standards. Methods: Six clinicians used the modified Angoff method to determine the cut-scores for an 8-station OSCE. The clinicians estimated the percentage of minimally competent students who would answer each checklist item correctly. Inter-rater reliability, differences in other academic achievements between pass/fail groups, educational impact, and response process were examined. Results: One hundred seventy-four rising 4th-year medical students participated in the OSCE. The cut-scores determined for the OSCE resulted in a substantially lower failure rate (5% vs. 29% the previous year). The inter-rater reliability across domains and cases was .98 (95% CI = .97-.99). The pass/fail groups significantly differed on six of the eight measures of academic achievement included in the study. Discussion: The impact of the standard setting was substantial, as it significantly reduced the failure rate and the remediation burden for both students and faculty. The very high inter-rater reliability indicates that the modified Angoff method produced reliable cut-scores. The significant differences between the pass/fail groups on other measures support the external validity of the standards and help ensure no false passes. The study also supports response-process validity through discussion among judges, checks of previous student performance, and the recruitment and training of multiple clinician educators experienced in medical student teaching. Conclusion: Findings of the study provide strong evidence supporting the validity of the new cut-scores across a wide spectrum of validity metrics, including response process, internal structure, relations to other variables, and consequences. The study also adds to the literature on the value of the modified Angoff method in determining competency-based standards for OSCEs.
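    A minimal sketch of the modified Angoff computation this abstract describes, assuming each judge's cut is the sum of their per-item probability estimates and the station cut is the mean across judges; the checklist and numbers are illustrative, not the study's data:

```python
def station_cut(judge_item_probs):
    """Modified Angoff station cut: each judge estimates, per checklist
    item, the proportion of minimally competent students answering
    correctly; sum per judge, then average across judges."""
    judge_cuts = [sum(probs) for probs in judge_item_probs]
    return sum(judge_cuts) / len(judge_cuts)

# Six hypothetical judges rating a 5-item station checklist
judges = [
    [0.9, 0.7, 0.8, 0.6, 0.9],
    [0.8, 0.6, 0.9, 0.7, 0.8],
    [0.9, 0.8, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.9, 0.6, 0.8],
    [0.9, 0.6, 0.8, 0.7, 0.9],
    [0.8, 0.7, 0.7, 0.6, 0.8],
]
print(f"cut = {station_cut(judges):.2f} of 5 checklist items")  # -> 3.80
```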

  • Article Type: Journal Article
    Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability/stringency/difficulty in the modelling. In addition, examiners are also allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), but also in how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2%, respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false-positive (candidates passing in error) and false-negative (failing in error) rates in stations of around 5% each, but at the exam level this reduces to 0.4% and 3.3%, respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and the combined effect of scoring and grading on candidate outcomes.
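    One way to picture the two examiner effects reported above (an additive stringency shift plus examiner-specific discrimination across grades) is a small simulation. The parameterisation below is an assumption for illustration, not the authors' fitted model:

```python
import random

random.seed(0)
GRADE_POINTS = {"fail": 0, "borderline": 1, "pass": 2, "good": 3}

# Hypothetical examiners: (additive stringency, grade-discrimination slope)
EXAMINERS = {"A": (-2.0, 1.4), "B": (1.5, 0.8), "C": (0.0, 1.0)}

def observed_score(examiner, grade, base=10.0, step=3.0, noise_sd=1.5):
    """Checklist score an examiner awards a candidate at a given global
    grade: a harsh examiner shifts all scores down (stringency), while a
    flat slope compresses the spread between grades (discrimination)."""
    stringency, discrimination = EXAMINERS[examiner]
    latent = base + step * discrimination * GRADE_POINTS[grade]
    return latent + stringency + random.gauss(0, noise_sd)

for examiner in EXAMINERS:
    scores = [round(observed_score(examiner, g), 1) for g in GRADE_POINTS]
    print(examiner, scores)
```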

  • Article Type: Journal Article
    BACKGROUND: Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. There is currently a lack of data on whether item variants of a single question result in differences in student performance or human-derived standard setting. The purpose of this study was to use 50 multiple-choice questions (MCQs) as models to create four distinct tests, which would be standard set and given to final-year UK medical students, and then to compare the performance and standard-setting data for each.
    METHODS: Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item-writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. Isomorphic questions use the same question template with minor alterations to test the same learning outcome. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final-year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted for item variants with high and low levels of variance in facility (for student performance) and average scores (for standard setting).
    RESULTS: Two thousand two hundred eighteen students from 12 UK medical schools participated, with each school using one of the four papers. The average facility of the four papers ranged from 0.55 to 0.61, and the cut score ranged from 0.58 to 0.61. Twenty item models had a facility difference > 0.15, and 10 item models had a difference in standard setting of > 0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility.
    CONCLUSIONS: Item facility varied to a greater extent than the standard set. This difference may relate to variants causing greater disruption of clinical reasoning strategies in novice learners compared with experts, but it is confounded by the possibility that the performance differences may be explained at the school level, and therefore warrants further study.
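    Item facility, the quantity compared above, is simply the proportion of students answering an item correctly, and the study flags item models whose variants differ in facility by more than 0.15. A sketch with hypothetical response vectors:

```python
def facility(responses):
    """Item facility: proportion of students answering the item correctly."""
    return sum(responses) / len(responses)

# Hypothetical 0/1 responses to two variants of one item model
variant_a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
variant_b = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

fa, fb = facility(variant_a), facility(variant_b)
print(f"facility A={fa:.2f}, B={fb:.2f}, flagged={abs(fa - fb) > 0.15}")
```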

  • Article Type: Journal Article
    OBJECTIVE: To determine the utilization, frequency, characteristics, and standard-setting methods of progression assessments in pharmacy education.
    METHODS: A survey was sent to 139 United States schools/colleges of pharmacy having an identifiable assessment lead and students enrolled in the doctor of pharmacy program. The survey examined programs' use, frequency, and characteristics of progression assessments within their curriculum. Respondents also reported any changes made due to the COVID-19 pandemic and which, if any, would be maintained in future years. Analysis consisted of descriptive statistics and thematic coding. This research was deemed exempt by the university's institutional review board.
    RESULTS: Seventy-eight programs responded to the survey (response rate = 56%). Sixty-seven percent of programs administered at least one progression assessment in 2019-2020. There was some variability in assessment practice, including professional year(s) administered, course(s) involved, and content. Approximately 75% of programs used assessments to ensure student competency in the programs' learning outcomes and to identify individual student learning deficiencies. Diversity was seen in validity and reliability practices, and most programs used pre-determined cut scores without formal standard setting. Because of the pandemic, 75% of programs changed the assessment delivery mode, and 20 programs planned to maintain at least one pandemic-related change in future iterations.
    CONCLUSIONS: Most pharmacy programs utilize some type of progression assessment within their curriculum. While many schools administer progression assessments, there is little agreement on their purpose, development, and use. The pandemic changed the mode of delivery, a change that numerous programs will continue in the future.