MeSH: Analysis of Variance; Checklist / methods; Clinical Competence / statistics & numerical data; Concept Formation; Databases, Factual; Education, Medical / statistics & numerical data; Educational Measurement / methods; Humans; Models, Statistical; Practice Guidelines as Topic / standards; Reproducibility of Results

Source: DOI:10.1097/ACM.0000000000004150

Abstract:
Competency-based education relies on the validity and reliability of assessment scores. Generalizability (G) theory is well suited to explore the reliability of assessment tools in medical education but has only been applied to a limited extent. This study aimed to systematically review the literature using G-theory to explore the reliability of structured assessment of medical and surgical technical skills and to assess the relative contributions of different factors to variance.
In June 2020, 11 databases, including PubMed, were searched from inception through May 31, 2020. Eligible studies used G-theory to explore reliability in the context of assessing medical and surgical technical skills. Descriptive information on each study, its assessment context and protocol, the participants being assessed, and the G-analyses was extracted. These data were used to map the use of G-theory and to explore the variance components analyses. A meta-analysis was conducted to synthesize the extracted data on the sources of variance and on reliability.
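As background for the variance components analyses, a minimal sketch in standard G-theory notation (following Brennan's conventions, not extracted from any particular included study): for a fully crossed person × task × rater (p × t × r) design, each observed score decomposes into additive effects, and total score variance partitions into the corresponding variance components:

$$
X_{ptr} = \mu + \nu_p + \nu_t + \nu_r + \nu_{pt} + \nu_{pr} + \nu_{tr} + \nu_{ptr,e},
\qquad
\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr} + \sigma^2_{ptr,e}.
$$

The "person variance" reported below is $\sigma^2_p$ as a percentage of this total; the larger it is, the more the scores reflect true differences between the people being assessed rather than task, rater, or residual effects.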
Forty-four studies were included; of these, 39 had sufficient data for meta-analysis. The total pool comprised 35,284 unique assessments of 31,496 unique performances by 4,154 participants. Person variance had a pooled effect of 44.2% (95% confidence interval [CI], 36.8%-51.5%). Only assessment tool type (Objective Structured Assessment of Technical Skills-type vs task-based checklist-type) had a significant effect on person variance. The pooled reliability (G-coefficient) was 0.65 (95% CI, 0.59-0.70). Most studies included decision studies (39, 88.6%), which generally favored higher ratios of performances to assessors to achieve sufficiently reliable assessment.
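To illustrate why decision (D-) studies tend to favor more performances per assessor, the sketch below projects the relative G-coefficient $E\rho^2 = \sigma^2_p / (\sigma^2_p + \sigma^2_{pt}/n_t + \sigma^2_{pr}/n_r + \sigma^2_{ptr,e}/(n_t n_r))$ for varying numbers of tasks and raters. The variance components are hypothetical, chosen only so that person variance is roughly 44% of the total, echoing the pooled estimate above; they are not the review's data.

```python
# Illustrative D-study projection for a fully crossed p x t x r design.
# The variance components are hypothetical, NOT estimates from this review.

def g_coefficient(var, n_t, n_r):
    """Relative G-coefficient (E-rho^2) for n_t tasks and n_r raters."""
    relative_error = (var["pt"] / n_t
                      + var["pr"] / n_r
                      + var["ptr_e"] / (n_t * n_r))
    return var["p"] / (var["p"] + relative_error)

# Hypothetical components: person, person x task, person x rater, residual.
components = {"p": 0.44, "pt": 0.30, "pr": 0.06, "ptr_e": 0.20}

for n_t, n_r in [(1, 1), (2, 1), (4, 1), (4, 2), (8, 2)]:
    g = g_coefficient(components, n_t, n_r)
    print(f"tasks={n_t}, raters={n_r}: G = {g:.2f}")
# When person x task variance dominates rater-related variance, adding
# performances (tasks) raises reliability much faster than adding raters.
```

Under these assumed components, moving from one task and one rater (G ≈ 0.44) to eight tasks and two raters (G ≈ 0.85) shows most of the gain coming from the added performances, consistent with the pattern the D-studies describe.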
G-theory is increasingly being used to examine the reliability of technical skills assessment in medical education, but more rigorous reporting is warranted. Contextual factors can affect variance components, and thereby reliability estimates, and should be considered, especially in high-stakes assessment. Reliability analysis should be best practice when developing assessments of technical skills.