Equating

  • Article type: Journal Article
    OBJECTIVE: This study aimed to examine if the General Health Questionnaire (GHQ)-12 and Kessler 6 (K6) assess the same underlying construct and to develop a score conversion table for the two scales.
    METHODS: A random sample of 4303 people who completed both the GHQ-12 and K6 in 2021 was analyzed. Exploratory bifactor analysis evaluated whether both scales measured the same construct, and Rasch analysis assessed item severities. The scales were transformed using equipercentile equating for comparability and score conversion. Agreement was estimated with Cohen's kappa coefficient, along with raw positive and negative agreement.
    RESULTS: We found that the two scales measure the same phenomenon to the extent that they can be made equivalent. Conversion tables between GHQ-12 and K6 are presented. Applying the commonly used cut-off of ≥3 on the GHQ-12 bi-modal scoring, we found that the best corresponding cut-off on the K6 would be ≥8. The prevalence of psychological distress was then 22% with the GHQ-12 and 21% with the K6.
    CONCLUSIONS: The GHQ-12 and K6 measure the same construct and corresponding cut-off scores on one scale were found for the other scale. This is valuable for longitudinal studies or time series where one scale has replaced the other scale.
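The equipercentile idea behind such a conversion table can be sketched in a few lines. This is a simplified, unsmoothed illustration rather than the study's procedure: the function names are hypothetical, each score on scale X is mapped to the scale-Y score whose mid-percentile rank is closest, and any data passed in would be the user's own.

```python
def percentile_rank(scores, x):
    """Mid-percentile rank of x: percent scoring below x plus half the percent scoring exactly x."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return 100.0 * (below + 0.5 * at) / n


def equipercentile_table(x_scores, y_scores, x_range, y_range):
    """Map every score in x_range to the score in y_range with the closest percentile rank.

    Assumption: no presmoothing and no interpolation between integer scores,
    which a production equating would normally add.
    """
    y_ranks = {y: percentile_rank(y_scores, y) for y in y_range}
    table = {}
    for x in x_range:
        px = percentile_rank(x_scores, x)
        # On ties, min() keeps the lowest candidate score on scale Y.
        table[x] = min(y_range, key=lambda y: abs(y_ranks[y] - px))
    return table
```

With real data, `x_range` and `y_range` would be the full possible score ranges of the two instruments, and a cut-off on one scale can be read off the table as the converted value of the other scale's cut-off.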

  • Article type: Journal Article
    BACKGROUND: The Patient Health Questionnaire (PHQ-9) and Montgomery-Asberg Depression Rating Scale (MADRS) are commonly used scales to measure depression severity in older adults.
    METHODS: We utilized data from the Optimizing Outcomes of Treatment-Resistant Depression in Older Adults (OPTIMUM) clinical trial to produce conversion tables relating PHQ-9 and MADRS total scores. We split the sample into training (N = 555) and validation samples (N = 187). Equipercentile linking was performed on the training sample to produce conversion tables for PHQ-9 and MADRS. We compared the original and estimated scores in the validation sample with Bland-Altman analysis. We compared the depression severity level using the original and estimated scores with Chi-square tests.
    RESULTS: The Bland-Altman analysis confirmed that differences between the original and estimated scores for at least 95% of the sample fit within 1.96 standard deviations of the mean difference. Chi-square tests showed a significant difference in the proportion of participants at each depression severity category determined using the original and estimated scores.
    CONCLUSIONS: Our conversion tables relating PHQ-9 and MADRS scores can be used to compare treatment outcomes using aggregate data in studies that used only one of these scales, but they should be used with caution when comparing depression severity at the individual level.
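The Bland-Altman check described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the trial's analysis code: the function name and the returned keys are hypothetical, and the inputs are paired lists of original and converted scores from a validation sample.

```python
from statistics import mean, stdev


def bland_altman(original, estimated):
    """Bias, 95% limits of agreement, and share of differences inside the limits."""
    diffs = [e - o for o, e in zip(original, estimated)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    inside = sum(1 for d in diffs if lower <= d <= upper) / len(diffs)
    return {"bias": bias, "lower": lower, "upper": upper, "share_inside": inside}
```

A `share_inside` of at least 0.95 corresponds to the criterion reported in the results. It says nothing about how wide the limits are, which is one reason individual-level conversions still call for caution.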

  • Article type: Journal Article
    Equating is a statistical procedure used to adjust for differences in form difficulty so that scores on those forms can be used and interpreted comparably. In practice, however, equating methods are often implemented without considering the extent to which two forms differ in difficulty. The study aims to examine the effect of the magnitude of a form difficulty difference on equating results under random group (RG) and common-item nonequivalent group (CINEG) designs. Specifically, this study evaluates the performance of six equating methods under a set of simulation conditions including varying levels of form difference. Results revealed that, under the RG design, mean equating was the most accurate method when there is no or a small form difference, whereas equipercentile equating is the most accurate method when the difficulty difference is medium or large. Under the CINEG design, Tucker Linear was found to be the most accurate method when the difficulty difference is medium or small, and either chained equipercentile or frequency estimation is preferred when the difficulty difference is large. This study provides practitioners with evidence-based guidance on the choice of equating methods under varying levels of form difference. As the condition of no form difficulty difference is also included, this study also informs testing companies of appropriate equating methods when two forms are similar in difficulty.
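For the random groups design, the two simplest methods mentioned above, mean equating and linear equating, can be written down directly. The sketch below uses the textbook definitions. The function names are hypothetical, and the population standard deviation is assumed for the linear slope.

```python
from statistics import mean, pstdev


def mean_equate(x, x_scores, y_scores):
    """Mean equating: shift x by the difference between the two form means."""
    return x - mean(x_scores) + mean(y_scores)


def linear_equate(x, x_scores, y_scores):
    """Linear equating: match both the mean and the standard deviation of form Y."""
    slope = pstdev(y_scores) / pstdev(x_scores)
    return mean(y_scores) + slope * (x - mean(x_scores))
```

Mean equating assumes the forms differ by a constant amount across the score range. Linear equating lets the difference vary linearly with the score, while equipercentile equating allows it to vary freely.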

  • Article type: Journal Article
    OBJECTIVE: To develop a simple, practical methodology to equate or link equivalent domains of the 36-item Short-Form Health Survey (SF-36) and the Patient-Reported Outcomes Measurement Information System 29-item questionnaire (PROMIS-29) using the Rasch framework.
    METHODS: In April 2016, the PROMIS-29 and SF-36 were completed by 1501 individuals selected to be representative of the French population. For each domain common to the two questionnaires, a Partial Credit Model was fitted to the items related to that dimension in the two questionnaires. These items were then calibrated on the same metric, which enabled the scores from one questionnaire to be associated with the scores from the other.
    RESULTS: Six of the seven PROMIS-29 scales and five of the six SF-36 subscales (physical, pain, social, vitality, depression and anxiety domains) were equated or linked. Correspondence tables between scores, with a 95% confidence interval, were established for each domain. A freely available Stata macro program was developed to automate the equating or linking process.
    CONCLUSIONS: These results should facilitate comparisons across studies using the SF-36 and the PROMIS-29 in France. The equating or linking process developed is simple to implement and can be used in other countries and for other instruments.
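Once items from both questionnaires are calibrated on one metric, a score on one instrument can be linked to the other through the latent trait. The sketch below is not the authors' Stata macro. It assumes the Partial Credit Model step difficulties are already estimated, and all function names are hypothetical. It finds the trait level whose expected score on questionnaire A equals the observed score, then returns the expected score on questionnaire B at that level.

```python
import math


def pcm_probs(theta, deltas):
    """Category probabilities for one item under the Partial Credit Model.

    deltas holds the step difficulties; the item has len(deltas) + 1 categories.
    """
    cum = [0.0]  # cumulative sums of (theta - delta_k); category 0 is the reference
    for d in deltas:
        cum.append(cum[-1] + theta - d)
    top = max(cum)  # subtract the maximum for numerical stability
    expv = [math.exp(c - top) for c in cum]
    total = sum(expv)
    return [e / total for e in expv]


def expected_score(theta, items):
    """Expected total score over a list of items (each a list of step difficulties)."""
    return sum(k * p for deltas in items for k, p in enumerate(pcm_probs(theta, deltas)))


def theta_for_score(score, items, lo=-6.0, hi=6.0):
    """Bisection for the theta whose expected score equals `score` (monotone in theta)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def link_score(score_a, items_a, items_b):
    """Expected score on questionnaire B at the theta implied by a score on A."""
    return expected_score(theta_for_score(score_a, items_a), items_b)
```

The search interval of -6 to 6 logits is an assumption that covers most practical scores. Scores at the exact minimum or maximum have no finite trait estimate and need separate handling.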

  • Article type: Journal Article
    Words read correctly per minute (WCPM) is the reporting score metric in oral reading fluency (ORF) assessments, which are widely used as part of curriculum-based measurement to screen at-risk readers and to monitor the progress of students who receive interventions. As with other assessments that have multiple forms, equating is necessary when WCPM scores obtained from multiple ORF passages are to be compared both between and within students. This article proposes a model-based approach for equating WCPM scores. A simulation study was conducted to evaluate the performance of the model-based equating approach along with some observed-score equating methods with an external anchor test design.

  • Article type: Journal Article
    This simulation study investigated to what extent departures from construct similarity, as well as differences in the difficulty and targeting of scales, impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation results are discussed with a focus on scale equating in health-related research settings. The study simulated data for two scales, varying the number of items and the sample sizes. The factor correlation between scales was used to operationalize construct similarity. Targeting of the scales was operationalized through increasing departure from equal difficulty and by varying the dispersion of the item and person parameters in each scale. The results show that low similarity between scales goes along with lower transformation precision. In cases with equal levels of similarity, precision improves in settings where the range of the item parameters encompasses the range of the person parameters. With decreasing similarity, score transformation precision benefits more from good targeting. Difficulty shifts of up to two logits somewhat increased the estimation bias but did not affect the transformation precision. The observed robustness against difficulty shifts supports the advantage of applying a true-score equating method over identity equating, which was used as a naive baseline method for comparison. Finally, larger sample sizes did not improve the transformation precision in this study, and longer scales improved the quality of the equating only marginally. The insights from the simulation study are used in a real-data example.

  • Article type: Journal Article
    The responses that are absent in the nonequivalent groups with anchor test (NEAT) design can be treated as a planned missing data scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven CRF-based imputation equating methods are proposed based on different data augmentation methods. The equating performance of the proposed methods is examined through a simulation study. Five factors are considered: (a) test length (20, 30, 40, 50), (b) sample size per test form (50 versus 100), (c) ratio of common/anchor items (0.2 versus 0.3), (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 0.5), and (e) three different types of anchors (random, easy, and hard), resulting in 96 conditions. In addition, five traditional equating methods were considered: (1) the Tucker method; (2) the Levine observed score method; (3) the equipercentile equating method; (4) the circle-arc method; and (5) concurrent calibration based on the Rasch model. Together with the seven CRF-based imputation equating methods, this gives a total of 12 methods in this study. The findings suggest that, benefiting from the advantages of ML techniques, CRF-based methods that incorporate the equating result of the Tucker method, such as the IMP_total_Tucker, IMP_pair_Tucker, and IMP_Tucker_cirlce methods, can yield more robust and trustworthy estimates for the "missingness" in an equating task and therefore result in more accurate equated scores than their counterparts in short tests with small samples.

  • Article type: Journal Article
    BACKGROUND: The present study aimed at deriving equating norms to estimate scores on the Edinburgh Cognitive and Behavioural ALS Screen (ECAS) based on those on the ALS Cognitive Behavioral Screen (ALS-CBS™) in an Italian cohort of non-demented ALS patients.
    METHODS: ALS-CBS™ and ECAS scores of 293 ALS patients without frontotemporal dementia were retrospectively retrieved. Concurrent validity of the ALS-CBS™ towards the ECAS was tested by covarying for demographics, disease duration and severity, presence of C9orf72 hexanucleotide repeat expansion and behavioural features. A linear-smoothing equipercentile equating (LSEE) model was employed to derive ALS-CBS™-to-ECAS cross-walks. Gaps in LSEE-based estimation were managed via a linear regression-based equating approach. Equivalence between empirical and derived ECAS scores was tested via a two-one-sided test (TOST) procedure for the dependent sample.
    RESULTS: The ALS-CBS™ predicted the ECAS (β = 0.75), accounting for the vast majority of its variance (60% out of an R² = 0.71). Consistently, a strong, one-to-one linear association between ALS-CBS™ and ECAS scores was detected (r = 0.84; R² = 0.73). The LSEE was able to estimate conversions for the full range of the ALS-CBS™, except for raw scores equal to 1 and 6, for which a linear equating-based equation was derived. Empirical ECAS scores were equivalent to those derived with both methods.
    CONCLUSIONS: Italian practitioners and researchers have been herewith provided with valid, straightforward cross-walks to estimate the ECAS based on ALS-CBS™ scores in non-demented ALS patients. Conversions herewith provided will help avoid cross-sectional/longitudinal inconsistencies in test adoption within research, and possibly clinical, settings.

  • Article type: Journal Article
    Test equating is a statistical procedure to ensure that scores from different test forms can be used interchangeably. There are several methodologies available to perform equating, some of which are based on the Classical Test Theory (CTT) framework and others on the Item Response Theory (IRT) framework. This article compares equating transformations originating from three different frameworks, namely IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios, which include the development of a novel data-generation procedure that allows the simulation of test data without relying on IRT parameters while still providing control over some test score properties such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE might be able to provide satisfactory results if a proper pre-smoothing solution can be found, while also being much faster than IRT methods. For daily applications, we recommend checking the sensitivity of the results to the equating method, bearing in mind the importance of good model fit and of meeting the assumptions of the framework.
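The core step of kernel equating, Gaussian continuization of a discrete score distribution followed by an equipercentile mapping, can be sketched as below. This is a bare-bones illustration, not the implementation compared in the article. It skips presmoothing and bandwidth selection, the bandwidth `h=0.6` is an arbitrary assumption, and the function names are hypothetical.

```python
import math


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF that preserves the mean and variance of the scores."""
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))  # shrinkage factor that keeps the variance unchanged
    return sum(p * _phi((x - a * s - (1.0 - a) * mu) / (a * h)) for s, p in zip(scores, probs))


def kernel_equate(x, x_scores, x_probs, y_scores, y_probs, h=0.6):
    """Equate x to the Y scale: solve G_h(y) = F_h(x) for y by bisection."""
    target = continuized_cdf(x, x_scores, x_probs, h)
    lo, hi = min(y_scores) - 10.0, max(y_scores) + 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if continuized_cdf(mid, y_scores, y_probs, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

In practice the score probabilities would come from a presmoothed (for example log-linear) model, and the bandwidth would be chosen by a penalty criterion rather than fixed by hand.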

  • Article type: Systematic Review
    BACKGROUND: Early and accurate detection of cognitive changes using simple tools is essential for an appropriate referral to a more detailed neurocognitive assessment and for the implementation of therapeutic strategies. The Mini-Mental Status Examination (MMSE) and the Montreal Cognitive Assessment (MoCA) are two commonly used psychometric tests for cognitive screening. Both tests have different strengths and weaknesses. Preferences regarding test selection may therefore differ among clinicians. The aim of this retrospective observational cohort study was to define corresponding scores for the MMSE and the MoCA.
    METHODS: We examined the relationship between the cognitive screening tests in 803 German-speaking Memory Clinic outpatients, encompassing a wide range of neurocognitive disorders. We produced a conversion table using the equipercentile equating method with log-linear smoothing. In addition, we conducted a systematic review of existing MMSE-MoCA conversions to create a table allowing for the conversion of MoCA scores into MMSE scores and vice versa using the weighted mean method.
    RESULTS: The Memory Clinic sample showed that the prediction of MMSE to MoCA was overall less accurate compared to the conversion from MoCA to MMSE. The 19 studies included after thorough literature search showed that MoCA scores were consistently lower than MMSE scores. Eleven of 19 conversion studies had addressed the conversion of the MoCA to the MMSE, while two studies converted MMSE to MoCA scores. Another six studies applied bi-directional conversions. We provide an easy-to-use table covering the entire range of scores and taking into account all currently existing conversion formulas.
    CONCLUSIONS: The comprehensive MMSE-MoCA conversion table enables a direct comparison of cognitive test scores at screening examinations and over the course of disease in patients with neurocognitive disorders.
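Pooling several published conversion tables with a weighted mean can be sketched as follows. The abstract does not state which weights were used, so weighting each study by its sample size is an assumption here. The function name and the input layout are likewise hypothetical.

```python
def weighted_mean_conversion(conversions):
    """Combine conversion tables into one by a sample-size-weighted mean.

    conversions: list of (sample_size, {source_score: converted_score}) pairs.
    Studies that do not cover a given source score are skipped for that score.
    """
    all_scores = set()
    for _, table in conversions:
        all_scores.update(table.keys())

    combined = {}
    for score in sorted(all_scores):
        pairs = [(n, table[score]) for n, table in conversions if score in table]
        total_n = sum(n for n, _ in pairs)
        combined[score] = sum(n * value for n, value in pairs) / total_n
    return combined
```

The result is a real-valued conversion for each source score. Rounding to the nearest integer score, if wanted, is left to the caller.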