Comparing Methods for Assessing Reliability

Abstract:

The usual method for assessing the reliability of survey data has been to conduct reinterviews a short interval (such as one to two weeks) after an initial interview and to use these data to estimate relatively simple statistics, such as gross difference rates (GDRs). More sophisticated approaches have also been used to estimate reliability. These include estimates from multitrait-multimethod experiments, models applied to longitudinal data, and latent class analyses. To our knowledge, no prior study has systematically compared these different methods for assessing reliability. The Population Assessment of Tobacco and Health Reliability and Validity (PATH-RV) Study, conducted on a national probability sample, assessed the reliability of answers to the Wave 4 questionnaire from the PATH Study. Respondents in the PATH-RV were interviewed twice, about two weeks apart. We examined whether the classic survey approach yielded different conclusions from the more sophisticated methods. We also examined two ex ante methods for assessing problems with survey questions, along with item nonresponse rates and response times, to see how strongly these related to the different reliability estimates. We found that kappa was highly correlated with both GDRs and over-time correlations, but the latter two statistics were less highly correlated with each other, particularly for adult respondents; estimates from longitudinal analyses of the same items in the main PATH Study were also highly correlated with the traditional reliability estimates. The latent class analysis results, based on fewer items, also showed a high level of agreement with the traditional measures. The other methods and indicators had at best weak relationships with the reliability estimates derived from the reinterview data. Although the Question Understanding Aid seems to tap a different factor from the other measures, for adult respondents it did predict item nonresponse and response latencies and thus may be a useful adjunct to the traditional measures.
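
The three traditional statistics named in the abstract (the gross difference rate, kappa, and the over-time correlation) can each be computed from one item's paired interview and reinterview answers. The sketch below is only an illustration under stated assumptions: the function names and the ten made-up responses are hypothetical and are not taken from the PATH-RV Study, and the GDR is taken in its usual sense as the share of respondents whose two answers differ.

```python
from collections import Counter
from math import sqrt


def gross_difference_rate(first, second):
    """Share of respondents whose interview and reinterview answers differ."""
    return sum(a != b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Agreement between the two interviews, corrected for chance agreement."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    categories = set(first) | set(second)
    # Chance agreement implied by the marginal distributions of each interview.
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if expected == 1:
        return float("nan")  # undefined when everyone gives the same answer
    return (observed - expected) / (1 - expected)


def over_time_correlation(first, second):
    """Pearson correlation between the numeric answers at the two interviews."""
    n = len(first)
    mean_1, mean_2 = sum(first) / n, sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(var_1 * var_2)


# Hypothetical yes/no (1/0) answers from ten respondents; illustrative only.
interview = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reinterview = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

gdr = gross_difference_rate(interview, reinterview)
kappa = cohen_kappa(interview, reinterview)
r = over_time_correlation(interview, reinterview)
print(f"GDR={gdr:.2f}  kappa={kappa:.2f}  r={r:.2f}")
```

A lower GDR indicates higher reliability, whereas higher kappa and higher correlation do, so the GDR is expected to relate negatively to the other two statistics when the measures are compared across items.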
