item response model

  • Article type: Journal Article
    OBJECTIVE: The aim of this study was to validate the Neuropathic Pain for Post-Surgical Patients (NeuPPS) scale against clinically verified neuropathic pain (NP) by quantitative sensory testing (QST) as well as evaluation of other psychometric properties. The NeuPPS is a validated 5-item scale designed to evaluate NP in surgical populations.
    METHODS: Data from 537 women aged >18 years scheduled for primary breast cancer surgery enrolled in a previous study for assessing risk factors for persistent pain after breast cancer treatment were used. Exclusion criteria were any other breast surgery or relevant comorbidity. A total of 448 eligible questionnaires were available at 6 months and 455 at 12 months. At 12 months, 290 patients completed a clinical examination and QST. NeuPPS and PainDETECT were analyzed against patients with and without clinically verified NP. NP was assessed using a standardized QST protocol including a clinical assessment. Furthermore, the NeuPPS and PainDETECT scores were psychometrically tested with an item response theory method, the Rasch analysis, to assess construct validity. Primary outcomes were the diagnostic accuracy measures for the NeuPPS, and secondary measures were psychometric analyses of the NeuPPS after 6 and 12 months. PainDETECT was also compared to clinically verified NP as well as NeuPPS comparing the stability of the estimates.
    RESULTS: Comparing the NeuPPS scores with verified NP using a receiver operating characteristic curve, the NeuPPS had an area under the curve of 0.80. Using a cutoff of 1, the NeuPPS had a sensitivity of 88% and a specificity of 59%, and using a cutoff of 3, the values were 35% and 96%, respectively. Analysis of the PainDETECT indicated that the cutoffs used may be inappropriate in a surgical population.
    CONCLUSIONS: The present study supports the validity of the NeuPPS as a screening tool for NP in a surgical population.
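
The cutoff-based accuracy measures reported above can be illustrated with a minimal sketch. It assumes that a score at or above the cutoff counts as screen-positive (the abstract does not state the direction of the rule), and the scores and NP labels below are made-up toy values, not study data.

```python
def sensitivity_specificity(scores, has_np, cutoff):
    """Treat score >= cutoff as screen-positive (an assumption) and
    return (sensitivity, specificity) against the reference labels."""
    pairs = list(zip(scores, has_np))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)       # true positives
    fn = sum(1 for s, y in pairs if y and s < cutoff)        # false negatives
    tn = sum(1 for s, y in pairs if not y and s < cutoff)    # true negatives
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)   # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: questionnaire sum scores and reference-standard labels.
scores = [0, 1, 3, 4, 0, 0, 1, 2, 5, 0]
has_np = [False, True, True, True, False, False, False, False, True, False]

for cutoff in (1, 3):
    sens, spec = sensitivity_specificity(scores, has_np, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity, which is the same pattern the abstract reports for cutoffs 1 and 3.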






  • Article type: Journal Article
    How social networks influence human behavior has been an interesting topic in applied research. Existing methods often utilized scale-level behavioral data (e.g., total number of positive responses) to estimate the influence of a social network on human behavior. This study proposes a novel approach to studying social influence that utilizes item-level behavioral measures. Under the latent space modeling framework, we integrate the two latent spaces for respondents' social network data and item-level behavior measures into a single space we call the 'interaction map'. The interaction map visualizes the association between the latent homophily among respondents and their item-level behaviors, revealing differential social influence effects across item-level behaviors. We also measure overall social influence by assessing the impact of the interaction map. We evaluate the properties of the proposed approach via extensive simulation studies and demonstrate the proposed approach with real data in the context of studying how students' friendship network influences their participation in school activities.






  • Article type: Journal Article
    Heywood cases are known from linear factor analysis literature as variables with communalities larger than 1.00, and in present-day factor models, the problem also shows in negative residual variances. For binary data, factor models for ordinal data can be applied with either delta parameterization or theta parameterization. The former is more common than the latter and can yield Heywood cases when limited information estimation is used. The same problem shows up as nonconvergence cases in theta parameterized factor models and as extremely large discriminations in item response theory (IRT) models. In this study, we explain why the same problem appears in different forms depending on the method of analysis. We first discuss this issue using equations and then illustrate our conclusions using a small simulation study, where all three methods, delta and theta parameterized ordinal factor models (with estimation based on polychoric correlations and thresholds) and an IRT model (with full information estimation), are used to analyze the same datasets. The results generalize across WLS, WLSMV, and ULS estimators for the factor models for ordinal data. Finally, we analyze real data with the same three approaches. The results of the simulation study and the analysis of real data confirm the theoretical conclusions.
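
The link between the three forms of the problem can be sketched with the standard textbook relations for a single binary item under a normal-ogive (probit) model with a unit-variance factor. The symbols below are assumptions for illustration and are not taken from the article: \lambda is the standardized (delta) loading, \lambda^{(\theta)} the theta-parameterized loading, and a the IRT discrimination.

```latex
% Delta parameterization: the latent response y^* has unit variance.
y^* = \lambda \eta + \varepsilon, \qquad
\operatorname{Var}(\eta) = 1, \qquad
\operatorname{Var}(\varepsilon) = 1 - \lambda^2

% Theta parameterization: the residual variance is fixed to 1 instead,
% so the loading is rescaled; it coincides with the probit discrimination.
\lambda^{(\theta)} = a = \frac{\lambda}{\sqrt{1 - \lambda^2}}

% As \lambda \to 1 the residual variance goes to 0 and a \to \infty;
% for \lambda > 1 the residual variance is negative (a Heywood case)
% and a has no real-valued counterpart.
```

Under these assumptions, a negative residual variance in the delta parameterization has no finite counterpart in the other two metrics, which is consistent with the nonconvergence and extreme discriminations described above.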






  • Article type: Journal Article
    The Musical Emotion Discrimination Task (MEDT) is a short, non-adaptive test of the ability to discriminate emotions in music. Test-takers hear two performances of the same melody, both played by the same performer but each trying to communicate a different basic emotion, and are asked to determine which one is "happier", for example. The goal of the current study was to construct a new version of the MEDT using a larger set of shorter, more diverse music clips and an adaptive framework to expand the ability range for which the test can deliver measurements. The first study analysed responses from a large sample of participants (N = 624) to determine how musical features contributed to item difficulty, which resulted in a quantitative model of musical emotion discrimination ability rooted in Item Response Theory (IRT). This model informed the construction of the adaptive MEDT. A second study contributed preliminary evidence for the validity and reliability of the adaptive MEDT, and demonstrated that the new version of the test is suitable for a wider range of abilities. This paper therefore presents the first adaptive musical emotion discrimination test, a new resource for investigating emotion processing which is freely available for research use.






  • Article type: Journal Article
    In educational large-scale assessment (LSA) studies such as PISA, item response theory (IRT) scaling models summarize students' performance on cognitive test items across countries. This article investigates the impact of different factors in model specifications for the PISA 2018 mathematics study. The diverse options of the model specification are also known under the labels multiverse analysis or specification curve analysis in the social sciences. In this article, we investigate the following five factors of model specification in the PISA scaling model for obtaining the two country distribution parameters, country means and country standard deviations: (1) the choice of the functional form of the IRT model, (2) the treatment of differential item functioning at the country level, (3) the treatment of missing item responses, (4) the impact of item selection in the PISA test, and (5) the impact of test position effects. In our multiverse analysis, it turned out that model uncertainty had almost the same impact on variability in the country means as sampling errors due to the sampling of students. Model uncertainty had an even larger impact than standard errors for country standard deviations. Overall, each of the five specification factors in the multiverse analysis had at least a moderate effect on either country means or standard deviations. In the discussion section, we critically evaluate the current practice of model specification decisions in LSA studies. It is argued that we would either prefer reporting the variability in model uncertainty or choosing a particular model specification that might provide the strategy that is most valid. It is emphasized that model fit should not play a role in selecting a scaling strategy for LSA applications.






  • Article type: Journal Article
    In educational large-scale assessment studies such as PISA, item response theory (IRT) models are used to summarize students' performance on cognitive test items across countries. In this article, the impact of the choice of the IRT model on the distribution parameters of countries (i.e., mean, standard deviation, percentiles) is investigated. Eleven different IRT models are compared using information criteria. Moreover, model uncertainty is quantified by estimating model error, which can be compared with the sampling error associated with the sampling of students. The PISA 2009 dataset for the cognitive domains mathematics, reading, and science is used as an example of the choice of the IRT model. It turned out that the three-parameter logistic IRT model with residual heterogeneity and a three-parameter IRT model with a quadratic effect of the ability θ provided the best model fit. Furthermore, model uncertainty was relatively small compared to sampling error regarding country means in most cases but was substantial for country standard deviations and percentiles. Consequently, it can be argued that model error should be included in the statistical inference of educational large-scale assessment studies.
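
For orientation, the plain three-parameter logistic (3PL) item response function that the best-fitting models extend can be sketched as follows. This is the standard textbook form only; it does not include the residual heterogeneity or the quadratic effect of θ mentioned above, and the parameter values are hypothetical.

```python
import math


def irf_3pl(theta, a, b, c):
    """Standard 3PL item response function:
    P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b))),
    with discrimination a, difficulty b, and lower asymptote (guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
a, b, c = 1.2, 0.5, 0.2

for theta in (-2.0, 0.5, 2.0):
    print(f"theta={theta:+.1f}: P(correct)={irf_3pl(theta, a, b, c):.3f}")
```

At θ = b the probability sits halfway between the guessing parameter c and 1, which gives a quick sanity check on any implementation.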






  • Article type: Journal Article
    Integrative data analysis (IDA) involves obtaining multiple datasets, scaling the data to a common metric, and jointly analyzing the data. The first step in IDA is to scale the multisample item-level data to a common metric, which is often done with multiple group item response models (MGM). With invariance constraints tested and imposed, the estimated latent variable scores from the MGM serve as an observed variable in subsequent analyses. This approach was used with empirical multiple group data and different latent variable estimates were obtained for individuals with the same response pattern from different studies. A Monte Carlo simulation study was then conducted to compare the accuracy of latent variable estimates from the MGM, a single-group item response model, and an MGM where group differences are ignored. Results suggest that these alternative approaches led to consistent and equally accurate latent variable estimates. Implications for IDA are discussed.







  • Article type: Journal Article
    Missing item responses are prevalent in educational large-scale assessment studies such as the Programme for International Student Assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians have advocated for a model-based treatment based on a latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model that characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item itself. The scoring of missing item responses as wrong and the latent ignorable model are submodels of the Mislevy-Wu model. In an illustrative simulation study, it is shown that the Mislevy-Wu model provides unbiased model parameters. Moreover, the simulation replicates the finding from various simulation studies from the literature that scoring missing item responses as wrong provides biased estimates if the latent ignorability assumption holds in the data-generating model. However, if missing item responses are generated such that they can only be generated from incorrect item responses, applying an item response model that relies on latent ignorability results in biased estimates. The Mislevy-Wu model guarantees unbiased parameter estimates if the more general Mislevy-Wu model holds in the data-generating model. In addition, this article uses the PISA 2018 mathematics dataset as a case study to investigate the consequences of different missing data treatments on country means and country standard deviations. Obtained country means and country standard deviations can substantially differ for the different scaling models. In contrast to previous statements in the literature, the scoring of missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item on the item itself after conditioning on the latent response propensity was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be rejected from two perspectives. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, in the discussion section, we argue that model fit should only play a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that countries (and, hence, their students) can simply manipulate result in unfair country comparisons.






  • Article type: Journal Article
    Compositional items - a form of forced-choice items - require respondents to allocate a fixed total number of points to a set of statements. To describe the responses to these items, the Thurstonian item response theory (IRT) model was developed. Despite its prominence, the model requires that items composed of parts of statements result in a factor loading matrix with full rank. Without this requirement, the model cannot be identified, and the latent trait estimates would be seriously biased. Besides, the estimation of the Thurstonian IRT model often results in convergence problems. To address these issues, this study developed a new version of the Thurstonian IRT model for analyzing compositional items - the lognormal ipsative model (LIM) - that would be sufficient for tests using items with all statements positively phrased and with equal factor loadings. We developed an online value test following Schwartz's values theory using compositional items and collected response data from a sample of N = 512 participants aged 13 to 51 years. The results showed that our LIM had an acceptable fit to the data, and that the reliabilities exceeded 0.85. A simulation study resulted in good parameter recovery, a high convergence rate, and sufficient precision of estimation in the various conditions of covariance matrices between traits, test lengths, and sample sizes. Overall, our results indicate that the proposed model can overcome the problems of the Thurstonian IRT model when all statements are positively phrased and factor loadings are similar.






  • Article type: Journal Article
    The COVID-19 pandemic has spread widely around the world. Many mathematical models have been proposed to investigate the inflection point (IP) and the spread pattern of COVID-19. However, no researchers have applied social network analysis (SNA) to cluster their characteristics. We aimed to illustrate the use of SNA to identify the spread clusters of COVID-19. Cumulative numbers of infected cases (CNICs) in countries/regions were downloaded from GitHub. The CNIC patterns were extracted from SNA based on CNICs between countries/regions. The item response model (IRT) was applied to create a general predictive model for each country/region. The IP days were obtained from the IRT model. The location parameters in continents, China, and the United States were compared. The results showed that (1) three clusters (255, n = 51, 130, and 74 in patterns from Eastern Asia and Europe to America) were separated using SNA, (2) China had a shorter mean IP and smaller mean location parameter than other counterparts, and (3) an online dashboard was used to display the clusters along with IP days for each country/region. Spatiotemporal spread patterns can be clustered using SNA and correlation coefficients (CCs). A dashboard with spread clusters and IP days is recommended to epidemiologists and researchers and is not limited to the COVID-19 pandemic.






