count data

  • Article Type: Journal Article
    Generalized linear mixed models (GLMMs) have great potential to deal with count data in single-case experimental designs (SCEDs). However, applied researchers have faced challenges in making various statistical decisions when using such advanced statistical techniques in their own research. This study focused on a critical issue by investigating the selection of an appropriate distribution to handle different types of count data in SCEDs arising from overdispersion and/or zero-inflation. To achieve this, I proposed two model selection frameworks, one based on calculating information criteria (AIC and BIC) and another based on a multistage model selection procedure. Four data scenarios were simulated: Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). The same set of models (i.e., Poisson, NB, ZIP, and ZINB) was fitted in each scenario. In the simulation, I evaluated 10 model selection strategies within the two frameworks by assessing model selection bias and its consequences for the accuracy of the treatment effect estimates and inferential statistics. Based on the simulation results and previous work, I provide recommendations regarding which model selection methods should be adopted in different scenarios. The implications, limitations, and future research directions are also discussed.
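    A minimal sketch of the information-criterion framework, using single-level count regressions from statsmodels rather than the GLMMs studied in the article; the phase predictor, simulated counts, and parameter values are placeholders for illustration only, and convergence warnings may appear when the data do not favor the zero-inflated fits.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.discrete.discrete_model import Poisson, NegativeBinomial
    from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                                  ZeroInflatedNegativeBinomialP)

    # Simulated single-level data; "phase" is a hypothetical baseline/treatment indicator
    rng = np.random.default_rng(1)
    n = 200
    phase = rng.integers(0, 2, size=n)
    X = sm.add_constant(phase.astype(float))
    y = rng.poisson(np.exp(1.0 - 0.8 * phase))   # placeholder counts

    # Fit the four candidate distributions and compare information criteria
    candidates = {
        "Poisson": Poisson(y, X),
        "NB": NegativeBinomial(y, X),
        "ZIP": ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))),
        "ZINB": ZeroInflatedNegativeBinomialP(y, X, exog_infl=np.ones((n, 1))),
    }
    for name, model in candidates.items():
        res = model.fit(disp=False)
        print(f"{name:7s} AIC={res.aic:9.2f}  BIC={res.bic:9.2f}")
    ```

    The candidate with the smallest AIC or BIC would be selected; the multistage procedure described in the article instead tests for overdispersion and zero-inflation sequentially.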

  • Article Type: Journal Article
    Bounded count response data arise naturally in health applications. In general, the well-known beta-binomial regression model forms the basis for analyzing such data, especially when the data are overdispersed. Little attention, however, has been given in the literature to the possibility of extreme observations occurring together with overdispersion. In this work we propose an extension of the beta-binomial regression model, named the beta-2-binomial regression model, which provides a rather flexible approach for fitting regression models to a wide spectrum of bounded count response data sets in the presence of overdispersion, outliers, or an excess of extreme observations. This distribution possesses more skewness and kurtosis than the beta-binomial model but preserves the same mean and variance form as the beta-binomial model. Additional properties of the beta-2-binomial distribution are derived, including its behavior at the limits of its parameter space. A penalized maximum likelihood approach is considered to estimate the parameters of this model, and a residual analysis is included to assess departures from model assumptions and to detect outlying observations. Simulation studies addressing robustness to outliers confirm that the beta-2-binomial regression model is a more robust alternative to the binomial and beta-binomial regression models. We also found that the beta-2-binomial regression model outperformed the binomial and beta-binomial regression models in our applications of predicting liver cancer development in mice and the number of inappropriate days a patient spent in a hospital.
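    A minimal sketch of fitting the baseline beta-binomial model by maximum likelihood; the beta-2-binomial extension and the penalization are specific to the article and not reproduced here, and the simulated data and parameter values are assumptions for illustration.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import betabinom

    def negloglik(log_ab, y, n):
        """Negative log-likelihood of the beta-binomial model; a, b are
        log-parameterized so the optimization is unconstrained."""
        a, b = np.exp(log_ab)
        return -betabinom.logpmf(y, n, a, b).sum()

    # Simulated overdispersed bounded counts (hypothetical data)
    rng = np.random.default_rng(0)
    n = np.full(200, 30)                      # trials per observation
    p = rng.beta(2.0, 5.0, size=200)          # unit-level success probabilities
    y = rng.binomial(n, p)                    # bounded count responses

    fit = minimize(negloglik, x0=np.zeros(2), args=(y, n))
    a_hat, b_hat = np.exp(fit.x)
    print(a_hat, b_hat)                       # roughly recovers (2, 5)
    # A regression version would place a link function on the mean a / (a + b).
    ```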

  • Article Type: Journal Article
    This study explores the offender, victim, and environmental characteristics that significantly influence the number of days a sexual homicide victim remains undiscovered. Utilizing a sample of 269 cases from the Homicide Investigation Tracking System database, an in-depth analysis was conducted to unveil the factors contributing to the delay in the discovery of victims' bodies. The methodological approach involves applying a negative binomial regression analysis, which allows for the examination of count data, specifically addressing the over-dispersion and excess zeros in the dependent variable: the number of days until the victim is found. The findings reveal that certain offender characteristics, victim traits, and spatio-temporal factors play a pivotal role in the time lag experienced in locating the bodies of homicide victims. These findings have crucial implications for investigative efforts in homicide cases, offering valuable insights that can inform and enhance the efficacy and efficiency of future investigative procedures and strategies.
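    A minimal sketch of a negative binomial regression of a days-until-discovery count on covariates; the variable names and simulated data are hypothetical and not taken from the study, and exponentiated coefficients are read as incidence-rate ratios.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data; predictor names are illustrative only
    rng = np.random.default_rng(2)
    df = pd.DataFrame({
        "days_undiscovered": rng.negative_binomial(1, 0.3, size=300),
        "outdoor": rng.integers(0, 2, size=300),
        "victim_age": rng.integers(18, 70, size=300),
    })

    # NB2 regression with the dispersion parameter estimated by maximum likelihood
    res = smf.negativebinomial("days_undiscovered ~ outdoor + victim_age",
                               data=df).fit(disp=False)
    print(np.exp(res.params[["outdoor", "victim_age"]]))  # incidence-rate ratios
    ```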

  • Article Type: Journal Article
    Oral reading fluency (ORF) assessments are commonly used to screen at-risk readers and evaluate interventions' effectiveness as curriculum-based measurements. Similar to the standard practice in item response theory (IRT), calibrated passage parameter estimates are currently used as if they were population values in model-based ORF scoring. However, calibration errors that are unaccounted for may bias ORF score estimates and, in particular, lead to underestimated standard errors (SEs) of ORF scores. Therefore, we consider an approach that incorporates the calibration errors in latent variable scores. We further derive the SEs of ORF scores based on the delta method to incorporate the calibration uncertainty. We conduct a simulation study to evaluate the recovery of point estimates and SEs of latent variable scores and ORF scores in various simulated conditions. Results suggest that ignoring calibration errors leads to underestimated latent variable score SEs and ORF score SEs, especially when the calibration sample is small.
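    For reference, the generic delta-method approximation underlying such derived SEs; the notation is illustrative, and the article's scoring function and variance decomposition may differ.

    ```latex
    % Delta-method standard error for a transformed score f(\hat\theta), where \theta denotes
    % the latent fluency variable and f maps it to the ORF score scale (generic form):
    \operatorname{Var}\bigl(f(\hat\theta)\bigr) \approx \bigl[f'(\hat\theta)\bigr]^{2}\,\operatorname{Var}(\hat\theta),
    \qquad
    \operatorname{SE}\bigl(f(\hat\theta)\bigr) \approx \bigl|f'(\hat\theta)\bigr|\,\operatorname{SE}(\hat\theta)
    ```

    Incorporating calibration uncertainty would then amount to using a Var(θ̂) that adds a component for the sampling variability of the calibrated passage parameters to the usual scoring error; the exact decomposition used in the article may differ.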

  • Article Type: Journal Article
    The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high-dimensional settings, where the number of count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computational challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.
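    A minimal sketch of the singular-value-ratio idea for choosing a rank (and, analogously, a number of factors); the exact criterion implemented in the COAP package may differ, and the data here are simulated for illustration.

    ```python
    import numpy as np

    def sv_ratio_rank(M, max_rank=15):
        """Estimate the rank by the singular-value ratio criterion:
        choose k maximizing sigma_k / sigma_{k+1}."""
        s = np.linalg.svd(M, compute_uv=False)[:max_rank + 1]
        ratios = s[:-1] / s[1:]
        return int(np.argmax(ratios)) + 1

    # Example: a noisy rank-3 matrix
    rng = np.random.default_rng(3)
    M = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50)) \
        + 0.1 * rng.normal(size=(200, 50))
    print(sv_ratio_rank(M))   # expected: 3
    ```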

  • Article Type: Journal Article
    This paper assesses analytical strategies that respect the bounded-count nature of health outcomes encountered often in empirical applications. Absent in the literature is a comprehensive discussion and critique of strategies for analyzing and understanding such data. The paper's goal is to provide an in-depth consideration of prominent issues arising in and strategies for undertaking such analyses, emphasizing the merits and limitations of various analytical tools empirical researchers may contemplate. Three main topics are covered. First, bounded-count health outcomes' measurement properties are reviewed and their implications assessed. Second, issues arising when bounded-count outcomes are the objects of concern in evaluations are described. Third, the (conditional) probability and moment structures of bounded-count outcomes are derived and corresponding specification and estimation strategies presented with particular attention to partial effects. Many questions may be asked of such data in health research and a researcher's choice of analytical method is often consequential.
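    As a point of reference for the moment structures discussed, a standard textbook baseline for a bounded count with a logit-type conditional mean and beta-binomial overdispersion; this generic form is an assumption for illustration, not the paper's own derivation.

    ```latex
    % Generic baseline for a bounded count Y \in \{0,\dots,n\} given covariates x:
    E[Y \mid \mathbf{x}, n] = n\,\Lambda(\mathbf{x}^{\top}\boldsymbol{\beta}),
    \qquad \Lambda(u) = \frac{e^{u}}{1+e^{u}}
    % Binomial variance versus beta-binomial overdispersion with intra-cluster correlation \rho:
    \operatorname{Var}(Y \mid \mathbf{x}, n) = n\,p\,(1-p)\,\bigl[1+(n-1)\rho\bigr],
    \qquad p = \Lambda(\mathbf{x}^{\top}\boldsymbol{\beta})
    ```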

  • Article Type: Journal Article
    Scale errors are intriguing phenomena in which a child tries to perform an object-specific action on a tiny object. Several viewpoints explaining the developmental mechanisms underlying scale errors exist; however, there is no unified account of how different factors interact and affect scale errors, and the statistical approaches used in previous research do not adequately capture the structure of the data. By conducting a secondary analysis of aggregated datasets across nine different studies (n = 528) and using more appropriate statistical methods, this study provides a more accurate description of the development of scale errors. We implemented zero-inflated Poisson (ZIP) regression, which can directly handle count data with an excess of zero observations, and treated developmental indices as continuous variables. The results suggested that the developmental trend of scale errors was better described by an inverted U-shaped curve than by a simple linear function, although the nonlinearity captured different aspects of scale errors between the laboratory and classroom data. We also found that repeated experiences with scale error tasks reduced the number of scale errors, whereas girls made more scale errors than boys. Furthermore, a model comparison approach revealed that predicate vocabulary size (e.g., adjectives or verbs) predicted developmental changes in scale errors better than noun vocabulary size, particularly in terms of the presence or absence of scale errors. The application of the ZIP model enables researchers to discern how different factors affect scale error production, thereby providing new insights into demystifying the mechanisms underlying these phenomena. A video abstract of this article can be viewed at https://youtu.be/1v1U6CjDZ1Q. RESEARCH HIGHLIGHTS: We fit the zero-inflated Poisson (ZIP) model to a large dataset created by aggregating existing scale error data. Scale errors peaked along the different developmental indices, but the underlying statistical structure differed between the in-lab and classroom datasets. Repeated experiences with scale error tasks and the children's gender affected the number of scale errors produced per session. Predicate vocabulary size (e.g., adjectives or verbs) better predicts developmental changes in scale errors than noun vocabulary size.
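    For reference, the zero-inflated Poisson probability mass function used in such analyses, with structural-zero probability π and Poisson rate λ; the regression parameterization here is the generic one, not necessarily the article's exact specification.

    ```latex
    % Zero-inflated Poisson pmf with structural-zero probability \pi and rate \lambda:
    P(Y=0) = \pi + (1-\pi)\,e^{-\lambda}
    P(Y=y) = (1-\pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 1, 2, \dots
    % In the regression version, \log\lambda and \operatorname{logit}(\pi) are modeled
    % as functions of covariates (e.g., the developmental index).
    ```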

  • Article Type: Journal Article
    A major issue in the clinical management of epilepsy is the unpredictability of seizures. Yet, traditional approaches to seizure forecasting and risk assessment in epilepsy rely heavily on raw seizure frequencies, which are a stochastic measurement of seizure risk. We consider a Bayesian non-homogeneous hidden Markov model for unsupervised clustering of zero-inflated seizure count data. The proposed model allows for a probabilistic estimate of the sequence of seizure risk states at the individual level. It also offers significant improvement over prior approaches by incorporating a variable selection prior for the identification of clinical covariates that drive seizure risk changes and accommodating highly granular data. For inference, we implement an efficient sampler that employs stochastic search and data augmentation techniques. We evaluate model performance on simulated seizure count data. We then demonstrate the clinical utility of the proposed model by analyzing daily seizure count data from 133 patients with Dravet syndrome collected through the Seizure Tracker™ system, a patient-reported electronic seizure diary. We report on the dynamics of seizure risk cycling, including validation of several known pharmacologic relationships. We also uncover novel findings characterizing the presence and volatility of risk states in Dravet syndrome, which may directly inform counseling to reduce the unpredictability of seizures for patients with this devastating cause of epilepsy.
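    A minimal sketch of evaluating the likelihood of count data under a hidden Markov model with zero-inflated Poisson emissions via the scaled forward algorithm; this is a homogeneous, fixed-parameter illustration, not the Bayesian non-homogeneous model with variable selection developed in the article, and all parameter values and data are placeholders.

    ```python
    import numpy as np
    from scipy.stats import poisson

    def zip_pmf(y, lam, pi):
        """ZIP pmf for an integer array y: a mixture of a point mass at zero and Poisson(lam)."""
        pmf = (1 - pi) * poisson.pmf(y, lam)
        pmf[y == 0] += pi
        return pmf

    def forward_loglik(y, trans, init, lams, pis):
        """Log-likelihood of counts y under an HMM with ZIP emissions (scaled forward pass)."""
        K = len(init)
        emis = np.column_stack([zip_pmf(y, lams[k], pis[k]) for k in range(K)])  # T x K
        alpha = init * emis[0]
        c = alpha.sum()
        loglik = np.log(c)
        alpha /= c
        for t in range(1, len(y)):
            alpha = (alpha @ trans) * emis[t]
            c = alpha.sum()
            loglik += np.log(c)
            alpha /= c
        return loglik

    # Example: two latent risk states (low/high) for daily seizure counts (simulated)
    rng = np.random.default_rng(4)
    y = rng.poisson(0.2, size=100)
    trans = np.array([[0.95, 0.05], [0.10, 0.90]])   # state persistence
    init = np.array([0.5, 0.5])
    print(forward_loglik(y, trans, init, lams=[0.3, 3.0], pis=[0.6, 0.1]))
    ```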

  • Article Type: Journal Article
    Count outcomes are frequently encountered in single-case experimental designs (SCEDs). Generalized linear mixed models (GLMMs) have shown promise in handling overdispersed count data. However, the presence of excessive zeros in the baseline phase of SCEDs introduces a more complex issue known as zero-inflation, often overlooked by researchers. This study aimed to deal with zero-inflated and overdispersed count data within a multiple-baseline design (MBD) in single-case studies. It examined the performance of various GLMMs (Poisson, negative binomial [NB], zero-inflated Poisson [ZIP], and zero-inflated negative binomial [ZINB] models) in estimating treatment effects and generating inferential statistics. Additionally, a real example was used to demonstrate the analysis of zero-inflated and overdispersed count data. The simulation results indicated that the ZINB model provided accurate estimates for treatment effects, while the other three models yielded biased estimates. The inferential statistics obtained from the ZINB model were reliable when the baseline rate was low. However, when the data were overdispersed but not zero-inflated, both the ZINB and ZIP models exhibited poor performance in accurately estimating treatment effects. These findings contribute to our understanding of using GLMMs to handle zero-inflated and overdispersed count data in SCEDs. The implications, limitations, and future research directions are also discussed.
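    For reference, standard first and second moments showing how zero-inflation (π) and NB dispersion (k) each inflate the variance relative to a Poisson rate λ; these are generic forms, not the article's specific parameterization.

    ```latex
    % Standard moments under zero-inflation (structural-zero probability \pi) and
    % negative binomial dispersion k, for event rate \lambda:
    \text{ZIP:}\quad  E[Y] = (1-\pi)\lambda, \qquad \operatorname{Var}(Y) = (1-\pi)\lambda\,(1+\pi\lambda)
    \text{ZINB:}\quad E[Y] = (1-\pi)\lambda, \qquad \operatorname{Var}(Y) = (1-\pi)\lambda\,\bigl(1+\lambda(\pi + 1/k)\bigr)
    ```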

  • Article Type: Journal Article
    In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but the corresponding item response theory (IRT) methods are underdeveloped compared with those for binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), generalizing Rasch's Poisson Counts Model with item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection but has received little attention. We introduce two 2PCMPM-based explanatory count IRT models: the Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for (categorical) person covariates. Estimation methods are provided, and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help understand tests and the underlying constructs.
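    For reference, the Conway-Maxwell-Poisson distribution underlying the 2PCMPM, in its standard two-parameter form; the item-specific reparameterization into difficulty, discrimination, and dispersion is specific to the article and not shown here.

    ```latex
    % Conway-Maxwell-Poisson pmf in its standard two-parameter form:
    P(Y=y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu}\,Z(\lambda,\nu)},
    \qquad Z(\lambda,\nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
    \qquad y = 0, 1, 2, \dots
    % \nu = 1 recovers the Poisson; \nu < 1 yields overdispersion, \nu > 1 underdispersion.
    ```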
