    This paper assesses analytical strategies that respect the bounded-count nature of health outcomes encountered often in empirical applications. Absent in the literature is a comprehensive discussion and critique of strategies for analyzing and understanding such data. The paper's goal is to provide an in-depth consideration of prominent issues arising in and strategies for undertaking such analyses, emphasizing the merits and limitations of various analytical tools empirical researchers may contemplate. Three main topics are covered. First, bounded-count health outcomes' measurement properties are reviewed and their implications assessed. Second, issues arising when bounded-count outcomes are the objects of concern in evaluations are described. Third, the (conditional) probability and moment structures of bounded-count outcomes are derived and corresponding specification and estimation strategies presented with particular attention to partial effects. Many questions may be asked of such data in health research and a researcher's choice of analytical method is often consequential.






    A major issue in the clinical management of epilepsy is the unpredictability of seizures. Yet, traditional approaches to seizure forecasting and risk assessment in epilepsy rely heavily on raw seizure frequencies, which are a stochastic measurement of seizure risk. We consider a Bayesian non-homogeneous hidden Markov model for unsupervised clustering of zero-inflated seizure count data. The proposed model allows for a probabilistic estimate of the sequence of seizure risk states at the individual level. It also offers significant improvement over prior approaches by incorporating a variable selection prior for the identification of clinical covariates that drive seizure risk changes and accommodating highly granular data. For inference, we implement an efficient sampler that employs stochastic search and data augmentation techniques. We evaluate model performance on simulated seizure count data. We then demonstrate the clinical utility of the proposed model by analyzing daily seizure count data from 133 patients with Dravet syndrome collected through the Seizure Tracker™ system, a patient-reported electronic seizure diary. We report on the dynamics of seizure risk cycling, including validation of several known pharmacologic relationships. We also uncover novel findings characterizing the presence and volatility of risk states in Dravet syndrome, which may directly inform counseling to reduce the unpredictability of seizures for patients with this devastating cause of epilepsy.






    The zero-inflated Poisson (ZIP) model is widely used for count data with excess zeros. Multicollinearity among the explanatory variables is a common problem in such data. In this context, maximum likelihood estimation (MLE) typically produces unsatisfactory results because the mean squared error (MSE) is inflated. Ridge parameters are usually used to address this problem. In this study, we propose a new modified zero-inflated Poisson ridge regression model to reduce the problem of multicollinearity. We conducted experiments under a specified simulation design and recorded the behavior of the proposed estimators. We also apply the proposed estimators to a real-life data set and examine how well they perform in the presence of multicollinearity under the ZIP model for count data.
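
    To make the objects in this abstract concrete, the following is a minimal sketch of the ZIP probability mass function and of a negative log-likelihood with a generic ridge penalty. It is not the modified estimator proposed by the authors, whose form the abstract does not give. The function names, the log and logit links, the choice to leave the intercept unpenalized, and the penalty parameter `k` are all assumptions made for illustration.

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) for a ZIP variable: structural zero with probability pi,
    otherwise a Poisson(lam) draw."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi * (y == 0) + (1.0 - pi) * poisson

def zip_penalized_nll(beta, gamma, X, y, k):
    """Negative ZIP log-likelihood plus a generic ridge penalty.

    Assumed links: log(lam_i) = x_i . beta and logit(pi_i) = x_i . gamma.
    Assumed layout: the first column of X is the intercept, which is
    left unpenalized; the penalty is k * sum(beta[1:] ** 2).
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        lam = math.exp(sum(b * v for b, v in zip(beta, xi)))
        pi = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gamma, xi))))
        nll -= math.log(zip_pmf(yi, lam, pi))
    return nll + k * sum(b * b for b in beta[1:])
```

    Setting `k = 0` recovers the ordinary MLE objective. Larger values of `k` shrink the slope coefficients toward zero, which is the usual way a ridge term trades a little bias for lower variance when regressors are highly correlated.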






    The "meningitis belt" is a region in sub-Saharan Africa where annual outbreaks of meningitis occur, with epidemics observed cyclically. While we know that meningitis is heavily dependent on seasonal trends, the exact pathways for contracting the disease are not fully understood and warrant further investigation. Most previous approaches have used large sample inference to assess impacts of weather on meningitis rates. However, in the case of rare events, the validity of such assumptions is uncertain. This work examines the meningitis trends in the context of rare events, with the specific objective of quantifying the underlying seasonal patterns in meningitis rates. We compare three main classes of models: the Poisson generalized linear model, the Poisson generalized additive model, and a Bayesian hazard model extended to accommodate count data and a changing at-risk population. We compare the accuracy and robustness of the models through the bias, RMSE, and standard deviation of the estimators, and also provide a detailed case study of meningitis patterns for data collected in Navrongo, Ghana.






    In this paper, we present an efficient statistical method (denoted as 'Adaptive Resources Allocation CUSUM') to robustly and efficiently detect hotspots with limited sampling resources. Our main idea is to combine multi-armed bandit (MAB) and change-point detection methods to balance exploration and exploitation in allocating resources for hotspot detection. Further, a Bayesian weighted update is used to update the posterior distribution of the infection rate. Then, the upper confidence bound (UCB) is used for resource allocation and planning. Finally, CUSUM monitoring statistics are used to detect the change point as well as the change location. For performance evaluation, we compare the proposed method with several benchmark methods in the literature and show that the proposed algorithm achieves a lower detection delay and higher detection precision. The method is then applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA) and demonstrates its effectiveness with very limited distributed samples.
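
    The two building blocks named in this abstract can be sketched in their textbook forms: a one-sided CUSUM for an upward shift in a Poisson rate, and a UCB1-style index for choosing where to sample next. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract's Bayesian weighted update is omitted, and the function names, the known pre- and post-change rates `lam0` and `lam1`, and the exploration constant `c` are hypothetical.

```python
import math

def poisson_cusum(counts, lam0, lam1, threshold):
    """Return the first index at which the CUSUM statistic reaches
    `threshold`, or None if it never does.

    Each step adds the Poisson log-likelihood ratio of rate lam1
    against rate lam0 and resets the statistic at zero.
    """
    w = 0.0
    log_ratio = math.log(lam1 / lam0)
    for t, y in enumerate(counts):
        w = max(0.0, w + y * log_ratio - (lam1 - lam0))
        if w >= threshold:
            return t
    return None

def ucb_pick(means, pulls, t, c=2.0):
    """Pick the location with the largest UCB1-style index
    mean + sqrt(c * ln(t) / n), where n is its number of past samples."""
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)
```

    In this sketch a rarely sampled location receives a large exploration bonus, so it is revisited even if its estimated rate is low, while the CUSUM statistic accumulates evidence only at the locations that are actually observed.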






    Fractures are rare events and can occur because of a fall. Fracture counts are distinct from other count data in that these data are positively skewed, inflated by excess zero counts, and events can recur over time. Analytical methods used to assess fracture data and account for these characteristics are limited in the literature.
    Commonly used models for count data include Poisson regression, negative binomial regression, hurdle regression, and zero-inflated regression models. In this paper, we compare four alternative statistical models for fitting fracture counts, using data from a large UK-based clinical trial evaluating the clinical and cost-effectiveness of alternative falls prevention interventions in older people (Prevention of Falls Injury Trial; PreFIT).
    The Akaike information criterion and Bayesian information criterion, used as goodness-of-fit statistics, were lowest for the negative binomial model. The likelihood ratio (LR) test of no dispersion showed strong evidence of overdispersion (chi-square = 225.68, p-value < 0.001), indicating that the negative binomial model fits the data better than the Poisson regression model. We also compared the standard negative binomial regression and mixed effects negative binomial models. The LR test showed no gain in fit from the mixed effects negative binomial model (chi-square = 1.67, p-value = 0.098) compared with the standard negative binomial model.
    The negative binomial regression model was the most appropriate and best-fitting model for fracture count analyses.
    The PreFIT trial was registered as ISRCTN71002650.
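
    The model-comparison quantities used in this abstract can be computed as in the sketch below. The abstract does not report log-likelihoods, so the information-criterion helpers are generic. The halved p-value is an assumption: when the null hypothesis places a dispersion or variance parameter on the boundary of its range, a 50:50 mixture of chi-square(0) and chi-square(1) is a common reference distribution, and it reproduces the reported p-value of 0.098 for chi-square = 1.67. Whether the authors used this convention is not stated.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters by log(n_obs)."""
    return n_params * math.log(n_obs) - 2 * loglik

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def boundary_lr_pvalue(lr_stat):
    """P-value under a 50:50 mixture of chi-square(0) and chi-square(1),
    an assumed convention for LR tests of a parameter on the boundary."""
    return 0.5 * chi2_sf_1df(lr_stat) if lr_stat > 0 else 1.0

# LR statistics reported in the abstract.
p_dispersion = boundary_lr_pvalue(225.68)  # Poisson vs negative binomial
p_mixed = boundary_lr_pvalue(1.67)         # standard vs mixed effects NB
```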






    BACKGROUND: Two characteristics of commonly used outcomes in medical research are zero inflation and non-negative integers; examples include the number of hospital admissions or emergency department visits, where the majority of patients will have zero counts. Zero-inflated regression models were devised to analyze this type of data. However, the performance of zero-inflated regression models or the properties of data best suited for these analyses have not been thoroughly investigated.
    METHODS: We conducted a simulation study to evaluate the performance of two generalized linear models, negative binomial (NB) and zero-inflated negative binomial (ZINB), for analyzing zero-inflated count data. Simulation scenarios assumed a randomized controlled trial design and varied the true underlying distribution, sample size, and rate of zero inflation. We compared the models in terms of bias, mean squared error, and coverage. Additionally, we used logistic regression to determine which data properties are most important for predicting the best-fitting model.
    RESULTS: We first found that, regardless of the rate of zero inflation, there was little difference between the conventional negative binomial and its zero-inflated counterpart in terms of bias of the marginal treatment group coefficient. Second, even when the outcome was simulated from a zero-inflated distribution, a negative binomial model was favored over its zero-inflated (ZI) counterpart in terms of the Akaike Information Criterion. Third, the mean and skewness of the non-zero part of the data were stronger predictors of model preference than the percentage of zero counts. These results were not affected by the sample size, which ranged from 60 to 800.
    CONCLUSIONS: We recommend that the rate of zero inflation and overdispersion in the outcome should not be the sole or main justification for choosing zero-inflated regression models. Investigators should also consider other data characteristics when choosing a model for count data. In addition, if the performance of the NB and ZINB regression models is reasonably comparable even with ZI outcomes, we advocate the use of the NB regression model due to its clear and straightforward interpretation of the results.






    Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in PCA. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., Poisson, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we focus on a Poisson random matrix with independent entries and propose a simple procedure, termed biwhitening, for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson, our approach is extended to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings, and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing (scRNA-seq), High-Throughput Chromosome Conformation Capture (Hi-C), and document topic modeling.
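
    The scaling step described in this abstract can be sketched as follows for the Poisson case, where the observed counts themselves estimate the entrywise variances. Two details are assumptions not given in the abstract: the targets (row sums equal to the number of columns, column sums equal to the number of rows, so that the average scaled variance is one in every row and column) and the function names. The sketch uses a fixed iteration count and requires a count matrix with no all-zero row or column.

```python
def sinkhorn_knopp(A, row_target, col_target, n_iter=500):
    """Find u, v > 0 so that diag(u) A diag(v) has (approximately) the
    given row sums and column sums, by alternating normalization."""
    m, n = len(A), len(A[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(n_iter):
        u = [row_target / sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [col_target / sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
    return u, v

def biwhiten(Y):
    """Scale an m x n count matrix Y by sqrt(u_i * v_j), where u and v
    are obtained by applying Sinkhorn-Knopp to Y as a variance estimate."""
    m, n = len(Y), len(Y[0])
    u, v = sinkhorn_knopp(Y, row_target=n, col_target=m)
    return [[(u[i] * v[j]) ** 0.5 * Y[i][j] for j in range(n)] for i in range(m)]
```

    Under these assumptions, the rank would then be estimated by counting the singular values of the scaled matrix that exceed the MP upper edge, which for unit average noise variance is sqrt(m) + sqrt(n). The singular value decomposition itself is left to a numerical library.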






    Ideal number of children (INC) is the number of children that a woman or man would have if they could go back to the time when they did not have any children and could choose exactly the number of children to have in their whole life. Despite numerous studies on the prevalence and associated factors of the ideal number of children, studies incorporating spatial and multilevel analysis are lacking. Thus, this study aimed to conduct a spatial and multilevel analysis of the ideal number of children and its associated factors.
    The study was cross-sectional, with data obtained from the 2016 Ethiopian Demographic and Health Survey (EDHS). A total of 13,961 women aged 15-49 who fulfilled the inclusion criterion were considered. A negative binomial regression model that incorporates spatial and multilevel analysis was employed.
    About 33% and 12.8% of the women reported four and six, respectively, as their ideal number of children. The highest INC per woman was recorded in Oromia region, 5055 (36.1%), and the lowest in Harare, 35 (0.2%). INC per woman was higher in rural areas, 10,726 (76.6%), than in urban areas, 3277 (23.4%). The ideal number of children was spatially clustered (Global Moran's I = 0.1439, p < .00043). Significant hotspot clusters were found in the Somali region, in zones such as Afder, Shabelle, Korahe, and Doolo.
    The spatial analysis revealed significant clustering of the ideal number of children across zones in Ethiopia. Higher INC was observed in the Somali region, specifically in the Afder, Shabelle, Korahe, and Doolo zones. Among the various factors considered, women's age, region, place of residence, women's education level, contraception use, religion, marital status, family size, and age at first birth were identified as significant predictors of the ideal number of children. These findings indicate that these factors play a crucial role in shaping reproductive preferences and decisions among women in the study population. Based on these findings, responsible bodies should prioritize targeted interventions and policies in high-risk regions to address women's specific reproductive needs.
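
    The Global Moran's I statistic reported above measures spatial autocorrelation: values near zero suggest no spatial pattern, and positive values suggest that neighboring areas have similar values. A minimal sketch of the standard formula follows. The toy data, the adjacency weights, and the function name are hypothetical; the study's own weight matrix is not described in the abstract.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a spatial weight matrix w,
    where w[i][j] > 0 when areas i and j are neighbors and w[i][i] = 0."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # total weight
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Four areas on a line; each area neighbors the next one.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
trend = morans_i([1, 2, 3, 4], w)          # smooth gradient: positive
alternating = morans_i([1, -1, 1, -1], w)  # checkerboard: negative
```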






    We develop a generalized linear mixed model (GLMM) for bivariate count responses to statistically analyze dragonfly population data from the Northern Netherlands. The populations of the threatened dragonfly species Aeshna viridis were counted in the years 2015-2018 at 17 different locations (ponds and ditches). Two widely applied measures were used to quantify the population sizes, namely the number of exoskeletons found ('exuviae') and the number of egg-laying females spotted. Since both measures (responses) led to many zero counts but also feature very large counts, our GLMM builds on a zero-inflated bivariate geometric (ZIBGe) distribution, which we show can be easily parameterized in terms of a correlation parameter and its two marginal medians. We model the medians with linear combinations of fixed (environmental covariates) and random (location-specific intercepts) effects. Modeling the medians yields a decreased sensitivity to overly large counts, in particular in light of growing marginal zero inflation rates. Because of the relatively small sample size (n = 114), we follow a Bayesian modeling approach and use Metropolis-Hastings Markov Chain Monte Carlo (MCMC) simulations to generate posterior samples.





