Expectation-maximization algorithm

  • Article type: Journal Article
    The mixture of probabilistic regression models is one of the most common techniques to incorporate the information of covariates into learning of the population heterogeneity. Despite its flexibility, unreliable estimates can occur due to multicollinearity among covariates. In this paper, we develop Liu-type shrinkage methods through an unsupervised learning approach to estimate the model coefficients in the presence of multicollinearity. We evaluate the performance of our proposed methods via classification and stochastic versions of the expectation-maximization algorithm. We show using numerical simulations that the proposed methods outperform their Ridge and maximum likelihood counterparts. Finally, we apply our methods to analyze the bone mineral data of women aged 50 and older.
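
The classification and stochastic versions of EM mentioned above differ from standard EM mainly in how the E-step output is used. The following is a minimal sketch of that difference, not the authors' implementation: the function name `e_step_variants` and the toy responsibilities are hypothetical. Standard EM keeps the soft posterior probabilities, classification EM assigns each observation to its most probable component, and stochastic EM draws one label per observation from the posterior. The resulting weights would then feed a weighted (possibly shrinkage-penalized) regression fit per component in the M-step.

```python
import random

def e_step_variants(resp, rng):
    """Turn posterior responsibilities into the weights used by EM, CEM and SEM.

    resp: list of rows; row i holds the posterior component probabilities of
    observation i (non-negative, summing to 1).
    """
    n_comp = len(resp[0])
    soft = [row[:] for row in resp]  # standard EM: keep the posteriors as weights
    hard, sampled = [], []
    for row in resp:
        # Classification EM: one-hot weight on the most probable component
        # (ties go to the lowest component index).
        k_max = max(range(n_comp), key=lambda k: row[k])
        hard.append([1.0 if k == k_max else 0.0 for k in range(n_comp)])
        # Stochastic EM: one-hot weight on a label drawn from the posterior.
        k_draw = rng.choices(range(n_comp), weights=row)[0]
        sampled.append([1.0 if k == k_draw else 0.0 for k in range(n_comp)])
    return soft, hard, sampled

# Toy responsibilities for three observations and two components (made up).
resp = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
soft, hard, sampled = e_step_variants(resp, random.Random(0))
```

The hard and sampled weights are one-hot, so each component's M-step reduces to an ordinary fit on the observations currently assigned to it.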






  • Article type: Journal Article
    Interval-censored failure time data frequently arise in scientific studies where each subject undergoes periodic examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity induced by multiple correlated covariates. We provide a joint modeling framework comprising a factor analysis model, which groups multiple observed variables into a few latent factors, and a class of semiparametric transformation models, which use the augmented factors to examine the effects of these factors and of other covariates on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.






  • Article type: Journal Article
    Spatiotemporal information on individual trajectories in urban rail transit is important for operational strategy adjustment, personalized recommendation, and emergency command decision-making. However, due to the lack of journey observations, it is difficult to accurately infer unknown information from trajectories based only on AFC and AVL data. To address the problem, this paper proposes a spatiotemporal probabilistic graphical model based on adaptive expectation maximization attention (STPGM-AEMA) to achieve the reconstruction of individual trajectories. The approach consists of three steps: first, the potential train alternative set and the egress time alternative set of individuals are obtained through data mining and combinatorial enumeration. Then, global and local potential variables are introduced to construct a spatiotemporal probabilistic graphical model, provide the inference process for unknown events, and state information about individual trajectories. Further, considering the effect of missing data, an attention mechanism-enhanced expectation-maximization algorithm is proposed to achieve maximum likelihood estimation of individual trajectories. Finally, typical datasets of origin-destination pairs and actual individual trajectory tracking data are used to validate the effectiveness of the proposed method. The results show that the STPGM-AEMA method is more than 95% accurate in recovering missing information in the observed data, which is at least 15% more accurate than the traditional methods (i.e., PTAM-MLE and MPTAM-EM).






  • Article type: Journal Article
    Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.






  • Article type: Journal Article
    An adaptive harmonic separation (HS) technique is proposed to overcome the limitations in conventional filtering techniques for ultrasound (US) tissue harmonic imaging (THI).
    Based on expectation-maximization source separation, the proposed HS technique adaptively models the depth-varying fundamental and harmonic components in the frequency domain and separates the two by applying their calculated posterior probabilities. Phantom experiments with a Tx center frequency of 2 MHz are conducted to evaluate the proposed HS-based US THI schemes.
    The phantom images show that the proposed single-pulse THI scheme utilizing the HS technique provides not only an average improvement of 19.2% in axial resolution compared to the conventional bandpass filtering scheme but also similar image quality to that of the conventional pulse-inversion (PI) scheme which requires two Tx/Rx sequences for each scan line. Furthermore, when combined with the PI technique, the HS technique provides a uniform axial resolution over the entire 170 mm imaging depth with an average improvement of 17.1% compared to the conventional PI scheme.
    These results show that the proposed adaptive HS technique is capable of improving both the frame rate and the image quality of US THI.
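
The idea of separating fundamental and harmonic components by their posterior probabilities can be illustrated with a two-component Gaussian mixture fitted to a power spectrum by EM. This is only a sketch under strong assumptions that are not taken from the abstract: both spectral components are modeled as Gaussians in frequency, the spectrum is synthetic (bumps near 2 MHz and 4 MHz), and there is no depth dependence. The name `em_separate` is hypothetical. Each frequency bin is treated as `power[i]` observations at `freqs[i]`, and the per-bin posteriors act as soft masks for the two components.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_separate(freqs, power, mus, sigmas, weights, n_iter=200):
    """Fit a 2-component Gaussian mixture to a power spectrum with weighted EM.

    Returns the fitted means, standard deviations, mixing weights, and the
    per-bin posteriors computed in the last E-step.
    """
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    total = sum(power)
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component at every frequency bin.
        post = []
        for f in freqs:
            dens = [w * normal_pdf(f, m, s) for w, m, s in zip(weights, mus, sigmas)]
            norm = sum(dens) or 1e-300  # guard against numerical underflow
            post.append([d / norm for d in dens])
        # M-step: power-weighted updates of mean, spread, and mixing weight.
        for k in range(2):
            mass = sum(p * r[k] for p, r in zip(power, post))
            mus[k] = sum(p * r[k] * f for p, r, f in zip(power, post, freqs)) / mass
            var = sum(p * r[k] * (f - mus[k]) ** 2
                      for p, r, f in zip(power, post, freqs)) / mass
            sigmas[k] = math.sqrt(var)
            weights[k] = mass / total
    return mus, sigmas, weights, post

# Synthetic spectrum (assumed shapes): fundamental near 2 MHz, harmonic near 4 MHz.
freqs = [0.5 + 0.01 * i for i in range(551)]  # 0.5 to 6.0 MHz
power = [math.exp(-0.5 * ((f - 2.0) / 0.3) ** 2)
         + 0.3 * math.exp(-0.5 * ((f - 4.0) / 0.4) ** 2) for f in freqs]

mus, sigmas, weights, post = em_separate(freqs, power, [1.5, 4.5], [0.5, 0.5], [0.5, 0.5])
harmonic_mask = [r[1] for r in post]  # soft mask selecting the harmonic component
```

Multiplying the spectrum by `harmonic_mask` would keep the harmonic band and suppress the fundamental without a fixed bandpass cutoff, which is the general appeal of a posterior-based separation.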






  • Article type: Journal Article
    In cancer studies, it is commonplace that a fraction of patients participating in the study are cured, such that not all of them will experience a recurrence, or death due to cancer. Also, it is plausible that some covariates, such as the treatment assigned to the patients or demographic characteristics, could affect both the patients' survival rates and cure/incidence rates. A common approach to accommodate these features in survival analysis is to consider a mixture cure survival model with the incidence rate modeled by a logistic regression model and latency part modeled by the Cox proportional hazards model. These modeling assumptions, though typical, restrict the structure of covariate effects on both the incidence and latency components. As a plausible recourse to attain flexibility, we study a class of semiparametric mixture cure models in this article, which incorporates two single-index functions for modeling the two regression components. A hybrid nonparametric maximum likelihood estimation method is proposed, where the cumulative baseline hazard function for uncured subjects is estimated nonparametrically, and the two single-index functions are estimated via Bernstein polynomials. Parameter estimation is carried out via a curated expectation-maximization algorithm. We also conduct a large-scale simulation study to assess the finite-sample performance of the estimator. The proposed methodology is illustrated via application to two cancer datasets.






  • Article type: Journal Article
    When the binary response variable contains an excess of zero counts, the data are imbalanced, which causes trouble for binary classification. To simplify the numerical computation of the maximum likelihood estimates of the zero-inflated Bernoulli (ZIBer) model parameters with imbalanced data, an expectation-maximization (EM) algorithm is proposed. The logistic regression model links the Bernoulli probabilities with the covariates in the ZIBer model, and the prediction performance of the ZIBer model, LightGBM, and artificial neural network (ANN) procedures is compared by Monte Carlo simulation. The results show that no method dominates the others in predictive performance under imbalanced data. The LightGBM and ZIBer models are more competitive than the ANN model for zero-inflated imbalanced data sets.
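
An EM scheme for a zero-inflated Bernoulli model with a logistic link can be sketched as follows. This is an illustrative reading of the model, not the paper's algorithm: it assumes a constant structural-zero probability `pi`, a single covariate with intercept, and a few Newton steps for the weighted logistic M-step. The function names and the simulated data are hypothetical. The E-step computes, for each observed zero, the posterior probability that it is a structural zero. The M-step updates `pi` as the mean of those probabilities and refits the logistic coefficients with weights equal to one minus them.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_lik(xs, ys, pi, b0, b1):
    """Observed-data log-likelihood: P(Y=1|x) = (1 - pi) * sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p_one = (1.0 - pi) * sigmoid(b0 + b1 * x)
        total += math.log(p_one) if y == 1 else math.log(1.0 - p_one)
    return total

def em_ziber(xs, ys, n_iter=50, newton_steps=5):
    pi, b0, b1 = 0.5, 0.0, 0.0
    history = [log_lik(xs, ys, pi, b0, b1)]
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is a structural zero.
        w = []
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w.append(0.0 if y == 1 else pi / (pi + (1.0 - pi) * (1.0 - p)))
        # M-step for pi: average structural-zero probability.
        pi = sum(w) / len(w)
        # M-step for (b0, b1): Newton steps on the weighted logistic log-likelihood.
        for _ in range(newton_steps):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for x, y, wi in zip(xs, ys, w):
                a = 1.0 - wi  # weight: probability of not being a structural zero
                p = sigmoid(b0 + b1 * x)
                r, v = a * (y - p), a * p * (1.0 - p)
                g0 += r
                g1 += r * x
                h00 += v
                h01 += v * x
                h11 += v * x * x
            det = h00 * h11 - h01 * h01
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
        history.append(log_lik(xs, ys, pi, b0, b1))
    return pi, (b0, b1), history

# Simulated data (made-up parameters): 30% structural zeros, logit = -0.5 + 1.5*x.
rng = random.Random(1)
xs, ys = [], []
for _ in range(1000):
    x = rng.gauss(0.0, 1.0)
    structural_zero = rng.random() < 0.3
    y = 0 if structural_zero else int(rng.random() < sigmoid(-0.5 + 1.5 * x))
    xs.append(x)
    ys.append(y)

pi_hat, (b0_hat, b1_hat), history = em_ziber(xs, ys)
```

The inflation probability and the intercept are only weakly separated by the data in this model, so the individual estimates can be noisy even when the fitted probabilities are good. The `history` list tracks the observed-data log-likelihood across iterations as a convergence check.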






  • Article type: Journal Article
    Multivariate interval-censored data arise when there are multiple types of events or clusters of study subjects, such that the event times are potentially correlated and when each event is only known to occur over a particular time interval. We formulate the effects of potentially time-varying covariates on the multivariate event times through marginal proportional hazards models while leaving the dependence structures of the related event times unspecified. We construct the nonparametric pseudolikelihood under the working assumption that all event times are independent, and we provide a simple and stable EM-type algorithm. The resulting nonparametric maximum pseudolikelihood estimators for the regression parameters are shown to be consistent and asymptotically normal, with a limiting covariance matrix that can be consistently estimated by a sandwich estimator under arbitrary dependence structures for the related event times. We evaluate the performance of the proposed methods through extensive simulation studies and present an application to data from the Atherosclerosis Risk in Communities Study.






  • Article type: Journal Article
    The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings due to its capacity to model excess zeros and overdispersion. When there are correlated count variables, a bivariate model is essential for understanding their full distributional features. Examples include measuring the correlation of two genes in sparse single-cell RNA sequencing (scRNA-seq) data and modeling dental caries count indices on two different tooth surface types. For these purposes, we develop a richly parametrized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how treatment with Xylitol lozenges affects the marginal mean and other patterns of response manifested in the two dental caries traits. An R package "bzinb" is available on the Comprehensive R Archive Network.






  • Article type: Journal Article
    We propose a hidden Markov model for multivariate continuous longitudinal responses with covariates that accounts for three different types of missing pattern: (I) partially missing outcomes at a given time occasion, (II) completely missing outcomes at a given time occasion (intermittent pattern), and (III) dropout before the end of the period of observation (monotone pattern). The missing-at-random (MAR) assumption is formulated to deal with the first two types of missingness, while to account for the informative dropout, we rely on an extra absorbing state. Estimation of the model parameters is based on the maximum likelihood method that is implemented by an expectation-maximization (EM) algorithm relying on suitable recursions. The proposal is illustrated by a Monte Carlo simulation study and an application based on historical data on primary biliary cholangitis.
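
The recursions underlying EM for a hidden Markov model can be illustrated with a scaled forward pass. The sketch below is a simplification and not the proposed model: it assumes a univariate Gaussian response, two states, no covariates, and no absorbing dropout state, and all parameter values are made up. It shows one way missing-at-random occasions can be handled: a missing response (encoded as `None`) contributes an emission factor of 1, so the recursion simply propagates the state distribution through that occasion. A brute-force sum over all state paths is included only to check the recursion on a short sequence.

```python
import itertools
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def emission(y, mu, sigma):
    """Emission density; a missing response (None) contributes a factor of 1."""
    return 1.0 if y is None else normal_pdf(y, mu, sigma)

def forward_loglik(ys, init, trans, mus, sigmas):
    """Log-likelihood of the observed responses via the scaled forward recursion."""
    n_states = len(init)
    pred = list(init)  # predictive state distribution at the current occasion
    loglik = 0.0
    for y in ys:
        alpha = [pred[k] * emission(y, mus[k], sigmas[k]) for k in range(n_states)]
        scale = sum(alpha)
        loglik += math.log(scale)
        filt = [a / scale for a in alpha]  # filtered state probabilities
        pred = [sum(filt[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)]
    return loglik

def brute_force_loglik(ys, init, trans, mus, sigmas):
    """Same quantity by summing over every state path (feasible only for short series)."""
    n_states = len(init)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(ys)):
        p = init[path[0]] * emission(ys[0], mus[path[0]], sigmas[path[0]])
        for t in range(1, len(ys)):
            p *= trans[path[t - 1]][path[t]] * emission(ys[t], mus[path[t]], sigmas[path[t]])
        total += p
    return math.log(total)

# Hypothetical two-state model and a short series with two missing occasions.
init = [0.6, 0.4]
trans = [[0.8, 0.2], [0.3, 0.7]]
mus, sigmas = [0.0, 2.0], [1.0, 1.0]
ys = [0.2, None, 1.9, None, 2.3]

ll_forward = forward_loglik(ys, init, trans, mus, sigmas)
ll_brute = brute_force_loglik(ys, init, trans, mus, sigmas)
```

The forward pass costs time linear in the series length, whereas the brute-force sum grows exponentially. In a full EM implementation a matching backward recursion would supply the posterior state probabilities needed for the E-step.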





