Expectation-maximization algorithm

  • Article type: Journal Article
    The mixture of probabilistic regression models is one of the most common techniques to incorporate the information of covariates into learning of the population heterogeneity. Despite its flexibility, unreliable estimates can occur due to multicollinearity among covariates. In this paper, we develop Liu-type shrinkage methods through an unsupervised learning approach to estimate the model coefficients in the presence of multicollinearity. We evaluate the performance of our proposed methods via classification and stochastic versions of the expectation-maximization algorithm. We show using numerical simulations that the proposed methods outperform their Ridge and maximum likelihood counterparts. Finally, we apply our methods to analyze the bone mineral data of women aged 50 and older.

  • Article type: Journal Article
    Interval-censored failure time data frequently arise in various scientific studies where each subject experiences periodical examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework by comprising a factor analysis model to group multiple observed variables into a few latent factors and a class of semiparametric transformation models with the augmented factors to examine their and other covariate effects on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.

  • Article type: Journal Article
    Spatiotemporal information on individual trajectories in urban rail transit is important for operational strategy adjustment, personalized recommendation, and emergency command decision-making. However, due to the lack of journey observations, it is difficult to accurately infer unknown information from trajectories based only on AFC and AVL data. To address the problem, this paper proposes a spatiotemporal probabilistic graphical model based on adaptive expectation maximization attention (STPGM-AEMA) to achieve the reconstruction of individual trajectories. The approach consists of three steps: first, the potential train alternative set and the egress time alternative set of individuals are obtained through data mining and combinatorial enumeration. Then, global and local potential variables are introduced to construct a spatiotemporal probabilistic graphical model, provide the inference process for unknown events, and state information about individual trajectories. Further, considering the effect of missing data, an attention mechanism-enhanced expectation-maximization algorithm is proposed to achieve maximum likelihood estimation of individual trajectories. Finally, typical datasets of origin-destination pairs and actual individual trajectory tracking data are used to validate the effectiveness of the proposed method. The results show that the STPGM-AEMA method is more than 95% accurate in recovering missing information in the observed data, which is at least 15% more accurate than the traditional methods (i.e., PTAM-MLE and MPTAM-EM).

  • Article type: Journal Article
    Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.

  • Article type: Journal Article
    Objective: An adaptive harmonic separation (HS) technique is proposed to overcome the limitations in conventional filtering techniques for ultrasound (US) tissue harmonic imaging (THI).
    Methods: Based on expectation-maximization source separation, the proposed HS technique adaptively models the depth-varying fundamental and harmonic components in the frequency domain and separates the two by applying their calculated posterior probabilities. Phantom experiments with a Tx center frequency of 2 MHz are conducted to evaluate the proposed HS-based US THI schemes.
    Results: The phantom images show that the proposed single-pulse THI scheme utilizing the HS technique provides not only an average improvement of 19.2% in axial resolution compared to the conventional bandpass filtering scheme but also similar image quality to that of the conventional pulse-inversion (PI) scheme, which requires two Tx/Rx sequences for each scan line. Furthermore, when combined with the PI technique, the HS technique provides a uniform axial resolution over the entire 170 mm imaging depth with an average improvement of 17.1% compared to the conventional PI scheme.
    Conclusion: These results show that the proposed adaptive HS technique is capable of improving both the frame rate and the image quality of US THI.
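
    The abstract above separates two components by their posterior probabilities under an EM-fitted model. As a rough illustration of that general idea only, and not of the authors' frequency-domain method, the sketch below fits a two-component one-dimensional Gaussian mixture by EM and returns the posterior probability of each component for every sample. The function names, the crude initialisation, and the variance floor are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances, resp), where resp[i][k] is the
    posterior probability of component k for sample i, computed in the
    last E-step (i.e. just before the final parameter update).
    """
    n = len(xs)
    # Crude initialisation (an assumption): means at the data extremes,
    # both variances equal to the overall sample variance.
    means = [min(xs), max(xs)]
    centre = sum(xs) / n
    var0 = sum((x - centre) ** 2 for x in xs) / n
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: weighted updates of mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances, resp
```

    A sample can then be assigned to a component by thresholding its posterior, or the posteriors can be used as soft weights, which is closer in spirit to the separation step described in the abstract.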

  • Article type: Journal Article
    In cancer studies, it is commonplace that a fraction of patients participating in the study are cured, such that not all of them will experience a recurrence, or death due to cancer. Also, it is plausible that some covariates, such as the treatment assigned to the patients or demographic characteristics, could affect both the patients' survival rates and cure/incidence rates. A common approach to accommodate these features in survival analysis is to consider a mixture cure survival model with the incidence rate modeled by a logistic regression model and latency part modeled by the Cox proportional hazards model. These modeling assumptions, though typical, restrict the structure of covariate effects on both the incidence and latency components. As a plausible recourse to attain flexibility, we study a class of semiparametric mixture cure models in this article, which incorporates two single-index functions for modeling the two regression components. A hybrid nonparametric maximum likelihood estimation method is proposed, where the cumulative baseline hazard function for uncured subjects is estimated nonparametrically, and the two single-index functions are estimated via Bernstein polynomials. Parameter estimation is carried out via a curated expectation-maximization algorithm. We also conducted a large-scale simulation study to assess the finite-sample performance of the estimator. The proposed methodology is illustrated via application to two cancer datasets.

  • Article type: Journal Article
    When the binary response variable contains an excess of zero counts, the data are imbalanced. Imbalanced data cause trouble for binary classification. To simplify the numerical computation to obtain the maximum likelihood estimators of the zero-inflated Bernoulli (ZIBer) model parameters with imbalanced data, an expectation-maximization (EM) algorithm is proposed to derive the maximum likelihood estimates of the model parameters. The logistic regression model links the Bernoulli probabilities with the covariates in the ZIBer model, and the prediction performance among the ZIBer model, LightGBM, and artificial neural network (ANN) procedures is compared by Monte Carlo simulation. The results show that no method can dominate the other methods regarding predictive performance under the imbalanced data. The LightGBM and ZIBer models are more competitive than the ANN model for zero-inflated-imbalanced data sets.
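
    To make the EM idea in the abstract above more concrete, here is a minimal sketch under stated assumptions: a response is a structural zero with constant probability pi, and otherwise Bernoulli with success probability given by a logistic regression on a single covariate. The abstract does not spell out the authors' parameterisation, so the constant pi, the single covariate, the few damped Newton steps in the M-step, and the names `em_ziber` and `log_likelihood` are all hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(xs, ys, pi, b0, b1):
    """Observed-data log-likelihood of the assumed zero-inflated Bernoulli model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        prob = (1.0 - pi) * p if y == 1 else pi + (1.0 - pi) * (1.0 - p)
        total += math.log(max(prob, 1e-300))
    return total


def em_ziber(xs, ys, n_iter=50, newton_steps=3):
    """EM sketch for a zero-inflated Bernoulli model with one covariate."""
    pi, b0, b1 = 0.5, 0.0, 0.0
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is a structural zero.
        # An observed 1 can never be a structural zero.
        w = []
        for x, y in zip(xs, ys):
            if y == 1:
                w.append(0.0)
            else:
                p = sigmoid(b0 + b1 * x)
                w.append(pi / (pi + (1.0 - pi) * (1.0 - p)))
        # M-step for pi: average posterior probability of a structural zero.
        pi = sum(w) / len(w)
        # M-step for (b0, b1): weighted logistic regression with weights 1 - w,
        # approximated by a few Newton steps (a generalized EM update).
        for _ in range(newton_steps):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for x, y, wi in zip(xs, ys, w):
                a = 1.0 - wi
                p = sigmoid(b0 + b1 * x)
                r = a * (y - p)
                v = a * p * (1.0 - p)
                g0 += r
                g1 += r * x
                h00 += v
                h01 += v * x
                h11 += v * x * x
            h00 += 1e-8  # tiny ridge keeps the 2x2 system solvable
            h11 += 1e-8
            det = h00 * h11 - h01 * h01
            d0 = (h11 * g0 - h01 * g1) / det
            d1 = (h00 * g1 - h01 * g0) / det
            # Crude safeguard (an assumption): clip each step to [-1, 1].
            b0 += max(-1.0, min(1.0, d0))
            b1 += max(-1.0, min(1.0, d1))
    return pi, b0, b1
```

    Because the zero-inflation probability and the intercept are only weakly separated by the data in this model, the estimates can be sensitive to starting values; comparing `log_likelihood` across several starts is a reasonable precaution.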

  • Article type: Journal Article
    Multivariate interval-censored data arise when there are multiple types of events or clusters of study subjects, such that the event times are potentially correlated and when each event is only known to occur over a particular time interval. We formulate the effects of potentially time-varying covariates on the multivariate event times through marginal proportional hazards models while leaving the dependence structures of the related event times unspecified. We construct the nonparametric pseudolikelihood under the working assumption that all event times are independent, and we provide a simple and stable EM-type algorithm. The resulting nonparametric maximum pseudolikelihood estimators for the regression parameters are shown to be consistent and asymptotically normal, with a limiting covariance matrix that can be consistently estimated by a sandwich estimator under arbitrary dependence structures for the related event times. We evaluate the performance of the proposed methods through extensive simulation studies and present an application to data from the Atherosclerosis Risk in Communities Study.

  • Article type: Journal Article
    The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings due to its capacity of modeling excess zeros and overdispersion. When there are correlated count variables, a bivariate model is essential for understanding their full distributional features. Examples include measuring correlation of two genes in sparse single-cell RNA sequencing data and modeling dental caries count indices on two different tooth surface types. For these purposes, we develop a richly parametrized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how the treatment with Xylitol lozenges affects the marginal mean and other patterns of response manifested in the two dental caries traits. An R package "bzinb" is available on Comprehensive R Archive Network.

  • Article type: Journal Article
    We propose a hidden Markov model for multivariate continuous longitudinal responses with covariates that accounts for three different types of missing pattern: (I) partially missing outcomes at a given time occasion, (II) completely missing outcomes at a given time occasion (intermittent pattern), and (III) dropout before the end of the period of observation (monotone pattern). The missing-at-random (MAR) assumption is formulated to deal with the first two types of missingness, while to account for the informative dropout, we rely on an extra absorbing state. Estimation of the model parameters is based on the maximum likelihood method that is implemented by an expectation-maximization (EM) algorithm relying on suitable recursions. The proposal is illustrated by a Monte Carlo simulation study and an application based on historical data on primary biliary cholangitis.
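
    The "suitable recursions" mentioned in the abstract above are not specified there. One standard candidate is the scaled forward recursion, sketched below for a much simpler setting than the paper's: a univariate Gaussian hidden Markov model without covariates or an absorbing dropout state. Under a missing-at-random assumption, a missing response (represented here as `None`, an assumption of this example) simply contributes no emission factor at that time occasion. The function name `forward_loglik` is hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def forward_loglik(ys, init, trans, means, sds):
    """Log-likelihood of a Gaussian HMM via the scaled forward recursion.

    ys    : list of observations; None marks a missing value (treated as MAR)
    init  : initial state probabilities
    trans : trans[i][j] = probability of moving from state i to state j
    means, sds : state-specific emission parameters
    """
    k = len(init)
    loglik = 0.0
    alpha = list(init)
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate the filtered state distribution through the transition matrix.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(k)) for j in range(k)]
        if y is not None:
            # Observed response: weight each state by its emission density.
            alpha = [alpha[j] * normal_pdf(y, means[j], sds[j]) for j in range(k)]
        # Rescale to avoid underflow and accumulate the log of the scaling factor.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

    In an EM fit, a matching backward recursion would supply the posterior state probabilities needed in the E-step; this sketch covers only the likelihood evaluation.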