Pseudolikelihood

  • Article type: Journal Article
    Multivariate interval-censored data arise when there are multiple types of events or clusters of study subjects, such that the event times are potentially correlated and when each event is only known to occur over a particular time interval. We formulate the effects of potentially time-varying covariates on the multivariate event times through marginal proportional hazards models while leaving the dependence structures of the related event times unspecified. We construct the nonparametric pseudolikelihood under the working assumption that all event times are independent, and we provide a simple and stable EM-type algorithm. The resulting nonparametric maximum pseudolikelihood estimators for the regression parameters are shown to be consistent and asymptotically normal, with a limiting covariance matrix that can be consistently estimated by a sandwich estimator under arbitrary dependence structures for the related event times. We evaluate the performance of the proposed methods through extensive simulation studies and present an application to data from the Atherosclerosis Risk in Communities Study.
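    The working-independence idea in the abstract above can be illustrated on a much simpler problem. The sketch below is not the authors' EM-type algorithm for interval-censored data: it swaps in an ordinary linear model (a Gaussian working-independence pseudolikelihood) fitted to simulated clustered data. The function name `working_independence_fit` and all simulation settings are assumptions made for illustration. It shows only how a point estimate that ignores within-cluster dependence can be paired with a sandwich covariance built from cluster-level score sums.

```python
import numpy as np

def working_independence_fit(X, y, cluster):
    """Least-squares fit that treats all rows as independent, plus two
    covariance estimates: cluster-robust sandwich and naive model-based."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)

    bread = np.linalg.inv(X.T @ X)          # inverse of the (unscaled) information
    beta = bread @ (X.T @ y)                # maximizer under working independence
    resid = y - X @ beta

    # "Meat": outer products of score contributions summed within each cluster,
    # so that arbitrary within-cluster dependence is allowed.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(cluster):
        rows = cluster == c
        score_c = X[rows].T @ resid[rows]
        meat += np.outer(score_c, score_c)

    sandwich = bread @ meat @ bread
    naive = bread * (resid @ resid) / (len(y) - X.shape[1])
    return beta, sandwich, naive

# Simulated data (illustrative only): 200 clusters of size 4 that share a
# random effect, with a covariate that is constant within each cluster.
rng = np.random.default_rng(0)
n_clusters, cluster_size = 200, 4
cluster = np.repeat(np.arange(n_clusters), cluster_size)
x = np.repeat(rng.normal(size=n_clusters), cluster_size)
shared = np.repeat(rng.normal(size=n_clusters), cluster_size)
y = 1.0 + 0.5 * x + shared + rng.normal(size=n_clusters * cluster_size)
X = np.column_stack([np.ones_like(x), x])

beta, sandwich, naive = working_independence_fit(X, y, cluster)
print("estimate:   ", beta)
print("sandwich SE:", np.sqrt(np.diag(sandwich)))
print("naive SE:   ", np.sqrt(np.diag(naive)))
```

    In this simulated setting the naive standard errors ignore the shared cluster effect, while the sandwich form does not rely on the independence assumption being true.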

  • Article type: Journal Article
    Studies on the health effects of environmental mixtures face the challenge of limit of detection (LOD) in multiple correlated exposure measurements. Conventional approaches to deal with covariates subject to LOD, including complete-case analysis, substitution methods, and parametric modeling of the covariate distribution, are feasible but may result in efficiency loss or bias. With a single covariate subject to LOD, a flexible semiparametric accelerated failure time (AFT) model to accommodate censored measurements has been proposed. We generalize this approach by considering a multivariate AFT model for the multiple correlated covariates subject to LOD and a generalized linear model for the outcome. A two-stage procedure based on semiparametric pseudo-likelihood is proposed for estimating the effects of these covariates on the health outcome. Consistency and asymptotic normality of the estimators are derived for an arbitrary fixed dimension of covariates. Simulation studies demonstrate good large-sample performance of the proposed methods versus conventional methods in realistic scenarios. We illustrate the practical utility of the proposed method with the LIFECODES birth cohort data, where we compare our approach to existing approaches in an analysis of multiple urinary trace metals in association with oxidative stress in pregnant women.

  • Article type: Journal Article
    The case-control study design is one of the main tools for detecting associations between genetic markers and diseases. It is well known that population substructure can lead to spurious association between disease status and a genetic marker if the prevalence of disease and the marker allele frequency vary across subpopulations. In this paper, we propose a novel statistical method to estimate the association in case-control studies with unmeasured population substructure. The proposed method takes two steps. First, the information on genomic markers and disease status is used to infer the population substructure; second, the association between the disease and the test marker, adjusting for the population substructure, is modeled and estimated parametrically through polytomous logistic regression. The performance of the proposed method relative to existing methods, in terms of bias, coverage probability and computational time, is assessed through simulations. The method is applied to an end-stage renal disease study in an African American population.

  • Article type: Journal Article
    This paper deals with the issue of nonparametric estimation of the transition probability matrix of a non-homogeneous Markov process with finite state space and partially observed absorbing state. We impose a missing at random assumption and propose a computationally efficient nonparametric maximum pseudolikelihood estimator (NPMPLE). The estimator depends on a parametric model that is used to estimate the probability of each absorbing state for the missing observations based, potentially, on auxiliary data. For the latter model we propose a formal goodness-of-fit test based on a residual process. Using modern empirical process theory we show that the estimator is uniformly consistent and converges weakly to a tight mean-zero Gaussian random field. We also provide methodology for simultaneous confidence band construction. Simulation studies show that the NPMPLE works well with small sample sizes and that it is robust against some degree of misspecification of the parametric model for the missing absorbing states. The method is illustrated using HIV data from sub-Saharan Africa to estimate the transition probabilities of death and disengagement from HIV care.

  • Article type: Journal Article
    Standard methods for two-sample tests such as the t-test and Wilcoxon rank sum test may lead to incorrect type I errors when applied to longitudinal or clustered data. Recent alternative two-sample tests for clustered data often require certain assumptions on the correlation structure and/or noninformative cluster size. In this paper, based on a novel pseudolikelihood for correlated data, we propose a score test that requires neither knowledge of the correlation structure nor the assumption that data are missing at random. The proposed score test can capture differences in the mean and variance between two groups simultaneously. We use projection theory to derive the limiting distribution of the test statistic, in which the covariance matrix can be empirically estimated. We conduct simulation studies to evaluate the proposed test and compare it with existing methods. To illustrate the usefulness of the proposed test, we use it to compare self-reported weight loss data in a friends' referral group with the data from the Internet self-joining group.

  • Article type: Journal Article
    In biomedical cohort studies for assessing the association between an outcome variable and a set of covariates, some covariates can usually be measured only on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of complete discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit based on the external model and further matches them on complete covariates similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates and that the reduction for the matching covariates is comparable to that of the balanced design.

  • Article type: Journal Article
    Multivariate frailty models have been used for clustered survival data to characterize the relationship between the hazard of correlated failures/events and exposure variables and covariates. However, these models can introduce serious biases of the estimation for failures from complex surveys that may depend on the sampling design (informative or noninformative). In order to consistently estimate parameters, this paper considers weighting the multivariate frailty model by the inverse of the probability of selection at each stage of sampling. This follows the principle of the pseudolikelihood approach. The estimation is carried out by maximizing the penalized partial and marginal pseudolikelihood functions. The performance of the proposed estimator is assessed through a Monte Carlo simulation study and the 4 waves of data from the 1998-1999 Early Childhood Longitudinal Study. Results show that the weighted estimator is consistent and approximately unbiased.
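    The weighting principle described above can be sketched on a deliberately simplified model. The example below is not the penalized frailty estimator from the abstract: it assumes a plain exponential survival model with right censoring and no frailty term, and the function name `weighted_exponential_rate`, the toy data, and the two-stage selection probabilities are all hypothetical. Each subject's log-likelihood contribution is multiplied by the inverse of its overall selection probability (the product over sampling stages), and the weighted pseudolikelihood then has a closed-form maximizer.

```python
import numpy as np

def weighted_exponential_rate(times, events, weights):
    """Maximize the weighted log-pseudolikelihood
        sum_i w_i * (d_i * log(rate) - rate * t_i)
    for an exponential model; the maximizer is sum(w*d) / sum(w*t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * events) / np.sum(weights * times)

# Toy sample (hypothetical): follow-up times, event indicators (1 = failure,
# 0 = censored), and selection probabilities at two sampling stages.
times = np.array([2.0, 4.0, 4.0])
events = np.array([1, 1, 0])
p_stage1 = np.array([1.0, 0.5, 0.5])
p_stage2 = np.array([1.0, 0.5, 0.5])

# Inverse of the overall selection probability across both stages.
weights = 1.0 / (p_stage1 * p_stage2)

unweighted = weighted_exponential_rate(times, events, np.ones_like(times))
weighted = weighted_exponential_rate(times, events, weights)
print("unweighted rate:", unweighted)
print("weighted rate:  ", weighted)
```

    With equal weights the estimator reduces to the usual events-over-exposure rate, so any difference between the two printed values comes entirely from the unequal selection probabilities.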

  • Article type: Journal Article
    Genome-wide association studies (GWAS) often measure gene-environment interactions (G × E). We consider the problem of accurately estimating a G × E in a case-control GWAS when a subset of the controls have silent, or undiagnosed, disease and the frequency of the silent disease varies by the environmental variable. We show that using case-control status without accounting for misdiagnosis can lead to biased estimates of the G × E. We further propose a pseudolikelihood approach to remove the bias and accurately estimate how the relationship between the genetic variant and the true disease status varies by the environmental variable. We demonstrate our method in extensive simulations and apply our method to a GWAS of prostate cancer.

  • Article type: Journal Article
    Many methods have recently been proposed for efficient analysis of case-control studies of gene-environment interactions using a retrospective likelihood framework that exploits the natural assumption of gene-environment independence in the underlying population. However, for polygenic modelling of gene-environment interactions, which is a topic of increasing scientific interest, applications of retrospective methods have been limited due to a requirement in the literature for parametric modelling of the distribution of the genetic factors. We propose a general, computationally simple, semiparametric method for analysis of case-control studies that allows exploitation of the assumption of gene-environment independence without any further parametric modelling assumptions about the marginal distributions of either of the two sets of factors. The method relies on the key observation that an underlying efficient profile likelihood depends on the distribution of genetic factors only through certain expectation terms that can be evaluated empirically. We develop asymptotic inferential theory for the estimator and evaluate its numerical performance via simulation studies. An application of the method is presented.

  • Article type: Journal Article
    Consider a semiparametric model indexed by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. In many applications, pseudolikelihood provides a convenient way to infer the parameter of interest, where the nuisance parameter is replaced by a consistent estimator. The purpose of this paper is to establish the asymptotic behaviour of the pseudolikelihood ratio statistic under semiparametric models. In particular, we consider testing the hypothesis that the parameter of interest lies on the boundary of its parameter space. Under regularity conditions, we establish the equivalence between the asymptotic distributions of the pseudolikelihood ratio statistic and a likelihood ratio statistic for a normal mean problem with a misspecified covariance matrix. This result holds when the nuisance parameter is estimated at a rate slower than the usual rate in parametric models. We study three examples in which the asymptotic distributions are shown to be mixtures of chi-squared variables. We conduct simulation studies to examine the finite-sample performance of the pseudolikelihood ratio test.
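    To make the phrase "mixtures of chi-squared variables" concrete, the sketch below computes a p-value under the simplest textbook boundary case: a single scalar parameter tested at the edge of its parameter space, where the limiting null distribution is assumed to be an equal mixture of a point mass at zero and a chi-squared variable with one degree of freedom. This 50:50 weighting is an assumption made for illustration. The abstract's results involve a misspecified covariance matrix, so the mixture weights and scaling in the paper's examples may differ and are not reproduced here. The function name `boundary_lrt_pvalue` is hypothetical.

```python
import math

def boundary_lrt_pvalue(statistic):
    """P-value under the assumed null limit 0.5*chi2_0 + 0.5*chi2_1."""
    if statistic < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    if statistic == 0:
        # Every draw from the mixture is at least 0.
        return 1.0
    # Upper tail of chi-squared(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    chi2_1_tail = math.erfc(math.sqrt(statistic / 2.0))
    # The point mass at zero never exceeds a positive statistic.
    return 0.5 * chi2_1_tail

for t in (0.0, 1.0, 2.705543454, 3.841458821):
    print(f"statistic={t:.3f}  mixture p-value={boundary_lrt_pvalue(t):.4f}")
```

    Under this assumed mixture the p-value for any positive statistic is half of the ordinary chi-squared(1) p-value, which is why ignoring the boundary makes the test conservative.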