
  • Article type: Journal Article
    Multivariate interval-censored data arise when there are multiple types of events or clusters of study subjects, such that the event times are potentially correlated and when each event is only known to occur over a particular time interval. We formulate the effects of potentially time-varying covariates on the multivariate event times through marginal proportional hazards models while leaving the dependence structures of the related event times unspecified. We construct the nonparametric pseudolikelihood under the working assumption that all event times are independent, and we provide a simple and stable EM-type algorithm. The resulting nonparametric maximum pseudolikelihood estimators for the regression parameters are shown to be consistent and asymptotically normal, with a limiting covariance matrix that can be consistently estimated by a sandwich estimator under arbitrary dependence structures for the related event times. We evaluate the performance of the proposed methods through extensive simulation studies and present an application to data from the Atherosclerosis Risk in Communities Study.
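
    The sandwich idea mentioned in the abstract can be illustrated on a much simpler problem than interval-censored survival data. The sketch below is a toy, not the authors' estimator: it estimates a common mean from clustered observations under working independence and then computes a cluster-robust (sandwich) variance. The function name and the data are hypothetical.

```python
def pooled_mean_with_sandwich_variance(clusters):
    """Toy working-independence estimator with a sandwich variance.

    Estimating equation: sum over clusters i and members j of (x_ij - mu) = 0.
    This is an illustration only, not the interval-censored estimator.
    """
    n_obs = sum(len(c) for c in clusters)
    # Solving the working-independence equation gives the pooled mean.
    mu_hat = sum(sum(c) for c in clusters) / n_obs
    # "Bread": minus the derivative of the estimating function w.r.t. mu.
    bread = float(n_obs)
    # "Meat": squared cluster-level score sums, which keeps within-cluster
    # dependence unspecified.
    meat = sum(sum(x - mu_hat for x in c) ** 2 for c in clusters)
    return mu_hat, meat / bread ** 2


# Hypothetical data: three clusters of two correlated measurements each.
clusters = [[1.0, 3.0], [2.0, 2.0], [5.0, 7.0]]
mu_hat, robust_var = pooled_mean_with_sandwich_variance(clusters)
print(mu_hat, robust_var)
```

    Summing scores within a cluster before squaring is what makes the variance valid under arbitrary within-cluster dependence in this toy setting.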






  • Article type: Journal Article
    Studies on the health effects of environmental mixtures face the challenge of limit of detection (LOD) in multiple correlated exposure measurements. Conventional approaches to deal with covariates subject to LOD, including complete-case analysis, substitution methods, and parametric modeling of covariate distribution, are feasible but may result in efficiency loss or bias. With a single covariate subject to LOD, a flexible semiparametric accelerated failure time (AFT) model to accommodate censored measurements has been proposed. We generalize this approach by considering a multivariate AFT model for the multiple correlated covariates subject to LOD and a generalized linear model for the outcome. A two-stage procedure based on semiparametric pseudo-likelihood is proposed for estimating the effects of these covariates on the health outcome. Consistency and asymptotic normality of the estimators are derived for an arbitrary fixed dimension of covariates. Simulation studies demonstrate good large-sample performance of the proposed methods vs conventional methods in realistic scenarios. We illustrate the practical utility of the proposed method with the LIFECODES birth cohort data, where we compare our approach to existing approaches in an analysis of multiple urinary trace metals in association with oxidative stress in pregnant women.
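
    For orientation, two of the conventional approaches named in the abstract can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, a value below the LOD is represented by `None`, and LOD/sqrt(2) is assumed as the fill-in value (one common convention; the abstract does not say which substitution rule was compared).

```python
import math


def substitute_below_lod(values, lod, divisor=math.sqrt(2)):
    """Substitution method: replace censored readings with lod / divisor."""
    return [lod / divisor if v is None or v < lod else v for v in values]


def complete_cases(rows):
    """Complete-case analysis: keep rows in which every exposure was detected."""
    return [row for row in rows if all(v is not None for v in row)]


# Hypothetical measurements of two exposures on four subjects.
rows = [(1.2, 0.8), (None, 0.5), (2.0, None), (0.9, 1.1)]
filled = substitute_below_lod([r[0] for r in rows], lod=0.7)
kept = complete_cases(rows)
print(filled)
print(kept)
```

    Substitution keeps every subject but distorts the exposure distribution below the LOD, while complete-case analysis discards subjects; both behaviours are consistent with the bias and efficiency loss the abstract attributes to these approaches.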






  • Article type: Journal Article
    The case-control study design is one of the main tools for detecting associations between genetic markers and diseases. It is well known that population substructure can lead to spurious association between disease status and a genetic marker if the prevalence of disease and the marker allele frequency vary across subpopulations. In this paper, we propose a novel statistical method to estimate the association in case-control studies with unmeasured population substructure. The proposed method takes two steps. First, the information on genomic markers and disease status is used to infer the population substructure; second, the association between the disease and the test marker adjusting for the population substructure is modeled and estimated parametrically through polytomous logistic regression. The performance of the proposed method, relative to the existing methods, on bias, coverage probability and computational time, is assessed through simulations. The method is applied to an end-stage renal disease study in an African American population.
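
    The spurious association described in the abstract can be reproduced with simple arithmetic. The sketch below uses made-up numbers: two subpopulations in which marker and disease are independent, but which differ in both disease prevalence and carrier frequency. It is not the proposed two-step method, only an illustration of the problem that method addresses.

```python
def stratum_table(n, prevalence, carrier_freq):
    """Expected 2x2 counts when marker and disease are independent.

    Returns (carrier cases, carrier controls,
             non-carrier cases, non-carrier controls).
    """
    carriers = n * carrier_freq
    non_carriers = n - carriers
    return (carriers * prevalence, carriers * (1 - prevalence),
            non_carriers * prevalence, non_carriers * (1 - prevalence))


def odds_ratio(table):
    a, b, c, d = table
    return (a * d) / (b * c)


# Hypothetical subpopulations of 1000 people each.
pop1 = stratum_table(1000, prevalence=0.1, carrier_freq=0.2)
pop2 = stratum_table(1000, prevalence=0.3, carrier_freq=0.6)
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print(odds_ratio(pop1), odds_ratio(pop2), odds_ratio(pooled))
```

    Within each subpopulation the odds ratio is 1, yet pooling the two tables gives an odds ratio of about 1.67, purely because the subpopulation label is ignored.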






  • Article type: Journal Article
    This paper deals with the issue of nonparametric estimation of the transition probability matrix of a non-homogeneous Markov process with finite state space and partially observed absorbing state. We impose a missing at random assumption and propose a computationally efficient nonparametric maximum pseudolikelihood estimator (NPMPLE). The estimator depends on a parametric model that is used to estimate the probability of each absorbing state for the missing observations based, potentially, on auxiliary data. For the latter model we propose a formal goodness-of-fit test based on a residual process. Using modern empirical process theory we show that the estimator is uniformly consistent and converges weakly to a tight mean-zero Gaussian random field. We also provide methodology for simultaneous confidence band construction. Simulation studies show that the NPMPLE works well with small sample sizes and that it is robust against some degree of misspecification of the parametric model for the missing absorbing states. The method is illustrated using HIV data from sub-Saharan Africa to estimate the transition probabilities of death and disengagement from HIV care.






  • Article type: Journal Article
    Standard methods for two-sample tests such as the t-test and Wilcoxon rank sum test may lead to incorrect type I errors when applied to longitudinal or clustered data. Recent alternatives to two-sample tests for clustered data often require certain assumptions on the correlation structure and/or noninformative cluster size. In this paper, based on a novel pseudolikelihood for correlated data, we propose a score test without knowledge of the correlation structure or assuming data missingness at random. The proposed score test can capture differences in the mean and variance between two groups simultaneously. We use projection theory to derive the limiting distribution of the test statistic, in which the covariance matrix can be empirically estimated. We conduct simulation studies to evaluate the proposed test and compare it with existing methods. To illustrate the usefulness of the proposed test, we use it to compare self-reported weight loss data in a friends' referral group with the data from the Internet self-joining group.







  • Article type: Journal Article
    In biomedical cohort studies for assessing the association between an outcome variable and a set of covariates, some covariates can usually be measured only on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of complete discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit based on the external model and further matches them on complete covariates similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates and the reduction for the matching covariates is comparable to that of the balanced design.







  • Article type: Journal Article
    Multivariate frailty models have been used for clustered survival data to characterize the relationship between the hazard of correlated failures/events and exposure variables and covariates. However, these models can yield seriously biased estimates for failures from complex surveys, because the failures may depend on the sampling design (informative or noninformative). In order to consistently estimate parameters, this paper considers weighting the multivariate frailty model by the inverse of the probability of selection at each stage of sampling. This follows the principle of the pseudolikelihood approach. The estimation is carried out by maximizing the penalized partial and marginal pseudolikelihood functions. The performance of the proposed estimator is assessed through a Monte Carlo simulation study and the 4 waves of data from the 1998-1999 Early Childhood Longitudinal Study. Results show that the weighted estimator is consistent and approximately unbiased.
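
    The weighting step can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up selection probabilities. It computes only the inverse-probability weights; how those weights enter the penalized partial and marginal pseudolikelihoods is specific to the paper and is not reproduced here.

```python
import math


def stage_weights(selection_probs):
    """Inverse probability of selection at each sampling stage."""
    return [1.0 / p for p in selection_probs]


def overall_weight(selection_probs):
    """Inverse of the overall inclusion probability across all stages."""
    return 1.0 / math.prod(selection_probs)


# Hypothetical two-stage design: a school is drawn with probability 0.5,
# then a child within that school with probability 0.25.
probs = [0.5, 0.25]
per_stage = stage_weights(probs)
total = overall_weight(probs)
print(per_stage, total)
```

    In this example the sampled child stands in for 8 children in the population, which is the sense in which weighting corrects for an informative design.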






  • Article type: Journal Article
    Genome-wide association studies (GWAS) often measure gene-environment interactions (G × E). We consider the problem of accurately estimating a G × E in a case-control GWAS when a subset of the controls have silent, or undiagnosed, disease and the frequency of the silent disease varies by the environmental variable. We show that using case-control status without accounting for misdiagnosis can lead to biased estimates of the G × E. We further propose a pseudolikelihood approach to remove the bias and accurately estimate how the relationship between the genetic variant and the true disease status varies by the environmental variable. We demonstrate our method in extensive simulations and apply our method to a GWAS of prostate cancer.







  • Article type: Journal Article
    Many methods have recently been proposed for efficient analysis of case-control studies of gene-environment interactions using a retrospective likelihood framework that exploits the natural assumption of gene-environment independence in the underlying population. However, for polygenic modelling of gene-environment interactions, which is a topic of increasing scientific interest, applications of retrospective methods have been limited due to a requirement in the literature for parametric modelling of the distribution of the genetic factors. We propose a general, computationally simple, semiparametric method for analysis of case-control studies that allows exploitation of the assumption of gene-environment independence without any further parametric modelling assumptions about the marginal distributions of either of the two sets of factors. The method relies on the key observation that an underlying efficient profile likelihood depends on the distribution of genetic factors only through certain expectation terms that can be evaluated empirically. We develop asymptotic inferential theory for the estimator and evaluate its numerical performance via simulation studies. An application of the method is presented.







  • Article type: Journal Article
    Consider a semiparametric model indexed by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. In many applications, pseudolikelihood provides a convenient way to infer the parameter of interest, where the nuisance parameter is replaced by a consistent estimator. The purpose of this paper is to establish the asymptotic behaviour of the pseudolikelihood ratio statistic under semiparametric models. In particular, we consider testing the hypothesis that the parameter of interest lies on the boundary of its parameter space. Under regularity conditions, we establish the equivalence between the asymptotic distributions of the pseudolikelihood ratio statistic and a likelihood ratio statistic for a normal mean problem with a misspecified covariance matrix. This result holds when the nuisance parameter is estimated at a rate slower than the usual rate in parametric models. We study three examples in which the asymptotic distributions are shown to be mixtures of chi-squared variables. We conduct simulation studies to examine the finite-sample performance of the pseudolikelihood ratio test.
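
    To make "mixture of chi-squared variables" concrete, the sketch below computes a p-value under a mixture of a point mass at zero and a chi-squared variable with one degree of freedom. The 50:50 default is the textbook limit for a single parameter on the boundary under a correctly specified likelihood and is an assumption here; the mixing weights in the paper's three examples may differ. The function name is hypothetical.

```python
import math


def chi_bar_squared_pvalue(stat, weight_df1=0.5):
    """P-value under (1 - weight_df1) * chi2_0 + weight_df1 * chi2_1.

    chi2_0 is a point mass at zero, so it contributes nothing to the
    upper tail once stat > 0.
    """
    if stat <= 0.0:
        return 1.0
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    tail_df1 = math.erfc(math.sqrt(stat / 2.0))
    return weight_df1 * tail_df1


# 3.8415 is the usual 5% critical value of chi2_1; under the 50:50 mixture
# the same statistic has a p-value of about 0.025.
p_value = chi_bar_squared_pvalue(3.841458820694124)
print(p_value)
```

    Under the 50:50 mixture, the 5% critical value drops from about 3.84 to about 2.71, so a test that ignores the boundary constraint is conservative.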





