multiple testing

  • Article Type: Journal Article
    Expression quantitative trait locus (eQTL) analysis is a useful tool to identify genetic loci that are associated with gene expression levels. Large collaborative efforts such as the Genotype-Tissue Expression (GTEx) project provide valuable resources for eQTL analysis in different tissues. Most existing methods, however, either focus on one tissue at a time, or analyze multiple tissues to identify eQTLs jointly present in multiple tissues. There is a lack of powerful methods to identify eQTLs in a target tissue while effectively borrowing strength from auxiliary tissues. In this paper, we propose a novel statistical framework to improve the eQTL detection efficacy in the tissue of interest with auxiliary information from other tissues. This framework can enhance the power of the hypothesis test for eQTL effects by incorporating shared and specific effects from multiple tissues into the test statistics. We also devise data-driven and distributed computing approaches for efficient implementation of eQTL detection when the number of tissues is large. Numerical studies in simulation demonstrate the efficacy of the proposed method, and the real data analysis of the GTEx example provides novel insights into eQTL findings in different tissues.
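    The abstract describes incorporating shared and tissue-specific effects into the test statistic but does not give its exact form. Below is a minimal sketch, assuming only that per-tissue z-scores for a SNP-gene pair are available, of how a target-tissue statistic might borrow strength from auxiliary tissues; the averaging scheme and the weight w_aux are illustrative assumptions, not the authors' construction.

```python
import numpy as np
from scipy import stats

def combined_eqtl_z(z_target, z_aux, w_aux=0.5):
    """Combine a target-tissue z-score with a shared component from auxiliary tissues.

    z_target : z-score of the SNP-gene pair in the tissue of interest
    z_aux    : z-scores of the same pair in auxiliary tissues
    w_aux    : weight on the auxiliary (shared) component; a hypothetical tuning choice
    """
    z_aux = np.asarray(z_aux, dtype=float)
    # shared component: average of auxiliary z-scores, rescaled to unit variance
    z_shared = z_aux.mean() * np.sqrt(len(z_aux))
    # variance-normalized combination, so z_comb is N(0, 1) under the global null
    z_comb = ((1 - w_aux) * z_target + w_aux * z_shared) / np.hypot(1 - w_aux, w_aux)
    return z_comb, 2 * stats.norm.sf(abs(z_comb))

# a weak target-tissue signal reinforced by consistent auxiliary evidence
print(combined_eqtl_z(1.8, [2.1, 1.9, 2.4, 2.0]))
```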

  • Article Type: Journal Article
    Transfer learning for high-dimensional Gaussian graphical models (GGMs) is studied. The target GGM is estimated by incorporating the data from similar and related auxiliary studies, where the similarity between the target graph and each auxiliary graph is characterized by the sparsity of a divergence matrix. An estimation algorithm, Trans-CLIME, is proposed and shown to attain a faster convergence rate than the minimax rate in the single-task setting. Furthermore, we introduce a universal debiasing method that can be coupled with a range of initial graph estimators and can be analytically computed in one step. A debiased Trans-CLIME estimator is then constructed and is shown to be element-wise asymptotically normal. This fact is used to construct a multiple testing procedure for edge detection with false discovery rate control. The proposed estimation and multiple testing procedures demonstrate superior numerical performance in simulations and are applied to infer the gene networks in a target brain tissue by leveraging the gene expressions from multiple other brain tissues. A significant decrease in prediction errors and a significant increase in power for link detection are observed.
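    The abstract reports element-wise asymptotic normality of the debiased Trans-CLIME estimator and its use for FDR-controlled edge detection. A minimal sketch of that final step, assuming the debiased entries and their standard errors are already computed; the inputs are hypothetical, and the Benjamini-Hochberg rule below stands in for whatever specific procedure the paper uses.

```python
import numpy as np
from scipy import stats

def detect_edges(theta_debiased, se, alpha=0.1):
    """Flag graph edges from a debiased precision-matrix estimate via Benjamini-Hochberg.

    theta_debiased : (p, p) array of debiased entries (hypothetical input)
    se             : (p, p) array of their standard errors
    alpha          : target false discovery rate
    """
    p = theta_debiased.shape[0]
    iu = np.triu_indices(p, k=1)                    # each off-diagonal entry is tested once
    z = theta_debiased[iu] / se[iu]                 # asymptotically N(0, 1) under H0: no edge
    pvals = 2 * stats.norm.sf(np.abs(z))
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
    k = passed.max() + 1 if passed.size else 0      # BH step-up cutoff
    keep = np.zeros(m, dtype=bool)
    keep[order[:k]] = True
    return list(zip(iu[0][keep], iu[1][keep]))      # (i, j) pairs declared to be edges
```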

  • Article Type: Journal Article
    With the recent advances in oncology treatment, restricted mean survival time (RMST) is increasingly being used to replace the routine hazard-ratio-based approach in randomized controlled trials with time-to-event outcomes. While RMST has been widely applied in single-arm and two-arm designs, challenges remain in comparing RMST across multi-arm trials with three or more groups. In particular, it is unclear in the literature how to compare more than one intervention simultaneously or perform multiple testing based on RMST, and sample size determination is a major obstacle to its adoption in practice. In this paper, we propose a novel method for designing multi-arm clinical trials with a right-censored survival endpoint based on RMST that can be applied in both phase II and phase III settings, using a global χ2 test as well as a modeling-based multiple comparison procedure. The framework provides a closed-form sample size formula built upon the multi-arm global test and a sample size determination procedure based on multiple comparisons for phase II dose-finding studies. The proposed method enjoys strong robustness and flexibility, as it requires less a priori set-up than conventional approaches and attains a smaller sample size while achieving the target power. In the assessment of sample size, we also incorporate practical considerations, including the presence of non-proportional hazards and staggered patient entry. We evaluate the validity of our method through simulation studies under various scenarios. Finally, we demonstrate its accuracy and stability by implementing it in the design of two real clinical trial examples.
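    The sample-size formula itself is not reproduced in the abstract; the sketch below only illustrates the global chi-square comparison of arm-wise RMSTs that the design builds on, using a plain Kaplan-Meier plug-in estimate of RMST and its variance. The contrast-versus-reference construction is a standard choice assumed here, not necessarily the authors' exact statistic.

```python
import numpy as np
from scipy import stats

def rmst(time, event, tau):
    """Kaplan-Meier plug-in estimate of the RMST up to tau, with its variance."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    t_ev = np.unique(time[(event == 1) & (time <= tau)])        # distinct event times up to tau
    n_risk = np.array([(time >= t).sum() for t in t_ev])
    d = np.array([((time == t) & (event == 1)).sum() for t in t_ev])
    surv = np.cumprod(1.0 - d / n_risk)
    knots = np.concatenate(([0.0], t_ev, [tau]))
    widths = np.diff(knots)
    heights = np.concatenate(([1.0], surv))                     # S(t) is a step function
    area = np.sum(heights * widths)                             # RMST = integral of S(t) over [0, tau]
    # plug-in variance: sum_i (integral of S from t_i to tau)^2 * d_i / (n_i * (n_i - d_i))
    tails = np.array([np.sum(heights[i + 1:] * widths[i + 1:]) for i in range(len(t_ev))])
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = tails**2 * d / (n_risk * (n_risk - d))
    return area, float(np.sum(terms[np.isfinite(terms)]))

def global_rmst_test(arms, tau):
    """Global chi-square test of equal RMST across k arms, contrasting each arm with arm 0."""
    est = [rmst(t, e, tau) for t, e in arms]
    mu = np.array([m for m, _ in est])
    v = np.array([s for _, s in est])
    diff = mu[1:] - mu[0]                                        # contrasts against the reference arm
    cov = np.full((len(diff), len(diff)), v[0]) + np.diag(v[1:]) # covariance of the contrasts
    chi2 = diff @ np.linalg.solve(cov, diff)
    return chi2, stats.chi2.sf(chi2, df=len(diff))

# toy usage: three exponential arms with independent censoring (illustrative data only)
rng = np.random.default_rng(1)
def simulate(scale, n=200):
    t, c = rng.exponential(scale, n), rng.exponential(20.0, n)
    return np.minimum(t, c), (t <= c).astype(int)

print(global_rmst_test([simulate(8.0), simulate(10.0), simulate(13.0)], tau=12.0))
```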

  • Article Type: Journal Article
    Motivated by applications to root-cause identification of faults in high-dimensional data streams that may have very limited samples after faults are detected, we consider multiple testing in models for multivariate statistical process control (SPC). With quick fault detection, it can be assumed that only a small portion of the data streams are out of control (OC). Identifying those OC data streams while controlling the number of false discoveries is a long-standing problem, and it is challenging because only a limited number of OC samples are available once the process is terminated upon fault detection. Although several false discovery rate (FDR) controlling methods have been proposed, practitioners may prefer other methods for quick detection. Building on the recently developed knockoff filter, we propose a knockoff procedure that can be combined with other fault detection methods, in the sense that it does not change the stopping time but may identify a different set of faults in order to control the FDR. A theorem establishing FDR control for the proposed procedure is provided. Simulation studies show that the proposed procedure controls the FDR while maintaining high power. We also illustrate its performance in an application to the semiconductor manufacturing processes that motivated this development.
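    The abstract does not spell out how the knockoff statistics are built from the monitoring statistics of the data streams; the sketch below is only the generic knockoff+ selection step of Barber and Candès, assuming a signed importance statistic W_j has already been computed for each stream.

```python
import numpy as np

def knockoff_select(W, q=0.2):
    """Knockoff+ selection with the data-dependent threshold of Barber and Candes (2015).

    W : signed importance statistics, one per data stream
        (large positive values point to likely out-of-control streams)
    q : target false discovery rate
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):            # candidate thresholds, smallest first
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.nonzero(W >= t)[0]            # streams flagged as out of control
    return np.array([], dtype=int)                  # no threshold meets the target FDR

# hypothetical statistics: streams 0-5 carry strong positive evidence
W = np.array([4.2, 3.8, 5.1, 2.9, 3.3, 4.7, -0.3, 0.4, -1.1, 0.2, -0.5, 0.9, -0.7, 0.1, -0.2])
print(knockoff_select(W, q=0.2))
```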

  • Article Type: Journal Article
    When multiple candidate subgroups are considered in clinical trials, we often need to make statistical inference on the subgroups simultaneously. Classical multiple testing procedures might not lead to an interpretable and efficient inference on the subgroups because they often fail to take subgroup size and the subgroup effect relationship into account. In this paper, building on selectively traversed accumulation rules (STAR), we propose a data-adaptive and interactive multiple testing procedure for subgroups that accounts for subgroup size and the subgroup effect relationship under a prespecified tree structure. The proposed method is easy to implement and can lead to a more interpretable and efficient inference on prespecified tree-structured subgroups. Possible accommodations for post hoc identified tree-structured subgroups are also discussed. We demonstrate the merit of the proposed method by re-analyzing the panitumumab trial.

  • Article Type: Journal Article
    We consider the problem of testing multiple null hypotheses, where a decision to reject or retain must be made for each one and embedding incorrect decisions into a real-life context may inflict different losses. We argue that traditional methods controlling the Type I error rate may be too restrictive in this situation and that the standard familywise error rate may not be appropriate. Using a decision-theoretic approach, we define suitable loss functions for a given decision rule, where incorrect decisions can be treated unequally by assigning different loss values. Taking expectation with respect to the sampling distribution of the data allows us to control the familywise expected loss instead of the conventional familywise error rate. Different loss functions can be adopted, and we search for decision rules that satisfy certain optimality criteria within a broad class of decision rules for which the expected loss is bounded by a fixed threshold under any parameter configuration. We illustrate the methods with the problem of establishing efficacy of a new medicinal treatment in non-overlapping subgroups of patients.
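    As a toy illustration of the familywise expected loss that the abstract proposes to control, the sketch below estimates it by Monte Carlo for two hypotheses with unequal loss values; the loss values, critical values, and the simple z-test decision rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
loss_false_reject = np.array([3.0, 1.0])   # a false rejection of H1 costs three times as much as one of H2
loss_false_retain = np.array([1.0, 1.0])   # losses for missing a true effect
crit = np.array([2.24, 2.24])              # per-hypothesis critical values of the assumed z-test rule

def familywise_expected_loss(theta, n_sim=100_000):
    """Monte Carlo estimate of the familywise expected loss at a parameter configuration theta.

    theta : true standardized effects; a zero entry means that null hypothesis is true
    """
    z = rng.normal(loc=theta, scale=1.0, size=(n_sim, len(theta)))
    reject = z > crit
    null_true = np.isclose(theta, 0.0)
    # per-replication loss: sum of the losses attached to each incorrect decision
    loss = (reject & null_true) @ loss_false_reject + (~reject & ~null_true) @ loss_false_retain
    return loss.mean()

print(familywise_expected_loss(np.array([0.0, 0.0])))   # both nulls true: only false rejections contribute
print(familywise_expected_loss(np.array([0.0, 2.5])))   # one true effect: false retentions also contribute
```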

  • Article Type: Journal Article
    Identifying informative predictors in a high-dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high-dimensional setting often fails due to limited sample size. One approach to improving power is to meta-analyze multiple studies that address the same scientific question. However, integrative analysis of high-dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data-sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection that allows for between-study heterogeneity and does not require sharing of individual-level data. Assuming that the underlying high-dimensional regression models differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discoveries and attaining power. The new method is applied to a real example detecting interaction effects of genetic variants for statins and obesity on the risk of type II diabetes.

  • Article Type: Journal Article
    For large-scale hypothesis testing such as epigenome-wide association testing, adaptively focusing power on the more promising hypotheses can lead to a much more powerful multiple testing procedure. In this chapter, we introduce a multiple testing procedure that weights each hypothesis based on the intraclass correlation coefficient (ICC), a measure of the "noisiness" of the CpG methylation measurement, to increase the power of epigenome-wide association testing. Compared to the traditional multiple testing procedure applied to a filtered CpG set, the proposed procedure circumvents the difficulty of determining the optimal ICC cutoff value and is overall more powerful. We illustrate the procedure and compare its power to classical multiple testing procedures using example data.
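    The chapter's exact weighting scheme is not given in the abstract; below is a minimal sketch of ICC-based weighting via the weighted Benjamini-Hochberg procedure, where weights averaging to one rescale each CpG's p-value. Using the ICC itself as the weight is an assumption made for illustration.

```python
import numpy as np

def icc_weighted_bh(pvals, icc, alpha=0.05):
    """Weighted Benjamini-Hochberg: down-weight noisy CpGs (low ICC) instead of filtering them out.

    pvals : raw association p-values, one per CpG site
    icc   : intraclass correlation coefficients of the CpG methylation measurements
    """
    pvals, icc = np.asarray(pvals, float), np.asarray(icc, float)
    w = icc / icc.mean()                            # weights average to one, as weighted BH requires
    pw = pvals / w                                  # reweighted p-values
    m = len(pw)
    order = np.argsort(pw)
    passed = np.nonzero(pw[order] <= alpha * np.arange(1, m + 1) / m)[0]
    k = passed.max() + 1 if passed.size else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected                                 # True marks CpGs declared significant
```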

  • Article Type: Journal Article
    The traditional approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including theoretical, permutation-based, and empirical ones, have some inherent drawbacks. For example, the theoretical null might fail because of improper assumptions on the sample distribution. Here, we propose a null-distribution-free approach to FDR control for multiple hypothesis testing in case-control studies. This approach, named the target-decoy procedure, simply builds on the ordering of tests by some statistic or score, whose null distribution is not required to be known. Competitive decoy tests are constructed from permutations of the original samples and are used to estimate the number of false target discoveries. We prove that this approach controls the FDR when the score function is symmetric and the scores are independent between different tests. Simulations demonstrate that it is more stable and powerful than two popular traditional approaches, even in the presence of dependence. Evaluations are also made on two real datasets, including an Arabidopsis genomics dataset and a COVID-19 proteomics dataset.
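    A minimal sketch of the target-decoy idea as described in the abstract: decoy scores built from permuted samples compete with the target scores, and the number of decoys above a cutoff estimates the number of false target discoveries. The cutoff search below is a straightforward reading of the abstract, not the authors' implementation.

```python
import numpy as np

def target_decoy_select(target_scores, decoy_scores, q=0.05):
    """Select target tests at an estimated FDR of at most q by competition with decoy scores.

    target_scores : scores of the original (target) tests; larger means stronger evidence
    decoy_scores  : scores of the decoy tests built from permuted samples
    """
    target_scores = np.asarray(target_scores, float)
    decoy_scores = np.asarray(decoy_scores, float)
    cutoff = None
    for c in np.sort(target_scores)[::-1]:          # scan cutoffs from the strongest score downward
        n_target = np.sum(target_scores >= c)
        n_decoy = np.sum(decoy_scores >= c)
        if (n_decoy + 1) / max(n_target, 1) <= q:   # decoys above c estimate the false target discoveries
            cutoff = c                              # keep lowering the cutoff while the estimate stays below q
    if cutoff is None:
        return np.array([], dtype=int)
    return np.nonzero(target_scores >= cutoff)[0]   # indices of the selected target tests
```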

  • Article Type: Journal Article
    Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
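    A minimal sketch of the two steps the abstract names, a centered log-ratio transform followed by per-taxon linear regression, with a mode-based recentering of the slopes standing in for LinDA's compositional bias correction. The pseudo-count and the kernel-density mode estimate are assumptions of this sketch, not the authors' LinDA package.

```python
import numpy as np
from scipy import stats

def linda_sketch(counts, covariate, pseudo=0.5):
    """CLR transform + per-taxon OLS, with a mode-based recentering as a simple bias correction.

    counts    : (n_samples, n_taxa) raw count matrix
    covariate : (n_samples,) variable of interest, e.g. a case/control indicator
    """
    counts = np.asarray(counts, float) + pseudo             # pseudo-count so the log is defined
    logp = np.log(counts / counts.sum(axis=1, keepdims=True))
    clr = logp - logp.mean(axis=1, keepdims=True)           # centered log-ratio transform
    X = np.column_stack([np.ones(len(covariate)), np.asarray(covariate, float)])
    beta, *_ = np.linalg.lstsq(X, clr, rcond=None)          # one OLS fit per taxon, all at once
    slope = beta[1]                                         # covariate effect on each taxon
    # compositional effects shift every slope by a common constant; estimate it by the
    # mode of the slope distribution (kernel density peak) and subtract it out
    grid = np.linspace(slope.min(), slope.max(), 512)
    bias = grid[np.argmax(stats.gaussian_kde(slope)(grid))]
    return slope - bias                                     # bias-corrected effect estimates
```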
