multiple testing

  • Article type: Journal Article
    Closed testing has recently been shown to be optimal for simultaneous true discovery proportion control. It is, however, challenging to construct true discovery guarantee procedures that focus power on feature sets chosen by users based on their specific interest or expertise. We propose a procedure that allows users to target power on prespecified feature sets, that is, "focus sets." The method also allows inference for feature sets chosen post hoc, that is, "nonfocus sets," for which we deduce a true discovery lower confidence bound by interpolation. Our procedure is built from partial true discovery guarantee procedures combined with Holm's procedure and is a conservative shortcut to the closed testing procedure. A simulation study confirms that the statistical power of our method is relatively high for focus sets, at the cost of power for nonfocus sets, as desired. In addition, we investigate its power properties for sets with specific structures, for example, trees and directed acyclic graphs. We also compare our method with AdaFilter in the context of replicability analysis. The application of our method is illustrated with a gene ontology analysis of gene expression data.
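
The abstract names Holm's procedure as one building block. As a reminder of what that step does, here is a minimal sketch of plain Holm step-down testing. It is not the authors' focus-set procedure, and the function name `holm_reject` is an illustrative assumption.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down procedure rejects."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first hypothesis that is not rejected
    return reject


if __name__ == "__main__":
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

With these four p-values, the two smallest pass their thresholds (0.0125 and about 0.0167), while 0.03 exceeds 0.025, so testing stops there.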

  • Article type: Journal Article
    With each update of meta-analyses from living systematic reviews, treatment effects and their confidence intervals are recalculated. This often raises the question of whether multiplicity is an issue and whether a method to adjust for multiplicity is needed. Answering these questions is not straightforward. We approach this matter by considering the context of systematic reviews and pointing out existing methods for handling multiplicity in meta-analysis. We conclude that multiplicity is not a relevant issue in living systematic reviews when they are planned with the aim of providing up-to-date evidence, without any direct control over decisions about future research. Multiplicity might be an issue, though, in living systematic reviews designed under a protocol involving a "stopping decision", which can be the case in living guideline development or in reimbursement decisions. Several appropriate methods exist for handling multiplicity in meta-analysis. Existing methods, however, are also associated with several technical and conceptual limitations, and could be improved in future methodological projects. To better decide whether an adjustment for multiplicity is necessary at all, authors and users of living systematic reviews should be aware of the context of the work and question whether there is a dependency between the effect estimates of the living systematic review and its stopping/updating or an influence on future research.

  • Article type: Journal Article
    In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) at the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine to coarse resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient and can analyze large flow cytometry data sets in much less time than competing methods. We applied TEAM and competing methods to a flow cytometry data set to identify T cells responsive to cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.

  • Article type: Journal Article
    We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis, and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove that a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.
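
To make the smoothing idea concrete, the sketch below averages each node's observation with those of all its descendants in a directed acyclic graph. It only illustrates arithmetic averaging over descendants; the representation of the graph as a `children` dictionary and both function names are assumptions, and the calibration of the smoothed statistic and the selection step described in the abstract are not shown.

```python
def descendants(children, node):
    """Return the set of all descendants of `node` in a DAG.

    `children` maps each node to a list of its direct children
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    stack = list(children.get(node, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, []))
    return seen


def smooth_by_averaging(children, obs):
    """Average each node's observation with those of all its descendants."""
    smoothed = {}
    for node in obs:
        group = {node} | descendants(children, node)
        smoothed[node] = sum(obs[n] for n in group) / len(group)
    return smoothed


if __name__ == "__main__":
    # Diamond-shaped DAG: root -> a, b; a -> c; b -> c.
    dag = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
    observations = {"root": 0.0, "a": 1.0, "b": 2.0, "c": 3.0}
    print(smooth_by_averaging(dag, observations))
```

A node shared by several parents, such as `c` here, is counted once per ancestor because descendants are collected as a set.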

  • Article type: Journal Article
    Inferentially seamless 2/3 designs are increasingly popular in clinical trials. It is important to understand their relative advantages compared with separate phase 2 and phase 3 trials, and to understand the consequences of design choices such as the proportion of patients included in the phase 2 portion of the design. Extending previous work in this area, we perform a simulation study across multiple numbers of arms and efficacy response curves. We consider a design space crossing the choice of a separate versus seamless design with the choice of allocating 0%-100% of available patients to phase 2, with the remainder in phase 3. The seamless designs achieve greater power than their separate-trial counterparts. Importantly, the optimal seamless design is more robust than the optimal separate program, meaning that one range of values for the proportion of patients used in phase 2 (30%-50% of the total phase 2/3 sample size) is nearly optimal for a wide range of response scenarios. In contrast, a given percentage of patients used in phase 2 for separate trials may be optimal for some alternative scenarios but decidedly inferior for others. When operationally and scientifically viable, seamless trials provide superior performance compared with separate phase 2 and phase 3 trials. The results also provide guidance for the implementation of these trials in practice.

  • Article type: Journal Article
    The false discovery rate (FDR) is a widely used metric of statistical significance for genomic data analyses that involve multiple hypothesis testing. Power and sample size considerations are important in planning studies that perform these types of genomic data analyses. Here, we propose a three-rectangle approximation of a p-value histogram to derive a formula to compute the statistical power and sample size for analyses that involve the FDR. We also introduce the R package FDRsamplesize2, which incorporates these and other power calculation formulas to compute power for a broad variety of studies not covered by other FDR power calculation software. A few illustrative examples are provided. The FDRsamplesize2 package is available on CRAN.
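
For readers who want to see what an FDR-controlling analysis does mechanically, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is a generic illustration, not the three-rectangle power formula and not the FDRsamplesize2 interface; the function name `benjamini_hochberg` is an assumption.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the BH step-up procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= k * q / m:
            k_max = k
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(benjamini_hochberg(pvals)), "rejections")
```

Because the procedure is step-up, a small p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.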

  • Article type: Journal Article
    Multiple testing has been a prominent topic in statistical research. Despite extensive work in this area, controlling false discoveries remains a challenging task, especially when the test statistics exhibit dependence. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test statistics. One key approach is to transform arbitrary dependence into weak dependence and subsequently establish the strong consistency of the FDP and false discovery rate under weak dependence. As a result, FDPs converge to the same asymptotic limit within the framework of weak dependence. However, we have observed that the asymptotic variance of the FDP can be significantly influenced by the dependence structure of the test statistics, even when they exhibit only weak dependence. Quantifying this variability is of great practical importance, as it serves as an indicator of the quality of FDP estimation from the data. To the best of our knowledge, there is limited research on this aspect in the literature. In this paper, we aim to fill this gap by quantifying the variation of the FDP, assuming that the test statistics exhibit weak dependence and follow normal distributions. We begin by deriving the asymptotic expansion of the FDP and subsequently investigate how the asymptotic variance of the FDP is influenced by different dependence structures. Based on the insights gained from this study, we recommend that in multiple testing procedures utilizing the FDP, reporting both the mean and variance estimates of the FDP can provide a more comprehensive assessment of the study's outcomes.
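
The point about FDP variability can be explored with a small Monte Carlo experiment. The sketch below draws normal test statistics that are correlated only within short blocks (one simple form of weak dependence), rejects at a fixed threshold, and reports the mean and variance of the FDP across replicates. All settings (block size, effect size, threshold) and the function name `simulate_fdp` are illustrative assumptions, not the authors' asymptotic expansion.

```python
import numpy as np


def simulate_fdp(m=500, m1=50, mu=3.0, rho=0.0, block=10,
                 z_crit=2.5, n_rep=400, seed=0):
    """Monte Carlo mean and variance of the FDP under block dependence.

    Statistics within a block of size `block` share a common factor, giving
    pairwise correlation `rho` inside blocks and independence across blocks.
    """
    rng = np.random.default_rng(seed)
    means = np.zeros(m)
    means[:m1] = mu  # the first m1 hypotheses are false nulls
    fdps = np.empty(n_rep)
    for r in range(n_rep):
        shared = np.repeat(rng.standard_normal(m // block), block)
        noise = rng.standard_normal(m)
        z = means + np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
        rejected = np.abs(z) > z_crit
        false_rejections = rejected[m1:].sum()
        # FDP is defined as 0 when nothing is rejected.
        fdps[r] = false_rejections / max(rejected.sum(), 1)
    return float(fdps.mean()), float(fdps.var(ddof=1))


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        mean_fdp, var_fdp = simulate_fdp(rho=rho)
        print(f"rho={rho}: mean FDP={mean_fdp:.3f}, variance={var_fdp:.5f}")
```

In this setup the marginal distribution of every statistic is the same for any `rho`, so differences in the reported variance come from the dependence structure alone.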

  • Article type: Journal Article
    Diagnostic accuracy studies assess the sensitivity and specificity of a new index test in relation to an established comparator or the reference standard. The development and selection of the index test are usually assumed to be conducted prior to the accuracy study. In practice, this assumption is often violated, for instance, if the choice of the (apparently) best biomarker, model, or cutpoint is based on the same data that are later used for validation purposes. In this work, we investigate several multiple comparison procedures that provide family-wise error rate control for the emerging multiple testing problem. Due to the nature of the co-primary hypothesis problem, conventional approaches for multiplicity adjustment are too conservative for the specific problem and thus need to be adapted. In an extensive simulation study, five multiple comparison procedures are compared with regard to statistical error rates in least-favourable and realistic scenarios. This covers parametric and non-parametric methods and one Bayesian approach. All methods have been implemented in the new open-source R package cases, which allows us to reproduce all simulation results. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated bootstrap procedures, in particular the so-called pairs bootstrap, allow for family-wise error rate control in finite samples and in addition have competitive statistical power.

  • Article type: Journal Article
    Survival time is the primary endpoint of many randomized controlled trials, and a treatment effect is typically quantified by the hazard ratio under the assumption of proportional hazards. Awareness is increasing that in many settings this assumption is a priori violated, for example, due to delayed onset of drug effect. In these cases, interpretation of the hazard ratio estimate is ambiguous, and statistical inference for alternative parameters to quantify a treatment effect is warranted. We consider differences or ratios of milestone survival probabilities or quantiles, differences in restricted mean survival times, and an average hazard ratio to be of interest. Typically, more than one such parameter needs to be reported to assess possible treatment benefits, and in confirmatory trials the corresponding inferential procedures need to be adjusted for multiplicity. A simple Bonferroni adjustment may be too conservative because the different parameters of interest typically show considerable correlation. Hence, simultaneous inference procedures that take the correlation into account are warranted. By using the counting process representation of the mentioned parameters, we show that their estimates are asymptotically multivariate normal, and we provide an estimate for their covariance matrix. We propose corresponding parametric multiple testing procedures and simultaneous confidence intervals. The logrank test may also be included in the framework. Finite sample type I error rate and power are studied by simulation. The methods are illustrated with an example from oncology. A software implementation is provided in the R package nph.
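
The remark that Bonferroni can be too conservative under correlation can be checked numerically. The sketch below compares the two-sided Bonferroni critical value with a Monte Carlo critical value for the maximum absolute component of a correlated multivariate normal vector. The correlation matrix and the function names are illustrative assumptions; this is not the interface of the nph package.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_critical(k, alpha=0.05):
    """Two-sided critical value when alpha is split evenly over k tests."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * k))


def max_abs_critical(corr, alpha=0.05, n_draws=200_000, seed=0):
    """Monte Carlo (1 - alpha) quantile of max |Z_j| for Z ~ N(0, corr)."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    draws = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_draws)
    return float(np.quantile(np.abs(draws).max(axis=1), 1.0 - alpha))


if __name__ == "__main__":
    # Three estimates with assumed pairwise correlation 0.8.
    corr = np.full((3, 3), 0.8)
    np.fill_diagonal(corr, 1.0)
    print("Bonferroni:", round(bonferroni_critical(3), 3))
    print("max-type:  ", round(max_abs_critical(corr), 3))
```

The max-type critical value accounts for the correlation and is therefore smaller than the Bonferroni value, which translates into narrower simultaneous confidence intervals and more power.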

  • Article type: Journal Article
    Several methods in survival analysis are based on the proportional hazards assumption. However, this assumption is very restrictive and often not justifiable in practice. Therefore, effect estimands that do not rely on the proportional hazards assumption are highly desirable in practical applications. One popular example is the restricted mean survival time (RMST). It is defined as the area under the survival curve up to a prespecified time point and thus summarizes the survival curve in a meaningful estimand. For two-sample comparisons based on the RMST, previous research found an inflated type I error of the asymptotic test for small samples, and a two-sample permutation test has therefore already been developed. The first goal of the present paper is to extend the permutation test to general factorial designs and general contrast hypotheses by considering a Wald-type test statistic and its asymptotic behavior. Additionally, a groupwise bootstrap approach is considered. Moreover, when a global test detects a significant difference by comparing the RMSTs of more than two groups, it is of interest which specific RMST differences cause the result. However, global tests do not provide this information. Therefore, multiple tests for the RMST are developed in a second step to infer several null hypotheses simultaneously. Here, the asymptotically exact dependence structure between the local test statistics is incorporated to gain more power. Finally, the small sample performance of the proposed global and multiple testing procedures is analyzed in simulations and illustrated in a real data example.
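
The definition of the RMST as the area under the survival curve up to a prespecified time point can be sketched directly. The code below computes a Kaplan-Meier estimate and integrates the resulting step function up to `tau`. It illustrates the estimand only; the function names are assumptions, and the permutation, bootstrap, and multiple testing procedures from the abstract are not shown.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return distinct event times and the Kaplan-Meier estimate at each of them."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)           # subjects still under observation
        deaths = np.sum((times == t) & events)  # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau].

    If tau lies beyond the last event time, the last estimated survival
    value is carried forward; in practice tau is usually chosen within
    the observed follow-up.
    """
    event_times, surv = kaplan_meier(times, events)
    keep = event_times < tau
    # The curve equals 1 before the first event and surv[i] from event i onward.
    knots = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(heights * np.diff(knots)))


if __name__ == "__main__":
    # events: 1 = event observed, 0 = censored
    print(rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4))
```

Without censoring, the result equals the sample mean of the observed times truncated at `tau`, which gives a quick sanity check.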