randomization test

  • Article type: Journal Article
    Single-case experimental design (SCED) data can be analyzed following different approaches. One of the first historically proposed options is the randomization test, which benefits from the inclusion of randomization in the design, a desirable methodological feature. Randomization tests have become more feasible with the availability of computational resources, and such tests have been proposed for all major types of SCEDs: multiple-baseline, reversal/withdrawal, alternating treatments, and changing criterion designs. The focus of the current text is on the last of these, given that changing criterion designs have not been the subject of any previous simulation study. Specifically, we estimate type I error rates and statistical power for two different randomization procedures applicable to changing criterion designs: the phase change moment randomization and the blocked alternating criterion randomization. We include different series lengths, numbers of phases, levels of autocorrelation, and amounts of random variability. The results suggest that type I error rates are generally controlled and that sufficient power can be achieved with as few as 28-30 measurements for independent data, although more measurements are needed in case of positive autocorrelation. The presence of a reversal to a previous criterion level is beneficial. R code is provided for carrying out randomization tests following the two randomization procedures.
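
    A minimal Python sketch of how a phase change moment randomization test could be carried out for a changing criterion design. It is not the R code the authors provide. The test statistic (mean absolute deviation of each observation from the criterion of its phase), the minimum phase length, the function names, and the data are all assumptions made for illustration.

```python
from itertools import combinations


def admissible_change_points(n, n_phases, min_len):
    """List every tuple of phase start indices that keeps each phase >= min_len."""
    result = []
    for cps in combinations(range(1, n), n_phases - 1):
        bounds = (0,) + cps + (n,)
        if all(b - a >= min_len for a, b in zip(bounds, bounds[1:])):
            result.append(cps)
    return result


def mean_abs_deviation(y, criteria, cps):
    """Assumed test statistic: average distance of each point from its phase criterion."""
    bounds = (0,) + tuple(cps) + (len(y),)
    total = 0.0
    for crit, a, b in zip(criteria, bounds, bounds[1:]):
        total += sum(abs(v - crit) for v in y[a:b])
    return total / len(y)


def phase_change_randomization_test(y, criteria, observed_cps, min_len):
    """Compare the observed statistic with its value under every admissible
    set of phase change moments (smaller deviation = stronger effect)."""
    assignments = admissible_change_points(len(y), len(criteria), min_len)
    observed = mean_abs_deviation(y, criteria, observed_cps)
    stats = [mean_abs_deviation(y, criteria, cps) for cps in assignments]
    p_value = sum(s <= observed + 1e-12 for s in stats) / len(stats)
    return observed, p_value, len(assignments)


# Hypothetical series: three phases with criteria 10, 8, and 5.
y = [10, 11, 9, 10, 8, 7, 8, 8, 5, 6, 5, 5]
criteria = [10, 8, 5]
observed, p_value, n_assignments = phase_change_randomization_test(
    y, criteria, observed_cps=(4, 8), min_len=3
)
print(observed, p_value, n_assignments)
```

    With only 10 admissible assignments in this toy example, the smallest attainable p-value is 1/10, which illustrates why series length and the number of admissible phase change moments matter for power.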

  • Article type: Journal Article
    This article surveys the use of nonparametric permutation tests for analyzing experimental data. The permutation approach, which involves randomizing or permuting features of the observed data, is a flexible way to draw statistical inferences in common experimental settings. It is particularly valuable when few independent observations are available, a frequent occurrence in controlled experiments in economics and other social sciences. The permutation method constitutes a comprehensive approach to statistical inference. In two-treatment testing, permutation concepts underlie popular rank-based tests, like the Wilcoxon and Mann-Whitney tests. But permutation reasoning is not limited to ordinal contexts. Analogous tests can be constructed from the permutation of measured observations (as opposed to rank-transformed observations), and we argue that these tests should often be preferred. Permutation tests can also be used with multiple treatments, with ordered hypothesized effects, and with complex data structures, such as hypothesis testing in the presence of nuisance variables. Drawing examples from the experimental economics literature, we illustrate how permutation testing solves common challenges. Our aim is to help experimenters move beyond the handful of overused tests in play today and to instead see permutation testing as a flexible framework for statistical inference.
    The online version contains supplementary material available at 10.1007/s10683-023-09799-6.
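
    As an illustration of a permutation test built on measured observations rather than ranks, the following Python sketch enumerates every reassignment of a small two-treatment sample and uses the difference in means as the test statistic. The function name and the data are hypothetical, and the choice of statistic is an assumption, not something taken from the article.

```python
from itertools import combinations


def exact_permutation_test(treated, control):
    """Two-sided exact permutation test for a difference in means."""
    pooled = list(treated) + list(control)
    n_t, n = len(treated), len(pooled)
    total = sum(pooled)

    def mean_diff(idx):
        # Mean of the units labelled "treated" minus mean of the rest.
        s_t = sum(pooled[i] for i in idx)
        return s_t / n_t - (total - s_t) / (n - n_t)

    observed = mean_diff(range(n_t))
    n_extreme = 0
    n_total = 0
    for idx in combinations(range(n), n_t):
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(idx)) >= abs(observed) - 1e-12:
            n_extreme += 1
    return observed, n_extreme / n_total


# Hypothetical outcomes from four treated and four control units.
observed, p_value = exact_permutation_test([12, 14, 15, 17], [8, 9, 10, 11])
print(observed, p_value)
```

    Because the two samples do not overlap, only the observed labelling and its mirror image are as extreme as the data, so the p-value here is 2/70.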

  • Article type: Journal Article
    We propose a method to construct simultaneous confidence intervals for a parameter vector by inverting a series of randomization tests (RT). The randomization tests are facilitated by an efficient multivariate Robbins-Monro procedure that takes the correlation information of all components into account. The estimation method does not require any distributional assumption about the population other than the existence of the second moments. The resulting simultaneous confidence intervals are not necessarily symmetric about the point estimate of the parameter vector but possess the property of equal tails in all dimensions. In particular, we present the construction for the mean vector of one population and for the difference between two mean vectors of two populations. Extensive simulations are conducted to show a numerical comparison with four methods. We illustrate the application of the proposed method to test bioequivalence with multiple endpoints on some real data.
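
    The idea of obtaining a confidence interval by inverting randomization tests can be sketched in Python for the simplest scalar case, a difference between two means. This sketch swaps the article's multivariate Robbins-Monro search for a plain grid search over candidate shifts, so it is not the authors' procedure. The function names, the grid, and the data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(treated, control):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(treated) + list(control)
    n_t, n = len(treated), len(pooled)
    total = sum(pooled)

    def mean_diff(idx):
        s_t = sum(pooled[i] for i in idx)
        return s_t / n_t - (total - s_t) / (n - n_t)

    observed = abs(mean_diff(range(n_t)))
    stats = [abs(mean_diff(idx)) for idx in combinations(range(n), n_t)]
    return sum(s >= observed - 1e-9 for s in stats) / len(stats)


def shift_confidence_interval(treated, control, grid, alpha=0.05):
    """Keep every candidate shift d that the test does not reject after
    subtracting d from the treated group; report the extremes."""
    accepted = [
        d for d in grid
        if permutation_p_value([t - d for t in treated], control) > alpha
    ]
    return min(accepted), max(accepted)


# Hypothetical data and a grid of candidate shifts from -5.0 to 15.0.
treated = [12, 14, 15, 17]
control = [8, 9, 10, 11]
grid = [i / 10 for i in range(-50, 151)]
low, high = shift_confidence_interval(treated, control, grid)
print(low, high)
```

    A grid search is easy to read but scales poorly with dimension, which is presumably one reason a stochastic search such as Robbins-Monro is attractive for parameter vectors.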

  • Article type: Journal Article
    Analyses of longitudinal data with non-linear mixed-effects models (NLMEM) are typically associated with high power, but sometimes at the cost of inflated type I error. Approaches to overcome this problem were published recently, such as model-averaging across drug models (MAD), individual model-averaging (IMA), and the combined likelihood ratio test (cLRT). This work aimed to assess seven NLMEM approaches in the same framework: treatment effect assessment in balanced two-armed designs using real natural history data with or without the addition of a simulated treatment effect. The approaches are MAD, IMA, cLRT, standard model selection (STDs), structural similarity selection (SSs), randomized cLRT (rcLRT), and model-averaging across placebo and drug models (MAPD). The assessment included type I error, using Alzheimer's Disease Assessment Scale-cognitive (ADAS-cog) scores from 817 untreated patients, and power and accuracy of the treatment effect estimates after the addition of simulated treatment effects. Model selection and averaging among a set of pre-selected candidate models were driven by the Akaike information criterion (AIC). The type I error rate was controlled only for IMA and rcLRT; the inflation observed otherwise was explained by placebo model misspecification and selection bias. Both IMA and rcLRT had reasonable power and accuracy except under a low typical treatment effect.

  • Article type: Journal Article
    Background: Relative growth rate (RGR) has a long history of use in biology. In its logged form, RGR = ln[(M + ΔM)/M], where M is the size of the organism at the commencement of the study, and ΔM is new growth over time interval Δt. It illustrates the general problem of comparing non-independent (confounded) variables, e.g. (X + Y) vs. X. Thus, RGR depends on what starting M(X) is used even within the same growth phase. Equally, RGR lacks independence from its derived components, net assimilation rate (NAR) and leaf mass ratio (LMR), as RGR = NAR × LMR, so that they cannot legitimately be compared by standard regression or correlation analysis.
    Results: The mathematical properties of RGR exemplify the general problem of 'spurious' correlations that compare expressions derived from various combinations of the same component terms X and Y. This is particularly acute when X >> Y, the variance of X or Y is large, or there is little range overlap of X and Y values among datasets being compared. Relationships (direction, curvilinearity) between such confounded variables are essentially predetermined and so should not be reported as if they are a finding of the study. Standardizing by M rather than time does not solve the problem. We propose the inherent growth rate (IGR), lnΔM/lnM, as a simple, robust alternative to RGR that is independent of M within the same growth phase.
    Conclusions: Although the preferred alternative is to avoid the practice altogether, we discuss cases where comparing expressions with components in common may still have utility. These may provide insights if (1) the regression slope between pairs yields a new variable of biological interest, (2) the statistical significance of the relationship remains supported using suitable methods, such as our specially devised randomization test, or (3) multiple datasets are compared and found to be statistically different. Distinguishing true biological relationships from spurious ones, which arise from comparing non-independent expressions, is essential when dealing with derived variables associated with plant growth analyses.
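
    The 'spurious' correlation problem can be made concrete with a small Python simulation. It draws starting size M and new growth ΔM independently, so there is no true relationship between them, and then correlates ln M with RGR = ln[(M + ΔM)/M]. The lognormal distributions, the sample size, the seed, and the variable names are assumptions chosen for illustration. The abstract's specially devised randomization test is not reproduced here because the abstract does not describe it.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Independent draws of ln M and ln(delta M): no real relationship exists.
log_m = [rng.gauss(0, 1) for _ in range(2000)]
log_dm = [rng.gauss(0, 1) for _ in range(2000)]

# RGR = ln[(M + delta M) / M] shares the term M with the variable it is compared to.
rgr = [
    math.log((math.exp(a) + math.exp(b)) / math.exp(a))
    for a, b in zip(log_m, log_dm)
]

r_components = pearson(log_m, log_dm)  # close to zero by construction
r_spurious = pearson(log_m, rgr)       # clearly negative despite independence
print(r_components, r_spurious)
```

    The negative correlation between ln M and RGR arises purely because M appears in the denominator of RGR, which is the predetermined relationship the abstract warns against reporting as a finding.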

  • Article type: Journal Article
    Researchers conducting small-scale cluster randomized controlled trials (RCTs) during the pilot testing of an intervention often look for evidence of promise to justify an efficacy trial. We developed a method to test for intervention effects that is adaptive (i.e., responsive to data exploration), requires few assumptions, and is statistically valid (i.e., controls the type I error rate), by adapting masked visual analysis techniques to cluster RCTs. We illustrate the creation of masked graphs and their analysis using data from a pilot study in which 15 high school programs were randomly assigned to either business as usual or an intervention developed to promote psychological and academic well-being in 9th grade students in accelerated coursework. We conclude that in small-scale cluster RCTs there can be benefits of testing for effects without a priori specification of a statistical model or test statistic.

  • Article type: Journal Article
    The dysbiosis of the gut microbiome associated with ulcerative colitis (UC) has been extensively studied in recent years. However, the question of whether UC influences the spatial heterogeneity of the human gut mucosal microbiome has not been addressed. Spatial heterogeneity (specifically, the inter-individual heterogeneity in microbial species abundances) is one of the most important characterizations at both population and community scales, and can be assessed and interpreted by Taylor's power law (TPL) and its community-scale extensions (TPLEs). Due to the high mobility of microbes, it is difficult to investigate their spatial heterogeneity explicitly; however, TPLE offers an effective approach to implicitly analyze the microbial communities. Here, we investigated the influence of UC on the spatial heterogeneity of the gut microbiome with intestinal mucosal microbiome samples collected from 28 UC patients and healthy controls. Specifically, we applied Type-I TPLE for measuring community spatial heterogeneity and Type-III TPLE for measuring mixed-species population heterogeneity to evaluate the heterogeneity changes of the mucosal microbiome induced by UC at both the community and species scales. We further used permutation tests to determine the possible differences between UC patients and healthy controls in heterogeneity scaling parameters. Results showed that UC did not significantly influence gut mucosal microbiome heterogeneity at either the community or mixed-species levels. These findings demonstrated significant resilience of the human gut microbiome and confirmed a prediction of TPLE: that the inter-subject heterogeneity scaling parameter of the gut microbiome is an intrinsic property of humans, invariant with UC disease.

  • Article type: Journal Article
    Permutation tests are useful in stepped-wedge trials to provide robust statistical tests of intervention-effect estimates. However, the Stata command permute does not produce valid tests in this setting because individual observations are not exchangeable. We introduce the swpermute command that permutes clusters to sequences to maintain exchangeability. The command provides additional functionality to aid users in performing analyses of stepped-wedge trials. In particular, we include the option "withinperiod" that performs the specified analysis separately in each period of the study with the resulting period-specific intervention-effect estimates combined as a weighted average. We also include functionality to test non-zero null hypotheses to aid the construction of confidence intervals. Examples of the application of swpermute are given using data from a trial testing the impact of a new tuberculosis diagnostic test on bacterial confirmation of a tuberculosis diagnosis.
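
    The principle of permuting clusters to sequences, rather than permuting individual observations, can be sketched in Python as follows. This is not the swpermute implementation or its syntax. The within-period estimator with equal period weights, the function names, and the cluster-level data are assumptions made for illustration.

```python
from itertools import permutations


def within_period_effect(outcomes, starts):
    """Average over periods of (mean treated - mean control), using only
    periods that contain both treated and untreated clusters.
    Equal period weights are an assumption of this sketch."""
    n_periods = len(outcomes[0])
    diffs = []
    for t in range(n_periods):
        treated = [y[t] for y, s in zip(outcomes, starts) if t >= s]
        control = [y[t] for y, s in zip(outcomes, starts) if t < s]
        if treated and control:
            diffs.append(sum(treated) / len(treated) - sum(control) / len(control))
    return sum(diffs) / len(diffs)


def cluster_permutation_test(outcomes, starts):
    """Reassign whole crossover sequences to clusters and recompute the effect."""
    observed = within_period_effect(outcomes, starts)
    n_extreme = 0
    n_total = 0
    for perm in permutations(starts):
        n_total += 1
        if abs(within_period_effect(outcomes, perm)) >= abs(observed) - 1e-12:
            n_extreme += 1
    return observed, n_extreme / n_total, n_total


# Hypothetical cluster-period means: 4 clusters observed over 5 periods.
outcomes = [
    [10, 15, 15, 16, 15],  # cluster 0 crosses over at period 1
    [11, 10, 16, 15, 16],  # cluster 1 crosses over at period 2
    [9, 10, 10, 14, 15],   # cluster 2 crosses over at period 3
    [10, 11, 10, 10, 15],  # cluster 3 crosses over at period 4
]
starts = [1, 2, 3, 4]
observed, p_value, n_perm = cluster_permutation_test(outcomes, starts)
print(observed, p_value, n_perm)
```

    Each permutation keeps a cluster's outcome series intact and changes only which crossover sequence it is paired with, which is what preserves exchangeability under the null hypothesis.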

  • Article type: Journal Article
    Between-group comparison based on the restricted mean survival time (RMST) is gaining attention as an alternative to the conventional logrank/hazard ratio approach for time-to-event outcomes in randomized controlled trials (RCTs). The validity of the commonly used nonparametric inference procedure for RMST has been well supported by large sample theories. However, we sometimes encounter cases with a small sample size in practice, where we cannot rely on the large sample properties. Generally, the permutation approach can be useful to handle these situations in RCTs. However, a numerical issue arises when implementing permutation tests for the difference or ratio of RMST from two groups. In this article, we discuss the numerical issue and consider six permutation methods for comparing survival time distributions between two groups using RMST in the RCT setting. We conducted extensive numerical studies and assessed type I error rates of these methods. Our numerical studies demonstrated that the inflation of the type I error rate of the asymptotic methods is not negligible when the sample size is small, and that all six permutation methods are workable solutions. Although some permutation methods became a little conservative, no remarkable inflation of the type I error rate was observed. We recommend using permutation tests instead of the asymptotic tests, especially when the sample size is less than 50 per arm.

  • Article type: Journal Article
    Two common barriers to applying statistical tests to single-case experiments are that single-case data often violate the assumptions of parametric tests and that random assignment is inconsistent with the logic of single-case design. However, in the case of randomization tests applied to single-case experiments with rapidly alternating conditions, neither the statistical assumptions nor the logic of the designs are violated. To examine the utility of randomization tests for single-case data, we collected a sample of published articles including alternating treatments or multielement designs with random or semi-random condition sequences. We extracted data from graphs and used randomization tests to estimate the probability of obtaining results at least as extreme as the results in the experiment by chance alone (i.e., the p-value). We compared the distribution of p-values from experimental comparisons that did and did not indicate a functional relation based on visual analysis and evaluated agreement between visual and statistical analysis at several levels of α. Results showed different means, shapes, and spreads for the p-value distributions and substantial agreement between visual and statistical analysis when α = .05, with lower agreement when α was adjusted to preserve family-wise error at .05. Questions remain, however, on the appropriate application and interpretation of randomization tests for single-case designs.