METHODS: Upstrapping is motivated by the case-resampling bootstrap and involves repeatedly sampling with replacement from the interim data to simulate thousands of fully enrolled trials. A p-value is calculated for each upstrapped trial, and the proportion of upstrapped trials meeting the p-value criterion is compared with a pre-specified decision threshold. To evaluate the potential utility of upstrapping as a form of interim futility monitoring, we conducted a simulation study considering different sample sizes and several proposed calibration strategies for the upstrap. We first compared trial rejection rates across a selection of threshold combinations to validate the upstrapping method. We then applied the upstrapping methods to simulated clinical trial data, directly comparing their performance with more traditional alpha-spending and conditional power interim monitoring methods for futility.
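The core upstrap procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-arm layout, the normal-approximation p-value (standing in for whatever test the trial uses), and all parameter names and default values (`n_upstrap`, `alpha`, `futility_threshold`) are assumptions for the sketch.

```python
import math
import random
from statistics import mean, stdev

def two_sample_p(x, y):
    """Two-sided p-value for a difference in means, using a normal
    approximation (a simplification chosen for illustration)."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = (mean(x) - mean(y)) / se
    # two-sided tail probability under the standard normal
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def upstrap_futility(interim_ctrl, interim_trt, n_full_per_arm,
                     n_upstrap=1000, alpha=0.05,
                     futility_threshold=0.20, seed=0):
    """Resample the interim data with replacement up to the planned full
    sample size, compute a p-value per upstrapped 'trial', and compare the
    proportion of significant results to a pre-specified futility threshold."""
    rng = random.Random(seed)
    n_sig = 0
    for _ in range(n_upstrap):
        ctrl = rng.choices(interim_ctrl, k=n_full_per_arm)
        trt = rng.choices(interim_trt, k=n_full_per_arm)
        if two_sample_p(ctrl, trt) < alpha:
            n_sig += 1
    prop_significant = n_sig / n_upstrap
    return prop_significant, prop_significant < futility_threshold
```

Calibration, in this framing, amounts to choosing `alpha` and `futility_threshold` (and potentially the interim look timing) so that the stopping rule achieves the desired operating characteristics.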
RESULTS: The method validation demonstrated that, across a variety of simulation settings, upstrapping is much more likely to find evidence of futility in the null scenario than in the alternative. Our three proposed approaches for calibrating the upstrap had different strengths depending on the stopping rules used. Compared to O'Brien-Fleming group sequential methods, upstrapped approaches had type I error rates that differed by at most 1.7%, and expected sample size was 2-22% lower in the null scenario; in the alternative scenario, power ranged from 15.7% lower to 0.2% higher and expected sample size was 0-15% lower.
CONCLUSIONS: In this proof-of-concept simulation study, we evaluated the potential of upstrapping as a resampling-based method for futility monitoring in clinical trials. The trade-offs in expected sample size, power, and type I error rate control indicate that the upstrap can be calibrated to implement futility monitoring with varying degrees of aggressiveness, with performance comparable to the alpha-spending and conditional power futility monitoring methods considered.