Statistical inference for diagnostic test accuracy studies with multiple comparisons

Abstract:

Diagnostic accuracy studies assess the sensitivity and specificity of a new index test in relation to an established comparator or the reference standard. The development and selection of the index test are usually assumed to be completed before the accuracy study. In practice, this assumption is often violated, for instance when the (apparently) best biomarker, model or cutpoint is chosen based on the same data that are later used for validation. In this work, we investigate several multiple comparison procedures that provide family-wise error rate control for the resulting multiple testing problem. Due to the nature of the co-primary hypothesis problem, conventional approaches for multiplicity adjustment are too conservative in this setting and thus need to be adapted. In an extensive simulation study, five multiple comparison procedures are compared with regard to statistical error rates in least-favourable and realistic scenarios. These cover parametric and non-parametric methods and one Bayesian approach. All methods have been implemented in the new open-source R package cases, which allows us to reproduce all simulation results. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated bootstrap procedures, in particular the so-called pairs bootstrap, allow for family-wise error rate control in finite samples and, in addition, have competitive statistical power.
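To make the co-primary setting concrete, the following is a minimal Python sketch of the simplest procedure named in the abstract, a Bonferroni adjustment across candidate tests. It is not the API of the cases package. The function names, the one-sided Wald bounds, the thresholds (se0, sp0) and the counts are all assumptions made up for illustration. A candidate passes only if the lower confidence bounds for both sensitivity and specificity exceed their thresholds, so the level is split across the m candidates but not across the two endpoints.

```python
from math import sqrt
from statistics import NormalDist


def wald_lower_bound(successes, n, alpha):
    """One-sided (1 - alpha) Wald lower confidence bound for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - alpha)
    return p - z * sqrt(p * (1 - p) / n)


def select_candidates(candidates, se0, sp0, alpha=0.025):
    """Return the candidates whose sensitivity AND specificity both pass.

    candidates maps a name to (true_pos, n_diseased, true_neg, n_healthy).
    Bonferroni: each candidate is tested at level alpha / m, where m is
    the number of candidates. No further split across the two endpoints
    is applied, because both must be significant (co-primary hypothesis).
    """
    local_alpha = alpha / len(candidates)
    passed = []
    for name, (tp, n_dis, tn, n_heal) in candidates.items():
        se_lb = wald_lower_bound(tp, n_dis, local_alpha)
        sp_lb = wald_lower_bound(tn, n_heal, local_alpha)
        if se_lb > se0 and sp_lb > sp0:
            passed.append(name)
    return passed


# Hypothetical counts for three candidate cutpoints (illustration only).
candidates = {
    "cutpoint_A": (95, 100, 180, 200),
    "cutpoint_B": (80, 100, 190, 200),
    "cutpoint_C": (90, 100, 150, 200),
}
selected = select_candidates(candidates, se0=0.8, sp0=0.8)
print(selected)
```

The Wald bound used here is a crude normal approximation, which fits the abstract's remark that parametric approaches can have inflated type I error rates for small sample sizes. The bootstrap procedures mentioned in the abstract would replace the critical value with one estimated by resampling.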
