multiple testing

  • Article type: Journal Article
    Opinions and practices vary around the issue of performing multiple statistical tests in randomised controlled trials (RCTs). We carried out a study to collate information about opinions and practices using a methodological rapid review and a survey, specifically of publicly funded pragmatic RCTs that are not seeking marketing authorisation. The aim was to identify the circumstances under which researchers would make a statistical adjustment for multiplicity.
    A review was performed extracting information from articles reporting primary analyses of pragmatic RCTs published in one of seven high-quality medical journals between January and June 2018 (inclusive). A survey (Survey Monkey) eliciting opinions and practices around multiplicity was distributed to the 47 registered clinical trials units (CTUs) in the UK.
    One hundred and thirty-eight RCTs were included in the review, and survey responses were received from 27/47 (57%) CTUs. Both the review and survey indicated that adjusting for multiplicity was considered most important for multiple treatment comparisons; adjustment was performed for 11/23 (48%) published trials, and 24/27 (89%) CTU statisticians reported they would consider adjustment. Opinions and practices varied around adjustment for multiplicity arising from multiple primary outcomes and interim analyses. Adjustment was considered less important for multiplicity due to multiple secondary outcomes (adjustment performed for 17/136 [13%] published trials and 3/27 [11%] CTU statisticians would consider adjustment) and subgroup analyses (8/85 [9%] published trials adjusted and 6/27 CTU [22%] statisticians would consider adjustment).
    There is variation in opinions about adjustment for multiplicity among both statisticians reporting RCTs and applied statisticians working in CTUs. Further guidance is needed on the circumstances in which adjustment should be considered in relation to primary trial hypotheses, and if there are any situations in which adjustment would be recommended in the context of secondary analyses.
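The abstract above turns on when a multiplicity adjustment should be performed. As a minimal illustration of how one standard adjustment works (not a method taken from the study itself), the Holm step-down procedure, a uniformly more powerful variant of Bonferroni that also controls the family-wise error rate, can be sketched as:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values; controls the family-wise error rate.

    Sort the raw p-values ascending, multiply the j-th smallest (0-based j)
    by (m - j), then enforce monotonicity with a running maximum, capping at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

For example, raw p-values [0.01, 0.04, 0.03] become [0.03, 0.06, 0.06], so only the first comparison survives at the 5% level.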

  • Article type: Journal Article
    OBJECTIVE: Multi-arm non-inferiority (MANI) trials, here defined as non-inferiority trials with multiple experimental treatment arms, can be useful in situations where several viable treatments exist for a disease area or for testing different dose schedules. To maintain the statistical integrity of such trials, issues regarding both design and analysis must be considered, from both the multi-arm and the non-inferiority perspectives. Little guidance currently exists on exactly how these aspects should be addressed and it is the aim of this paper to provide recommendations to aid the design of future MANI trials.
    METHODS: A comprehensive literature review covering four databases was conducted to identify publications associated with MANI trials. Literature was split into methodological and trial publications in order to investigate the required design and analysis considerations for MANI trials and whether they were being addressed in practice.
    RESULTS: A number of issues were identified that, if not properly addressed, could lead to problems with the family-wise error rate (FWER), power, or bias. These ranged from the structuring of trial hypotheses at the design stage to the consideration of potentially heterogeneous treatment variances at the analysis stage. One key issue of interest was adjustment for multiple testing at the analysis stage. There was little consensus concerning whether more powerful p-value adjustment methods were preferred to approximate adjusted CIs when presenting and interpreting the results of MANI trials. We found 65 examples of previous MANI trials, of which 31 adjusted for multiple testing out of the 39 that were adjudged to require it. Trials generally preferred simple, well-known methods for study design and analysis, and while some showed awareness of FWER inflation and choice of power, many appeared not to consider these issues and did not define their chosen design and analysis approaches in sufficient detail.
    CONCLUSIONS: While MANI trials to date have shown some awareness of the issues raised within this paper, very few have satisfied the criteria of the outlined recommendations. Going forward, trials should consider the recommendations in this paper and ensure they clearly define and reason their choices of trial design and analysis techniques.
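The FWER concern raised for multi-arm trials can be made concrete. With k independent comparisons each tested at level alpha, the family-wise error rate is 1 - (1 - alpha)^k, and the Šidák correction picks the per-comparison level that restores the nominal rate. A small sketch (illustrative arithmetic only, not a recommendation from the paper):

```python
def fwer(alpha, k):
    """Family-wise error rate for k independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

def sidak_alpha(family_alpha, k):
    """Per-comparison level keeping the FWER at family_alpha (k independent tests)."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / k)
```

With three experimental arms each tested at alpha = 0.05, the FWER is already about 14%; a Šidák level of roughly 0.017 per comparison brings it back to 5%. Note the identity assumes independence; comparisons sharing a control arm are correlated, which is what Dunnett-type methods exploit.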

  • Article type: Journal Article
    While current guidelines generally recommend single endpoints for primary analyses of confirmatory clinical trials, it is recognized that certain settings require inference on multiple endpoints for comprehensive conclusions on treatment effects. Furthermore, combining treatment effect estimates from several outcome measures can increase the statistical power of tests. Such an efficient use of resources is of special relevance for trials in small populations. This paper reviews approaches based on a combination of test statistics or measurements across endpoints as well as multiple testing procedures that allow for confirmatory conclusions on individual endpoints. We especially focus on feasibility in trials with small sample sizes and do not solely rely on asymptotic considerations. A systematic literature search in the Scopus database, supplemented by a manual search, was performed to identify research papers on analysis methods for multiple endpoints with relevance to small populations. The identified methods were grouped into approaches that combine endpoints into a single measure to increase the power of statistical tests and methods to investigate differential treatment effects in several individual endpoints by multiple testing.
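The first group of methods this review describes, combining evidence across endpoints into a single measure, can be illustrated with the simplest case: summing independent per-endpoint z-statistics (an unweighted combination in the spirit of O'Brien-type tests; the weighting and correlation handling of the actual methods surveyed are omitted here):

```python
from math import sqrt
from statistics import NormalDist

def combined_z_test(z_stats):
    """Global test combining independent per-endpoint z-statistics.

    Under the global null each z_i ~ N(0, 1) independently, so
    sum(z_i) / sqrt(k) is again standard normal. Returns the combined
    z-statistic and its two-sided p-value.
    """
    k = len(z_stats)
    z = sum(z_stats) / sqrt(k)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

Four endpoints each showing a modest z = 1.0 (none individually significant) combine to z = 2.0, p ≈ 0.046, which is the kind of power gain the review highlights for trials in small populations.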

  • Article type: Journal Article
    Reproducible results define the very core of scientific integrity in modern research. Yet, legitimate concerns have been raised about the reproducibility of research findings, with important implications for the advancement of science and for public support. With statistical practice increasingly becoming an essential component of research efforts across the sciences, this review article highlights the compelling role of statistics in ensuring that research findings in the animal sciences are reproducible, in other words, able to withstand close interrogation and independent validation. Statistics sets a formal framework and a practical toolbox that, when properly implemented, can recover signal from noisy data. Yet, misconceptions and misuse of statistics are recognized as top contributing factors to the reproducibility crisis. In this article, we revisit foundational statistical concepts relevant to reproducible research in the context of the animal sciences, raise awareness of common statistical misuse undermining it, and outline recommendations for statistical practice. Specifically, we emphasize a keen understanding of the data generation process throughout the research endeavor, from thoughtful experimental design and randomization, through rigorous data analysis and inference, to careful wording in communicating research results to peer scientists and society in general. We provide a detailed discussion of core concepts in experimental design, including data architecture, experimental replication, and subsampling, and elaborate on practical implications for proper elicitation of the scope of reach of research findings. For data analysis, we emphasize proper implementation of mixed models, in terms of both distributional assumptions and specification of fixed and random effects to explicitly recognize multilevel data architecture. 
This is critical to ensure that experimental error for treatments of interest is properly recognized and inference is correctly calibrated. Inferential misinterpretations associated with use of P-values, both significant and not, are clarified, and problems associated with error inflation due to multiple comparisons and selective reporting are illustrated. Overall, we advocate for a responsible practice of statistics in the animal sciences, with an emphasis on continuing quantitative education and interdisciplinary collaboration between animal scientists and statisticians to maximize reproducibility of research findings.
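The error inflation from multiple comparisons that the review illustrates can be reproduced with a few lines of simulation: when every null hypothesis is true, each p-value is uniform on (0, 1), so testing many of them at alpha = 0.05 makes at least one false positive very likely. A minimal sketch (assumptions: independent tests, all nulls true):

```python
import random

def prob_any_false_positive(n_tests, alpha=0.05, n_sims=20000, seed=1):
    """Monte Carlo estimate of P(at least one false positive) when all
    n_tests null hypotheses are true: under H0, each p-value ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(n_sims)
    )
    return hits / n_sims
```

Twenty tests on pure noise already give roughly a 64% chance of at least one "significant" finding (the exact value is 1 - 0.95^20 ≈ 0.64), which is why unplanned comparisons and selective reporting inflate error rates.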

  • Article type: Journal Article
    OBJECTIVE: Multiple hypothesis testing (or multiple testing) refers to testing more than one hypothesis within a single analysis, and can inflate the type I error rate (false positives) within a study. The aim of this review was to quantify multiple testing in recent large clinical studies in the otolaryngology literature and to discuss strategies to address this potential problem.
    METHODS: Eligible studies were original clinical research articles with >100 subjects published in 2012 in the four general otolaryngology journals with the highest Journal Citation Reports 5-year impact factors.
    METHODS: Articles were reviewed to determine whether the authors tested greater than five hypotheses in at least one family of inferences. For the articles meeting this criterion for multiple testing, type I error rates were calculated, and statistical correction was applied to the reported results.
    RESULTS: Of the 195 original clinical research articles reviewed, 72% met the criterion for multiple testing. Within these studies, there was a mean 41% chance of a type I error and, on average, 18% of significant results were likely to be false positives. After the Bonferroni correction was applied, only 57% of significant results reported within the articles remained significant.
    CONCLUSIONS: Multiple testing is common in recent large clinical studies in otolaryngology and deserves closer attention from researchers, reviewers, and editors. Strategies for adjusting for multiple testing are discussed.
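The correction step this review applied can be mirrored in a few lines. Under Bonferroni, a family of m tests keeps only results with p < alpha/m; the helper below (an illustrative reconstruction, not the authors' code) reports how many nominally significant results survive:

```python
def bonferroni_survivors(p_values, alpha=0.05):
    """Count results significant before and after a Bonferroni correction.

    The whole list is treated as one family of m tests, so the corrected
    per-test threshold is alpha / m. Returns (raw_count, surviving_count).
    """
    m = len(p_values)
    raw = sum(1 for p in p_values if p < alpha)
    surviving = sum(1 for p in p_values if p < alpha / m)
    return raw, surviving
```

For p-values [0.001, 0.02, 0.04, 0.3, 0.6], three results are nominally significant but only one survives the corrected threshold of 0.01, the same kind of attrition behind the review's finding that only 57% of reported significant results remained significant.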