significance testing

  • Article type: Journal Article
    Recently, Asparouhov and Muthén (2021a, 2021b; Structural Equation Modeling: A Multidisciplinary Journal, 28, 1-14) proposed a variant of the Wald test that uses Markov chain Monte Carlo machinery to generate a chi-square test statistic for frequentist inference. Because the test's composition does not rely on analytic expressions for sampling variation and covariation, it potentially provides a way to get honest significance tests in cases where the assumptions of the likelihood-based test statistic break down (e.g., in small samples). The goal of this study is to use simulation to compare the new MCMC Wald test to its maximum likelihood counterparts with respect to both type I error rate and power. Our simulation examined the test statistics across different levels of sample size, effect size, and degrees of freedom (test complexity). An additional goal was to assess the robustness of the MCMC Wald test with nonnormal data. The simulation results uniformly demonstrated that the MCMC Wald test was superior to the maximum likelihood test statistic, especially with small samples (e.g., sample sizes less than 150) and complex models (e.g., models with five or more predictors). This conclusion held for nonnormal data as well. Lastly, we provide a brief application to a real data example.
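    The quadratic form behind any Wald test is W = (Rθ̂ - r)' (R V R')⁻¹ (Rθ̂ - r), which is referred to a chi-square distribution with q degrees of freedom (q = rows of R). The sketch below assumes that the MCMC variant plugs in the posterior mean for θ̂ and the covariance of the posterior draws for V, so no analytic sampling covariance is needed. This is an illustration under that assumption, not Asparouhov and Muthén's implementation. The function name `mcmc_wald_statistic` is hypothetical.

```python
import numpy as np

def mcmc_wald_statistic(draws, R, r):
    """Wald-type chi-square statistic computed from MCMC draws (sketch).

    draws : (n_draws, p) array of posterior samples of the parameters
    R     : (q, p) restriction matrix; H0 is R @ theta == r
    r     : (q,) vector of hypothesized values
    Returns (W, df); W is referred to a chi-square(df) distribution.
    """
    draws = np.asarray(draws, dtype=float)
    if draws.ndim == 1:
        draws = draws[:, None]                      # single-parameter case
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    theta_hat = draws.mean(axis=0)                  # posterior mean as point estimate
    V = np.atleast_2d(np.cov(draws, rowvar=False))  # covariance of the draws (N - 1 divisor)
    diff = R @ theta_hat - r
    W = float(diff @ np.linalg.solve(R @ V @ R.T, diff))
    return W, R.shape[0]
```

    If SciPy is available, `scipy.stats.chi2.sf(W, df)` turns the statistic into a p-value.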

  • Article type: Journal Article
    An important goal in systems neuroscience is to understand the structure of neuronal interactions, frequently approached by studying functional relations between recorded neuronal signals. Commonly used pairwise measures (e.g., correlation coefficient) offer limited insight, neither addressing the specificity of estimated neuronal interactions nor potential synergistic coupling between neuronal signals. Tripartite measures, such as partial correlation, variance partitioning, and partial information decomposition, address these questions by disentangling functional relations into interpretable information atoms (unique, redundant, and synergistic). Here, we apply these tripartite measures to simulated neuronal recordings to investigate their sensitivity to noise. We find that the considered measures are mostly accurate and specific for signals with noiseless sources but experience significant bias for noisy sources. We show that permutation testing of such measures results in high false positive rates even for small noise fractions and large data sizes. We present a conservative null hypothesis for significance testing of tripartite measures, which significantly decreases the false positive rate at a tolerable expense of an increased false negative rate. We hope our study raises awareness about the potential pitfalls of significance testing and of interpretation of functional relations, offering both conceptual and practical advice.
    Tripartite functional relation measures enable the study of interesting effects in neural recordings, such as redundancy, functional connection specificity, and synergistic coupling. However, estimators of such relations are commonly validated using noiseless signals, whereas neural recordings typically contain noise. Here we systematically study the performance of tripartite estimators using simulated noisy neural signals. We demonstrate that permutation testing is not a robust procedure for inferring ground truth statistical relations from commonly used tripartite relation estimators. We develop an adjusted conservative testing procedure, reducing false positive rates of the studied estimators when applied to noisy data. Besides addressing significance testing, our results should aid in accurate interpretation of tripartite functional relations and functional connectivity.

  • Article type: Journal Article
    No abstract available.

  • Article type: Review
    BACKGROUND: The American Statistical Association has highlighted problems with null hypothesis significance testing and outlined alternative approaches that may 'supplement or even replace P-values'. One alternative is to report the false positive risk (FPR), which quantifies the chance that the null hypothesis is true when the result is statistically significant.
    METHODS: We reviewed single-centre, randomised trials in 10 anaesthesia journals over 6 yr where differences in a primary binary outcome were statistically significant. We calculated a Bayes factor by two methods (Gunel, Kass). From the Bayes factor we calculated the FPR for different prior beliefs for a real treatment effect. Prior beliefs were quantified by assigning pretest probabilities to the null and alternative hypotheses.
    RESULTS: For equal pretest probabilities of 0.5, the median (inter-quartile range [IQR]) FPR was 6% (1-22%) by the Gunel method and 6% (1-19%) by the Kass method. One in five trials had an FPR ≥20%. For trials reporting P-values 0.01-0.05, the median (IQR) FPR was 25% (16-30%) by the Gunel method and 20% (16-25%) by the Kass method. More than 90% of trials reporting P-values 0.01-0.05 required a pretest probability >0.5 to achieve an FPR of 5%. The median (IQR) difference in the FPR calculated by the two methods was 0% (0-2%).
    CONCLUSIONS: Our findings suggest that a substantial proportion of single-centre trials in anaesthesia reporting statistically significant differences provide limited evidence of real treatment effects, or, alternatively, would require an implausibly high prior belief in a real treatment effect.
    REGISTRATION: PROSPERO (CRD42023350783).
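    The step from a Bayes factor to the FPR is Bayes' rule. Let BF10 be the evidence for the alternative over the null, and let P(H1) be the pretest probability of a real effect. Then FPR = P(H0) / (P(H0) + BF10 · P(H1)). Setting this equal to a target FPR and solving gives the pretest probability needed to reach it. The sketch below takes BF10 as given; the Gunel and Kass calculations are not reproduced, and the function names are hypothetical.

```python
def false_positive_risk(bf10, prior_h1=0.5):
    """P(H0 | significant result) from Bayes factor BF10 and pretest P(H1)."""
    if bf10 <= 0 or not 0 < prior_h1 < 1:
        raise ValueError("bf10 must be > 0 and prior_h1 must lie in (0, 1)")
    prior_h0 = 1 - prior_h1
    return prior_h0 / (prior_h0 + bf10 * prior_h1)

def prior_needed_for_fpr(bf10, target_fpr=0.05):
    """Pretest P(H1) at which the FPR equals target_fpr (FPR falls as P(H1) rises)."""
    return (1 - target_fpr) / (1 - target_fpr + target_fpr * bf10)
```

    For example, with BF10 = 3 and equal pretest probabilities, the FPR is 0.5 / (0.5 + 1.5) = 25%. Reaching an FPR of 5% would then need a pretest probability of about 0.86.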

  • Article type: Journal Article
    A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test the hypotheses regarding the full-model regression coefficients.
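    The naïve two-step procedure can be sketched in NumPy as follows. Step (i) fits the lasso by cyclic coordinate descent on (1/2n)||y - Xb||² + λ||b||₁. Step (ii) refits ordinary least squares on the selected columns and reports textbook normal-theory confidence intervals. The following are illustrative assumptions, not the paper's code:
    - λ is chosen by hand.
    - X and y are centered, so no intercept is fitted.
    - No column of X is constant.
    - The function names are hypothetical.

    The paper's validity result holds only under its stated conditions.

```python
import numpy as np
from statistics import NormalDist

def lasso_cd(X, y, lam, n_sweeps=500):
    """Lasso via cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)          # residual y - X @ b with b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * b[j]           # add back feature j's contribution
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * b[j]
    return b

def naive_two_step(X, y, lam, level=0.95):
    """(i) lasso selection, (ii) OLS with standard CIs on the selected columns."""
    selected = np.flatnonzero(lasso_cd(X, y, lam))
    if selected.size == 0:
        return selected, np.empty(0), np.empty((0, 2))
    Xs = X[:, selected]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    sigma2 = float(((y - Xs @ coef) ** 2).sum()) / (len(y) - selected.size)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return selected, coef, np.column_stack([coef - z * se, coef + z * se])
```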

  • Article type: Journal Article
    Statistical dependency measures such as Kendall's Tau or Spearman's Rho are frequently used to analyse the coherence between time series in environmental data analyses. Autocorrelation of the data can, however, result in spurious cross correlations if not accounted for. Here, we present the asymptotic distribution of the estimators of Spearman's Rho and Kendall's Tau, which can be used for statistical hypothesis testing of cross-correlations between autocorrelated observations. The results are derived using U-statistics under the assumption of absolutely regular (or β-mixing) processes. These comprise many short-range dependent processes, such as ARMA-, GARCH- and some copula-based models relevant in the environmental sciences. We show that while the assumption of absolute regularity is required, the specific type of model does not have to be specified for the hypothesis test. Simulations show the improved performance of the modified hypothesis test for some common stochastic models and small to moderate sample sizes under autocorrelation. The methodology is applied to observed climatological time series of flood discharges and temperatures in Europe. While the standard test results in spurious correlations between floods and temperatures, this is not the case for the proposed test, which is more consistent with the literature on flood regime changes in Europe.
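    Kendall's tau is a U-statistic. Its variance is roughly 4σ²/n, where σ² is the variance of the first-order (Hoeffding) projection of the kernel. Under short-range dependence, a common generic fix replaces σ² with a long-run variance, summing autocovariances of the projection. The sketch below does this with a Bartlett (Newey-West) kernel. The kernel and the default bandwidth rule are assumptions, and this is not necessarily the authors' exact estimator. It covers tau-a only; Spearman's rho is not treated here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def _sign(v):
    return (v > 0) - (v < 0)

def kendall_tau_hac_test(x, y, bandwidth=None):
    """Kendall's tau-a with an autocorrelation-robust (Bartlett HAC) standard error.

    Returns (tau, standard_error, z, two_sided_p) for H0: tau = 0.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length >= 3")
    row_sums = [0.0] * n
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = _sign(x[i] - x[j]) * _sign(y[i] - y[j])  # concordance of pair (i, j)
            row_sums[i] += s
            row_sums[j] += s
            total += s
    tau = total / (n * (n - 1) / 2)
    # Estimated first-order projection h_i (mean zero by construction).
    h = [rs / (n - 1) - tau for rs in row_sums]
    if bandwidth is None:
        bandwidth = int(4 * (n / 100) ** (2 / 9))  # rule-of-thumb bandwidth (assumption)
    bandwidth = min(bandwidth, n - 1)
    lrv = sum(v * v for v in h) / n
    for k in range(1, bandwidth + 1):
        weight = 1 - k / (bandwidth + 1)          # Bartlett weight
        lrv += 2 * weight * sum(h[t] * h[t + k] for t in range(n - k)) / n
    se = math.sqrt(4 * lrv / n)
    if se == 0:
        return tau, se, float("nan"), float("nan")
    z = tau / se
    return tau, se, z, 2 * (1 - NormalDist().cdf(abs(z)))
```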

  • Article type: Journal Article
    BACKGROUND: In medical research, null hypothesis significance testing (NHST) is the dominant framework for statistical inference. NHST involves calculating P-values and confidence intervals to quantify the evidence against the null hypothesis of no effect. However, P-values and confidence intervals cannot tell us the probability that the hypothesis is true. In contrast, false-positive risk (FPR) and false-negative risk (FNR) are post-test probabilities concerning the truth of the hypothesis, that is to say, the probability that a real effect exists.
    METHODS: We calculated the FPR or FNR for 53 individual multicentre trials in critical care based on a pretest probability of 0.5 that the hypothesis was true.
    RESULTS: For trials reporting statistical significance, the FPR varied between 0.1% and 57.6%. For trials reporting non-significance, the FNR varied between 1.7% and 36.9%. Twenty-six of 47 trials (55.3%) reporting non-significance provided strong or very strong evidence in favour of the null hypothesis; the remaining trials provided limited evidence. There was no obvious relationship between the P-value and the FNR.
    CONCLUSIONS: The FPR and FNR showed marked variability, indicating that the probability of a real or absent treatment effect differed substantially between trials. Only one trial reporting statistical significance provided convincing evidence of a real treatment effect, and nearly half of all trials reporting non-significance provided limited evidence for the absence of a treatment effect. Our findings suggest that the quality of evidence from multicentre trials in critical care is highly variable.
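    For a non-significant trial, the FNR is the post-test probability that a real effect exists anyway. By Bayes' rule, with BF01 as the evidence for the null over the alternative, FNR = P(H1) / (P(H1) + BF01 · P(H0)). The sketch below takes BF01 as an input and uses the 0.5 pretest probability from the methods above as the default. How the trial-level Bayes factors were obtained is not reproduced, and the function name is hypothetical.

```python
def false_negative_risk(bf01, prior_h1=0.5):
    """P(H1 | non-significant result) from Bayes factor BF01 and pretest P(H1)."""
    if bf01 <= 0 or not 0 < prior_h1 < 1:
        raise ValueError("bf01 must be > 0 and prior_h1 must lie in (0, 1)")
    prior_h0 = 1 - prior_h1
    return prior_h1 / (prior_h1 + bf01 * prior_h0)
```

    For example, BF01 = 10 with equal pretest probabilities gives an FNR of 1/11, about 9%.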

  • Article type: Journal Article
    There is currently little evidence that the genetic basis of human phenotype varies significantly across the lifespan. However, time-to-event phenotypes are understudied and can be thought of as reflecting an underlying hazard, which is unlikely to be constant through life when values take a broad range. Here, we find that 74% of 245 genome-wide significant genetic associations with age at natural menopause (ANM) in the UK Biobank show a form of age-specific effect. Nineteen of these replicated discoveries are identified only by our modeling framework, which determines the time dependency of DNA-variant age-at-onset associations without a significant multiple-testing burden. Across the range of early to late menopause, we find evidence for significantly different underlying biological pathways, changes in the signs of genetic correlations of ANM to health indicators and outcomes, and differences in inferred causal relationships. We find that DNA damage response processes only act to shape ovarian reserve and depletion for women of early ANM. Genetically mediated delays in ANM were associated with increased relative risk of breast cancer and leiomyoma at all ages and with high cholesterol and heart failure for late-ANM women. These findings suggest that a better understanding of the age dependency of genetic risk factor relationships among health indicators and outcomes is achievable through appropriate statistical modeling of large-scale biobank data.

  • Article type: Journal Article
    Clinical trials are the backbone of medical scientific research. However, this experimental strategy has some drawbacks. We focused on two issues: (a) The internal validity ensured by clinical trial procedures does not necessarily allow for generalization of efficacy results to causal claims about effectiveness in the population. (b) Statistical significance does not imply clinical or practical significance; p-values should be supplemented with effect size (ES) estimators and an interpretation of the magnitude of the effects found. We conducted a systematic review (from 2000 to 2020) on Scopus, PubMed, and four ProQuest databases, including PsycINFO. We searched for experimental studies with significant effects of pharmacological treatments on depressive symptoms, measured with a specific scale for depression. We assessed claims of effectiveness and the reporting and interpretation of effect sizes in a small, unbiased sample of clinical trials (n = 10). Only 30% of the studies acknowledged that efficacy does not necessarily translate to effectiveness. Only 20% reported ES indices, and only 40% interpreted the magnitude of their findings. We encourage reflection on the applicability of results derived from clinical trials about the efficacy of antidepressant treatments, which often influence daily clinical decision-making. Comparing experimental results of antidepressants with supplementary observational studies can provide clinicians with greater flexibility in prescribing medication based on patient characteristics. Furthermore, the ES of a treatment should be considered, as treatments with a small effect may be worthwhile in certain circumstances, while treatments with a large effect may be justified despite additional costs or complications. Therefore, researchers are encouraged to report and interpret ES and explicitly discuss the suitability of their sample for the clinical population to which the antidepressant treatment will be applied.
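    As one concrete ES index, the sketch below computes Cohen's d: the difference in group means divided by the pooled sample standard deviation. The review does not prescribe a particular index, so this choice and the function name are assumptions. On depression scales where lower scores are better, a negative d (treatment minus control) favours the treatment. Cohen's conventional benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large).

```python
import math
from statistics import mean, variance

def cohens_d(treatment, control):
    """Standardized mean difference (treatment - control) using the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
```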

  • Article type: Journal Article
    Mediation analysis studies situations where an exposure may affect an outcome both directly and indirectly through intervening variables called mediators. It is frequently of interest to test for the effect of the exposure on the outcome, and the standard approach is simply to regress the latter on the former. However, it seems plausible that a more powerful test statistic could be achieved by also incorporating the mediators. This would be useful in cases where the exposure effect size might be small, which, for example, is common in genomics applications. Previous work has shown that this is indeed possible under complete mediation, where there is no direct effect. In most applications, however, the direct effect is likely nonzero. In this paper we study linear mediation models and find that under certain conditions, power gain is still possible under this incomplete mediation setting for testing the null hypothesis that there is neither a direct nor an indirect effect. We study a class of procedures that can achieve this performance and develop their application to both low- and high-dimensional mediators. We then illustrate their performance in simulations as well as in an analysis using DNA methylation mediators to study the effect of cigarette smoking on gene expression.
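    In the single-mediator linear model M = aX + e₁ and Y = gX + bM + e₂, the "standard approach" slope from regressing Y on X alone equals the direct effect g plus the indirect effect ab. With OLS and intercepts this holds exactly, not just in expectation. The sketch below only illustrates that decomposition, which is the starting point the paper builds on; it does not implement the authors' more powerful testing procedures. The function names are hypothetical.

```python
import numpy as np

def ols(y, *covariates):
    """OLS coefficients for y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_decomposition(x, m, y):
    """Direct, indirect, and total effects in a single-mediator linear model."""
    a = ols(m, x)[1]              # exposure -> mediator
    _, g, b = ols(y, x, m)        # outcome on exposure and mediator
    total = ols(y, x)[1]          # standard approach: regress outcome on exposure only
    return {"direct": g, "indirect": a * b, "total": total}
```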