Statistical significance - 医云文献 medical literature search

Literature (214 articles)

1 About statistical significance, and the lack thereof.

Impact index: 2.908
Published: 19 Aug 2024
Journal: Lab Anim; PMID: 39157984
DOI: 10.1177/00236772241248509
Article type: Journal Article

Absence of statistical significance (i.e., p > 0.05) in the results of a frequentist test comparing two samples is often used as evidence of absence of difference, or absence of effect of a treatment, on the measured variable. Such conclusions are often wrong because absence of significance may merely result from a sample size that is too small to reveal an effect. To conclude that there is no meaningful effect of a treatment/condition, it is necessary to use an appropriate statistical approach. For frequentist statistics, a simple tool for this goal is the 'two one-sided t-test,' a form of equivalence test that relies on the a priori definition of a minimal difference considered to be relevant. In other words, the smallest effect size of interest should be established in advance. We present the principles of this test and give examples where it allows correct interpretation of the results of a classical t-test assuming absence of difference. Equivalence tests are also very useful in probing whether certain significant results are also biologically meaningful, because when comparing large samples it is possible to find significant results in both an equivalence test and in a two-sample t-test, assuming no difference as the null hypothesis.
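
The two one-sided t-test described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes equal variances (pooled standard error), uses only the Python standard library (the Student t CDF is obtained by numerical integration), and the function names, sample values, and equivalence margin `delta` are all hypothetical.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Student t CDF, integrating the density with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def tost(a, b, delta):
    """Two one-sided t-tests for equivalence within +/- delta (pooled variance).

    delta is the smallest difference considered relevant and must be
    chosen before looking at the data.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    diff = mean(a) - mean(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    p_lower = 1 - t_cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_upper = t_cdf((diff - delta) / se, df)      # H0: diff >= +delta
    p_diff = 2 * (1 - t_cdf(abs(diff / se), df))  # classical H0: diff == 0
    return {"p_tost": max(p_lower, p_upper), "p_difference": p_diff}


# Hypothetical measurements, for illustration only.
group_a = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]
group_b = [10.0, 10.1, 9.9, 10.1, 9.9, 10.0]
result = tost(group_a, group_b, delta=0.5)
print(result)
```

In this made-up example the classical t-test is non-significant while the equivalence test is significant, which is the situation where "no meaningful difference" can be claimed rather than merely "no significant difference".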

2 For a proper use of frequentist inferential statistics in public health.

Impact index: not available
Published: Dec 2024
Journal: Glob Epidemiol; PMID: 39021384
DOI: 10.1016/j.gloepi.2024.100151
Article type: Journal Article

As widely noted in the literature and by international bodies such as the American Statistical Association, severe misinterpretations of P-values, confidence intervals, and statistical significance are sadly common in public health. This scenario poses serious risks concerning terminal decisions such as the approval or rejection of therapies. Cognitive distortions about statistics likely stem from poor teaching in schools and universities, overly simplified interpretations, and - as we suggest - the reckless use of calculation software with predefined standardized procedures. In light of this, we present a framework to recalibrate the role of frequentist-inferential statistics within clinical and epidemiological research. In particular, we stress that statistics is only a set of rules and numbers that make sense only when properly placed within a well-defined scientific context beforehand. Practical examples are discussed for educational purposes. Alongside this, we propose some tools to better evaluate statistical outcomes, such as multiple compatibility or surprisal intervals or tuples of various point hypotheses. Lastly, we emphasize that every conclusion must be informed by different kinds of scientific evidence (e.g., biochemical, clinical, statistical, etc.) and must be based on a careful examination of costs, risks, and benefits.
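
One possible reading of the "multiple compatibility or surprisal intervals" mentioned in this abstract is to report intervals at several levels, each labelled with its surprisal (S-value, S = -log2(p)). The sketch below assumes a normal approximation; the function names, the estimate, and the standard error are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist


def s_value(p):
    """Surprisal in bits: S = -log2(p)."""
    return -math.log2(p)


def compatibility_intervals(estimate, std_error, levels=(0.90, 0.95, 0.99)):
    """Normal-approximation intervals at several levels, each with its S-value."""
    rows = []
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)
        rows.append({
            "level": level,
            "low": estimate - z * std_error,
            "high": estimate + z * std_error,
            "s_bits": s_value(1 - level),  # surprisal of the matching p-value
        })
    return rows


# Hypothetical estimate: a mean difference of 1.2 with standard error 0.5.
rows = compatibility_intervals(1.2, 0.5)
for row in rows:
    print(row)
```

Showing several intervals side by side avoids treating the 95% limit as a sharp boundary between compatible and incompatible effect sizes.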

3 Statistical significance, clinical importance and effect sizes: Enhancing understanding of a study's results.

Impact index: 3.558
Published: 2 Jul 2024
Journal: J Oral Rehabil; PMID: 38956893
DOI: 10.1111/joor.13759
Article type: Journal Article

BACKGROUND: The proper interpretation of a study's results requires both excellent understanding of good methodological practices and deep knowledge of prior results, aided by the availability of effect sizes.
METHODS: This review takes the form of an expository essay exploring the complex and nuanced relationships among statistical significance, clinical importance, and effect sizes.
RESULTS: Careful attention to study design and methodology will increase the likelihood of obtaining statistical significance and may enhance the ability of investigators/readers to accurately interpret results. Measures of effect size show how well the variables used in a study account for/explain the variability in the data. Studies reporting strong effects may have greater practical value/utility than studies reporting weak effects. Effect sizes need to be interpreted in context. Verbal summary characterizations of effect sizes (e.g., "weak", "strong") are fundamentally flawed and can lead to inappropriate characterization of results. Common language effect size (CLES) indicators are a relatively new approach to effect sizes that may offer a more accessible interpretation of results that can benefit providers, patients, and the public at large.
CONCLUSIONS: It is important to convey research findings in ways that are clear to both the research community and to the public. At a minimum, this requires inclusion of standard effect size data in research reports. Proper selection of measures and careful design of studies are foundational to the interpretation of a study's results. The ability to draw useful conclusions from a study is increased when investigators enhance the methodological quality of their work.
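
To make the contrast between a standardized effect size and a common language effect size concrete, the sketch below computes Cohen's d and one common non-parametric CLES formulation (the probability that a randomly chosen value from one group exceeds a randomly chosen value from the other, ties counted as half). The abstract does not say which CLES variant the review uses, so this choice, the function names, and the scores are assumptions.

```python
import math
from itertools import product
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def cles(a, b):
    """P(random value from a > random value from b), ties counted as 0.5."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))
    return wins / (len(a) * len(b))


# Hypothetical outcome scores, for illustration only.
treated = [5, 6, 7, 8]
control = [4, 5, 6, 7]
print(f"Cohen's d = {cohens_d(treated, control):.3f}")
print(f"CLES      = {cles(treated, control):.3f}")
```

A CLES of about 0.72 reads as "a randomly chosen treated patient scores higher than a randomly chosen control patient roughly 72% of the time", which is easier to communicate than d of about 0.77.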

4 Investigating the Performance of Frequentist and Bayesian Techniques in Genomic Evaluation.

Impact index: 2.22
Published: 1 Jul 2024
Journal: Biochem Genet; PMID: 38951354
DOI: 10.1007/s10528-024-10842-1
Article type: Journal Article

The genomic evaluation process relies on the assumption of linkage disequilibrium between dense single-nucleotide polymorphism (SNP) markers at the genome level and quantitative trait loci (QTL). The present study was conducted with the aim of evaluating four frequentist methods including Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, and Genomic Best Linear Unbiased Prediction (GBLUP) and five Bayesian methods including Bayes Ridge Regression (BRR), Bayes A, Bayesian LASSO, Bayes C, and Bayes B, in genomic selection using simulation data. The difference between prediction accuracy was assessed in pairs based on statistical significance (p-value) (i.e., t test and Mann-Whitney U test) and practical significance (Cohen's d effect size). For this purpose, the data were simulated based on two scenarios in different marker densities (4000 and 8000, in the whole genome). The simulated data included a genome with four chromosomes, 1 Morgan each, on which 100 randomly distributed QTL and two different densities of evenly distributed SNPs (1000 and 2000), at the heritability level of 0.4, were considered. For the frequentist methods except for GBLUP, the regularization parameter λ was calculated using a five-fold cross-validation approach. For both scenarios, among the frequentist methods, the highest prediction accuracy was observed by Ridge Regression and GBLUP. The lowest and the highest bias were shown by Ridge Regression and GBLUP, respectively. Also, among the Bayesian methods, Bayes B and BRR showed the highest and lowest prediction accuracy, respectively. The lowest bias in both scenarios was registered by Bayesian LASSO and the highest bias in the first and the second scenario were shown by BRR and Bayes B, respectively. Across all the studied methods in both scenarios, the highest and the lowest accuracy were shown by Bayes B and LASSO and Elastic Net, respectively. As expected, the greatest similarity in performance was observed between GBLUP and BRR (d = 0.007 in the first scenario and d = 0.003 in the second scenario). The results obtained from parametric t and non-parametric Mann-Whitney U tests were similar. In the first and second scenario, out of 36 t tests between the performance of the studied methods in each scenario, 14 (P < .001) and 2 (P < .05) comparisons were significant, respectively, which indicates that with the increase in the number of predictors, the difference in the performance of different methods decreases. This was proven based on the Cohen's d effect size, so that with the increase in the complexity of the model, the effect size was not seen as very large. The regularization parameters in frequentist methods should be optimized by cross-validation approach before using these methods in genomic evaluation.

5 Rich-club organization of whole-brain spatio-temporal multilayer functional connectivity networks.

Impact index: 5.152
Published: 2024
Journal: Front Neurosci; PMID: 38855440
DOI: 10.3389/fnins.2024.1405734
Article type: Journal Article

In this work, we propose a novel method for constructing whole-brain spatio-temporal multilayer functional connectivity networks (FCNs) and four innovative rich-club metrics.
Spatio-temporal multilayer FCNs achieve a high-order representation of the spatio-temporal dynamic characteristics of brain networks by combining the sliding time window method with graph theory and hypergraph theory. The four proposed rich-club scales are based on the dynamic changes in rich-club node identity, providing a parameterized description of the topological dynamic characteristics of brain networks from both temporal and spatial perspectives. The proposed method was validated in three independent differential analysis experiments: male-female gender difference analysis, analysis of abnormality in patients with autism spectrum disorders (ASD), and individual difference analysis.
The proposed method yielded results consistent with previous relevant studies and revealed some innovative findings. For instance, the dynamic topological characteristics of specific white matter regions effectively reflected individual differences. The increased abnormality in internal functional connectivity within the basal ganglia may be a contributing factor to the occurrence of repetitive or restrictive behaviors in ASD patients.
The proposed methodology provides an efficacious approach for constructing whole-brain spatio-temporal multilayer FCNs and conducting analysis of their dynamic topological structures. The dynamic topological characteristics of spatio-temporal multilayer FCNs may offer new insights into physiological variations and pathological abnormalities in neuroscience.

6 Creative Nursing: History and Future Directions.

Impact index: not available
Published: 28 May 2024
Journal: Creat Nurs; PMID: 38679581
DOI: 10.1177/10784535241248067
Article type: Journal Article

This article traces the development of Creative Nursing from its origin in 1981 as a newsletter about Primary Nursing to its current position as a quarterly international, interdisciplinary, peer-reviewed, indexed, themed journal that continues to nurture novice authors, welcome international submissions, review articles that other journals won't consider, and address subjects that many journals avoid. Future directions include content in multiple languages, new author guidelines that invite submissions of research methods papers, moving beyond statistical significance based on p-value thresholds, asking authors to make explicit the implications for knowledge translation in their papers, and thinking creatively about how artificial intelligence can be leveraged for research, education, and practice.

7 On-road evaluation and regulatory recommendations for NOx and particle number emissions of China VI heavy-duty diesel trucks: A case study in Shenzhen.

Impact index: 10.753
Published: 10 Jun 2024
Journal: Sci Total Environ; PMID: 38614337
DOI: 10.1016/j.scitotenv.2024.172427
Article type: Journal Article

This research analyzed the real-world NOx and particle number (PN) emissions of 21 China VI heavy-duty diesel trucks (HDDTs). On-road emission conformity was first evaluated with a portable emission measurement system (PEMS). Only 76.19%, 71.43% and 61.90% of the vehicles passed the NOx test, PN test and both tests, respectively. The impacts of vehicle features including exhaust gas recirculation (EGR) equipment, mileage and tractive tonnage were then assessed. Results demonstrated that EGR helped reduce NOx emission factors (EFs) while increasing PN EFs. Larger mileages and tractive tonnages corresponded to higher NOx and PN EFs, respectively. In-depth analyses regarding the influences of operating conditions on emissions were conducted with both numerical comparisons and statistical tests. Results proved that HDDTs generated higher NOx EFs under low speeds or large vehicle specific powers (VSPs), and higher PN EFs under high speeds or small VSPs in general. In addition, unqualified vehicles generated significantly higher NOx EFs than qualified vehicles on freeways or at speeds ≥ 40 km/h, while significantly higher PN EFs were generated on suburban roads, freeways or under operating modes with positive VSPs by unqualified vehicles. The reliability and accuracy of on-board diagnostic (OBD) NOx data were finally investigated. Results revealed that 43% of the test vehicles did not report reliable OBD data. Correlation analyses between OBD NOx and PEMS measurements further demonstrated that the consistency of instantaneous concentrations was generally low. However, sliding window averaged concentrations show better correlations, e.g., the Pearson correlation coefficients on 20s-window averaged concentrations exceeded 0.85 for most vehicles. The research results provide valuable insights into emission regulation, e.g., focusing more on medium- to high-speed operations to identify unqualified vehicles, setting higher standards to improve the quality of OBD data, and adopting window averaged OBD NOx concentrations in evaluating vehicle emission performance.
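
The comparison of instantaneous versus sliding-window averaged concentrations can be illustrated with a small sketch. The series below are synthetic (a linear trend plus alternating sensor noise), the window length of 2 samples is arbitrary, and the helper names are hypothetical; none of this reproduces the study's data or its 20 s window.

```python
import math


def moving_average(values, window):
    """Mean of each consecutive block of `window` samples (step of one sample)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic reference signal and a noisy copy standing in for OBD readings.
pems = [float(i) for i in range(10)]
obd = [v + (3.0 if i % 2 == 0 else -3.0) for i, v in enumerate(pems)]

r_instant = pearson(pems, obd)
r_window = pearson(moving_average(pems, 2), moving_average(obd, 2))
print(f"instantaneous r = {r_instant:.3f}, window-averaged r = {r_window:.3f}")
```

Averaging over a window cancels the sample-to-sample noise in this toy case, which is the mechanism by which window-averaged series can correlate better than instantaneous ones.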

8 Semantically redundant training data removal and deep model classification performance: A study with chest X-rays.

Impact index: 7.422
Published: 9 Jul 2024
Journal: Comput Med Imaging Graph; PMID: 38608333
DOI: 10.1016/j.compmedimag.2024.102379
Article type: Journal Article

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Also, the common use of augmentation methods to generate variety in DL training could limit performance when indiscriminately applied to such data. We hypothesize that semantic redundancy would therefore tend to lower performance and limit generalizability to unseen data and question its impact on classifier performance even with large data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.

9 Frequentist, Bayesian Analysis and Complementary Statistical Tools for Geriatric and Rehabilitation Fields: Are Traditional Null-Hypothesis Significance Testing Methods Sufficient?

Impact index: not available
Published: 2024
Journal: Clin Interv Aging; PMID: 38380229
DOI: 10.2147/CIA.S441799
Article type: Journal Article

Null hypothesis significance testing (NHST) is the dominant statistical approach in the geriatric and rehabilitation fields. However, NHST is routinely misunderstood or misused. In this case, the findings from clinical trials would be taken as evidence of no effect, when in fact, a clinically relevant question may have a "non-significant" p-value. Conversely, findings are considered clinically relevant when significant differences are observed between groups. To assume that the p-value is not an exclusive indicator of an association or the existence of an effect, researchers should be encouraged to report other statistical analysis approaches such as Bayesian analysis and complementary statistical tools alongside the p-value (eg, effect size, confidence intervals, minimal clinically important difference, and magnitude-based inference) to improve interpretation of the findings of clinical trials by presenting a more efficient and comprehensive analysis. However, the focus on Bayesian analysis and secondary statistical analyses does not mean that NHST is less important. Only that, to observe a real intervention effect, researchers should use a combination of secondary statistical analyses in conjunction with NHST or Bayesian statistical analysis to reveal what p-values cannot show in the geriatric and rehabilitation studies (eg, the clinical importance of 1kg increase in handgrip strength in the intervention group of long-lived older adults compared to a control group). This paper provides potential insights for improving the interpretation of scientific data in rehabilitation and geriatric fields by utilizing Bayesian and secondary statistical analyses to better scrutinize the results of clinical trials where a p-value alone may not be appropriate to determine the efficacy of an intervention.

10 Interpreting the Current Literature on Outcomes of Robotic-Assisted Versus Conventional Total Knee Arthroplasty Using Fragility Analysis: A Systematic Review and Cross-Sectional Study of Randomized Controlled Trials.

Impact index: 4.435
Published: 1 Jul 2024
Journal: J Arthroplasty; PMID: 38309638
DOI: 10.1016/j.arth.2024.01.044
Article type: Journal Article

BACKGROUND: Fragility analysis is a method of further characterizing outcomes in terms of the stability of statistical findings. This study assesses the statistical fragility of recent randomized controlled trials (RCTs) evaluating robotic-assisted versus conventional total knee arthroplasty (RA-TKA versus C-TKA).
METHODS: We queried PubMed for RCTs comparing alignment, function, and outcomes between RA-TKA and C-TKA. Fragility index (FI) and reverse fragility index (RFI) (collectively, "FI") were calculated for dichotomous outcomes as the number of outcome reversals needed to change statistical significance. Fragility quotient (FQ) was calculated by dividing the FI by the sample size for that outcome event. Median FI and FQ were calculated for all outcomes collectively as well as for each individual outcome. Subanalyses were performed to assess FI and FQ based on outcome event type and statistical significance, as well as study loss to follow-up and year of publication.
RESULTS: The overall median FI was 3.0 (interquartile range [IQR], 1.0 to 6.3) and the median reverse fragility index was 3.0 (IQR 2.0 to 4.0). The overall median FQ was 0.027 (IQR 0.012 to 0.050). Loss to follow-up was greater than FI for 23 of the 38 outcomes assessed.
CONCLUSIONS: A small number of alternative outcomes is often enough to reverse the statistical significance of findings in RCTs evaluating dichotomous outcomes in RA-TKA versus C-TKA. We recommend reporting FI and FQ alongside P values to improve the interpretability of RCT results.
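
The fragility index and fragility quotient defined in the methods can be sketched for a single dichotomous outcome as follows. This is an illustration under assumptions the abstract does not state: significance is judged with a two-sided Fisher exact test at alpha = 0.05, and outcomes are reversed one at a time in the arm with fewer events (a common convention). The function names and the trial counts are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Outcome reversals needed to lose significance (0 if not significant)."""
    def p_value(e1, e2):
        return fisher_two_sided(e1, n_1 - e1, e2, n_2 - e2)

    e1, e2 = events_1, events_2
    if p_value(e1, e2) >= alpha:
        return 0  # a reverse fragility index would apply instead
    flip_first = e1 <= e2  # convert non-events in the arm with fewer events
    flips = 0
    while p_value(e1, e2) < alpha:
        if flip_first:
            if e1 == n_1:
                break
            e1 += 1
        else:
            if e2 == n_2:
                break
            e2 += 1
        flips += 1
    return flips


# Hypothetical trial: 1/20 events in one arm versus 9/20 in the other.
fi = fragility_index(1, 20, 9, 20)
fq = fi / (20 + 20)  # fragility quotient: FI divided by the sample size
print(f"FI = {fi}, FQ = {fq:.3f}")
```

Here two reversed outcomes are enough to move the result from significant to non-significant, so a loss to follow-up of more than two participants would exceed the FI.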
