null hypothesis

  • Article type: Journal Article






  • Article type: Journal Article
    As widely noted in the literature and by international bodies such as the American Statistical Association, severe misinterpretations of P-values, confidence intervals, and statistical significance are sadly common in public health. This scenario poses serious risks concerning terminal decisions such as the approval or rejection of therapies. Cognitive distortions about statistics likely stem from poor teaching in schools and universities, overly simplified interpretations, and - as we suggest - the reckless use of calculation software with predefined standardized procedures. In light of this, we present a framework to recalibrate the role of frequentist-inferential statistics within clinical and epidemiological research. In particular, we stress that statistics is only a set of rules and numbers that make sense only when properly placed within a well-defined scientific context beforehand. Practical examples are discussed for educational purposes. Alongside this, we propose some tools to better evaluate statistical outcomes, such as multiple compatibility or surprisal intervals or tuples of various point hypotheses. Lastly, we emphasize that every conclusion must be informed by different kinds of scientific evidence (e.g., biochemical, clinical, statistical, etc.) and must be based on a careful examination of costs, risks, and benefits.
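
    The "multiple compatibility or surprisal intervals" and "tuples of various point hypotheses" mentioned above can be sketched roughly as follows. This is a minimal illustration, not the article's own procedure: it assumes a normal (Wald-type) approximation, uses the common definition of surprisal as S = -log2(p), and the estimate and standard error are hypothetical numbers chosen only for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Wald-type approximation


def s_value(p: float) -> float:
    """Surprisal in bits, S = -log2(p), written as log2(1/p)."""
    return math.log2(1 / p)


def p_value(hypothesis: float, estimate: float, se: float) -> float:
    """Two-sided p-value for one point hypothesis (normal approximation)."""
    z = abs(estimate - hypothesis) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


def compatibility_interval(estimate: float, se: float, level: float):
    """Range of point hypotheses whose two-sided p-value exceeds 1 - level."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


# Hypothetical effect estimate and standard error (assumptions for illustration).
estimate, se = 0.40, 0.25

# Several intervals instead of a single 95% one.
for level in (0.50, 0.75, 0.95, 0.99):
    low, high = compatibility_interval(estimate, se, level)
    bits = s_value(1 - level)
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f}); limits at {bits:.1f} bits")

# A tuple of point hypotheses, each with its own p-value and surprisal.
for hypothesis in (0.0, 0.2, 0.4, 0.8):
    p = p_value(hypothesis, estimate, se)
    print(f"hypothesis {hypothesis:.1f}: p = {p:.3f}, S = {s_value(p):.1f} bits")
```

    Reporting several levels and several hypotheses side by side avoids treating any single threshold as decisive, which is the spirit of the proposal in the abstract.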






  • Article type: Journal Article
    In several large-scale replication projects, statistically non-significant results in both the original and the replication study have been interpreted as a 'replication success.' Here, we discuss the logical problems with this approach: non-significance in both studies does not ensure that the studies provide evidence for the absence of an effect, and 'replication success' can virtually always be achieved if the sample sizes are small enough. In addition, the relevant error rates are not controlled. We show how methods such as equivalence testing and Bayes factors can be used to adequately quantify the evidence for the absence of an effect and how they can be applied in the replication setting. Using data from the Reproducibility Project: Cancer Biology, the Experimental Philosophy Replicability Project, and the Reproducibility Project: Psychology, we illustrate that many original and replication studies with 'null results' are in fact inconclusive. We conclude that it is important to also replicate studies with statistically non-significant results, but that they should be designed, analyzed, and interpreted appropriately.
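
    To make the contrast between "non-significant" and "evidence of absence" concrete, here is a small sketch of equivalence testing via two one-sided tests (TOST). It assumes a normal approximation with a known standard error; the equivalence margin, estimates, and standard errors are hypothetical values, not data from the replication projects named above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(estimate: float, se: float) -> float:
    """Ordinary two-sided p-value against the null of zero effect."""
    return 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))


def tost_p(estimate: float, se: float, margin: float) -> float:
    """TOST p-value for equivalence within (-margin, +margin).

    Two one-sided nulls are tested: effect <= -margin and effect >= +margin.
    Equivalence is claimed only if both are rejected, so the larger of the
    two one-sided p-values is reported.
    """
    p_lower = 1 - STD_NORMAL.cdf((estimate + margin) / se)  # H0: effect <= -margin
    p_upper = STD_NORMAL.cdf((estimate - margin) / se)      # H0: effect >= +margin
    return max(p_lower, p_upper)


margin = 0.30  # hypothetical smallest effect size of interest

# Same point estimate, very different precision (hypothetical studies).
for label, estimate, se in (("small study", 0.05, 0.50), ("large study", 0.05, 0.05)):
    print(
        f"{label}: p vs zero = {two_sided_p(estimate, se):.3f}, "
        f"TOST p = {tost_p(estimate, se, margin):.2g}"
    )
```

    Both hypothetical studies are non-significant against zero, but only the precise one supports equivalence; the imprecise one is simply inconclusive.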






  • Article type: Journal Article
    A paradigm shift away from null hypothesis significance testing seems to be in progress. Based on simulations, we illustrate some of the underlying motivations. First, p-values vary strongly from study to study, hence dichotomous inference using significance thresholds is usually unjustified. Second, 'statistically significant' results have overestimated effect sizes, a bias declining with increasing statistical power. Third, 'statistically non-significant' results have underestimated effect sizes, and this bias gets stronger with higher statistical power. Fourth, the tested statistical hypotheses usually lack biological justification and are often uninformative. Despite these problems, a screen of 48 papers from the 2020 volume of the Journal of Evolutionary Biology exemplifies that significance testing is still used almost universally in evolutionary biology. All screened studies tested default null hypotheses of zero effect with the default significance threshold of p = 0.05; none presented a pre-specified alternative hypothesis, a pre-study power calculation, or the probability of 'false negatives' (beta error rate). The results sections of the papers presented 49 significance tests on average (median 23, range 0-390). Of 41 studies that contained verbal descriptions of a 'statistically non-significant' result, 26 (63%) falsely claimed the absence of an effect. We conclude that studies in ecology and evolutionary biology are mostly exploratory and descriptive. We should thus shift from claiming to 'test' specific hypotheses statistically to describing and discussing many hypotheses (possible true effect sizes) that are most compatible with our data, given our statistical model. We already have the means for doing so, because we routinely present compatibility ('confidence') intervals covering these hypotheses.
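
    The first three points above can be reproduced in a few lines. The sketch below is not the authors' simulation: it assumes a two-group comparison with unit standard deviation, draws each study's estimate directly from its sampling distribution, and uses a z-test; the true effect, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean, quantiles

STD_NORMAL = NormalDist()


def simulate(true_effect, n_per_group, n_studies=5000, alpha=0.05, seed=1):
    """Simulate many studies of the same true effect and split them by significance."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # SE of a mean difference, unit SD per group
    significant, non_significant, p_values = [], [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # one study's observed effect
        p = 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))
        p_values.append(p)
        (significant if p < alpha else non_significant).append(estimate)
    return significant, non_significant, p_values


true_effect = 0.3  # hypothetical true standardized effect
sig, non_sig, p_values = simulate(true_effect, n_per_group=30)

deciles = quantiles(p_values, n=10)  # nine cut points: 10th ... 90th percentile
print(f"p-values: 10th percentile {deciles[0]:.3f}, 90th percentile {deciles[-1]:.3f}")
print(f"share significant (empirical power): {len(sig) / len(p_values):.2f}")
print(f"mean estimate, significant studies:     {mean(sig):.2f}")
print(f"mean estimate, non-significant studies: {mean(non_sig):.2f}")
print(f"true effect:                            {true_effect:.2f}")
```

    Rerunning with a larger `n_per_group` shows how the two conditional biases move as power increases.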






  • Article type: Journal Article
    In the United States, the majority of physicians have been sued, and those who have not will be. Defendants share the notion that the lawsuit is totally fallacious. To be fallacious, the outcome of a medical intervention must be an unpreventable random maloccurrence. This is the only alternative to a medical error. The conflict over outcomes that are random and outcomes that are medical errors results in 46,000 malpractice suits every year in the USA. The burden of proof is a preponderance of evidence, but this is insufficient to do more than infer, not prove, a relationship between the medical intervention and the outcome. Plaintiffs generally prove a malpractice case using inductive reasoning. Inductive reasoning leaves much to intuition. They use inductive reasoning because, by definition, preponderance of evidence also leaves much to intuition. Deductive reasoning is objective and there is no place for intuition. With deductive reasoning, the burden of proof is now sufficient to distinguish whether or not the cause relates to the effect with 95% confidence. A model for deductive reasoning in malpractice that is completely consistent with the scientific method is presented. This should and would derail frivolous lawsuits.






  • Article type: Journal Article
    Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor's fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimizes over-interpretation of a "significant" result. p-values are an unreliable measure of the strength of evidence; if used at all, they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
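
    The diagnostic-test analogy rests on the difference between P(positive | disease) and P(disease | positive), the same inversion that underlies the prosecutor's fallacy and the misreading of a p-value as the probability that the null hypothesis is true. A minimal sketch using Bayes' theorem follows; the sensitivity, specificity, and prevalence values are hypothetical and not taken from the review.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(disease | positive test) computed with Bayes' theorem."""
    true_positive = sensitivity * prevalence               # P(positive and disease)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(positive and no disease)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics (assumptions for illustration).
sensitivity = 0.95  # P(positive | disease)
specificity = 0.95  # P(negative | no disease)

for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.0%}: P(disease | positive) = {ppv:.2f}")
```

    The test's error rates stay fixed while the probability of disease given a positive result changes with prevalence, so one conditional probability cannot be read off from the other.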






  • Article type: Editorial






  • Article type: Journal Article
    Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
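
    As one concrete instance of how the null hypothesis and sampling model shape a gene set test, the sketch below implements a simple over-representation test with a hypergeometric sampling model, which corresponds to what is often called a "competitive" null (genes in the set are no more likely to be selected than genes outside it). It is only an illustration of that one family, not a method endorsed by the article; the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(total_genes: int, set_size: int, n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap).

    Sampling model: n_selected genes are drawn without replacement from
    total_genes, of which set_size belong to the gene set of interest.
    """
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, i) * comb(total_genes - set_size, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(total_genes, n_selected)


# Hypothetical counts: 20,000 genes, a 100-gene set, 500 selected genes,
# 10 of which fall in the set (2.5 would be expected under the null).
p = overrepresentation_p(total_genes=20000, set_size=100, n_selected=500, n_overlap=10)
print(f"over-representation p-value: {p:.2g}")
```

    A "self-contained" null, by contrast, asks whether any gene in the set is associated with the phenotype and is usually assessed by permuting sample labels rather than genes.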







  • Article type: Journal Article
    We review evidence for Macphail's (1982, 1985, 1987) Null Hypothesis, that nonhuman animals do not differ either qualitatively or quantitatively in their cognitive capacities. Our review supports the Null Hypothesis insofar as there are no qualitative differences among nonhuman vertebrate animals, and any observed differences along the qualitative dimension can be attributed to failures to account for contextual variables. We argue that species do differ quantitatively, however, and that the main difference in "intelligence" among animals lies in the degree to which one must account for contextual variables.







  • Article type: Journal Article
    Macphail famously criticized two foundational assumptions that underlie the evolutionary approach to comparative psychology: that there are differences in intelligence across species, and that intelligent behavior in animals is based on more than associative learning. Here, we provide evidence from recent work in avian cognition that supports both these assumptions: intelligence across species varies, and animals can perform intelligent behaviors that are not guided solely by associative learning mechanisms. Finally, we reflect on the limitations of comparative psychology that led to Macphail's claims and suggest strategies researchers can use to make more advances in the field.






