
How Significant Is P?

By Geoff Cumming

Questions over the significance of p values require the adoption of a new and transparent approach to validating research data.

Tests of statistical significance provide the basis for drawing conclusions in numerous disciplines. Obtain the magic p < .05, or even p < .01, and you can declare the result “statistically significant” or even “highly significant”.
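The ritual is easy to picture. As a minimal, purely illustrative sketch (the data and group labels here are hypothetical, not drawn from any study mentioned in this article), a researcher might compare two groups with a t-test and check whether p falls under the .05 bar:

```python
# Illustrative only: simulate two hypothetical groups, run a two-sample
# t-test, and apply the conventional p < .05 cut-off.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=10.3, scale=2.0, size=30)  # hypothetical scores
control = rng.normal(loc=10.0, scale=2.0, size=30)

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("'Statistically significant' -- however small the effect may be.")
else:
    print("Not significant -- and, too often, never published.")
```

Note that the verdict says nothing about how large or how important the difference is; that is precisely the problem the rest of this article takes up.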

Every decent textbook explains that a “significant” result may be tiny and worthless, but the power of everyday language, and every researcher’s ache to find results that matter, delude us into taking “statistically significant” as very, very close to “true” and “important”.

For more than half a century, leading scholars have explained the terrible flaws of statistical significance testing: it does not tell us what we want to know, it’s widely misunderstood and misused, and it’s damaging to research progress. And yet it persists.

Then came reports that widely accepted results might not be real. From cancer research to social psychology, some published findings simply could not be replicated. Many established results appearing in excellent journals might be false. Disaster!

In 2005, epidemiologist John Ioannidis joined the dots by famously explaining “why most published research findings are false” (tinyurl.com/c94hl6). He fingered the universal imperative to obtain statistical significance, which has three bad effects.

First, needing to achieve p < .05 explains selective publication: results with p > .05 don’t appear in print and simply die, so journals present a biased selection of all research conducted.

Second, researchers feel compelled to force the data analysis – not fraud, but a small tweak here and some selection there – to squeeze under the .05 bar.

Third, once a result achieves .05 and is published, it’s accepted as true. Hence there is little incentive to replicate – what editor will devote precious journal space to a ho-hum “I found it too” result?

So we need better statistics in place of p values and statistical significance. We also need full availability of all results of research conducted to a reasonable standard – not just p < .05 results. In addition, researchers need to analyse their data openly and truly without selection or tweaking.

The best way to guarantee full publication with no tweaking during analysis is for a researcher to declare in advance the full details of the proposed experiment and data analysis – a demanding requirement! Finally, we need to encourage replication of published findings, and publication of those replications – whatever they find.

That’s a wish list that differs dramatically from standard practice, yet it’s necessary if future published research is to be trustworthy. I’m delighted to report that Psychological Science, the world’s top outlet for empirical research from across psychological science, is boldly tackling the full wish list. Eric Eich, the editor, has explained the new policies (tinyurl.com/m35wrah) and introduced a tutorial article (tinyurl.com/m7ugcxt) designed to support them.

New submission guidelines (tinyurl.com/kh63utb) enable authors to earn up to three badges. The Preregistered badge confirms that all details of the research were declared in advance. The Open Materials badge declares that full information about procedures and materials is available online so that other researchers can replicate the study. The Open Data badge declares that the full data are available.

Possibly most dramatically, Psychological Science is embracing “the new statistics” (thenewstatistics.com), meaning researchers are strongly encouraged not to use statistical significance testing. Instead they should seek better methods, perhaps estimation based on effect sizes and confidence intervals.
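What might an estimation-style report look like in practice? The following is a minimal sketch, not a prescribed procedure: for the same hypothetical two-group comparison as before, it reports the mean difference with a 95% confidence interval and a standardised effect size (Cohen's d), rather than a significant/not-significant verdict. The function name and data are assumptions for illustration only.

```python
# Illustrative sketch of estimation-based reporting: point estimate,
# 95% confidence interval, and a standardised effect size.
import numpy as np
from scipy import stats

def estimate_difference(a, b, confidence=0.95):
    """Mean difference, pooled-variance t confidence interval, and Cohen's d."""
    n1, n2 = len(a), len(b)
    diff = np.mean(a) - np.mean(b)
    # Pooled standard deviation across the two groups
    sp = np.sqrt(((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1))
                 / (n1 + n2 - 2))
    se = sp * np.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n1 + n2 - 2)
    ci = (diff - t_crit * se, diff + t_crit * se)
    cohens_d = diff / sp
    return diff, ci, cohens_d

rng = np.random.default_rng(42)
treatment = rng.normal(10.3, 2.0, 30)  # hypothetical data, as before
control = rng.normal(10.0, 2.0, 30)

diff, (lo, hi), d = estimate_difference(treatment, control)
print(f"Mean difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], d = {d:.2f}")
```

An interval reported this way conveys both the size of the effect and the precision of the estimate, and it lends itself naturally to meta-analysis across replications.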

These are bold and exciting developments. Many will be watching to see how well authors comply and how thoroughly the editors vet submitted manuscripts.

The reward could be substantially improved research in psychological science.

Geoff Cumming is an Emeritus Professor at La Trobe University’s School of Psychological Science and author of Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis.