Tests of statistical significance provide the basis for drawing conclusions in numerous disciplines. Obtain the magic p < .05, or even p < .01, and you can declare the result “statistically significant” or even “highly significant”.
Every decent textbook explains that a “significant” result may be tiny and worthless, but the power of everyday language, and every researcher’s ache to find results that matter, delude us into taking “statistically significant” as very, very close to “true” and “important”.
For more than half a century, leading scholars have explained the terrible flaws of statistical significance testing: it does not tell us what we want to know, it’s widely misunderstood and misused, and it’s damaging to research progress. And yet it persists.
Then came reports that widely accepted results might not be real. From cancer research to social psychology, some published findings simply could not be replicated. Many established results appearing in excellent journals might be false. Disaster!
In 2005, epidemiologist John Ioannidis joined the dots by famously explaining “why most published research findings are false” (tinyurl.com/c94hl6). He fingered the universal imperative to obtain statistical significance, which has three bad effects.
First, needing to achieve p < .05 explains selective publication: results with...