It might seem surprising that a cephalopod could make so many correct predictions from mere random guesses. But as statistician David Hand explains in his book “The Improbability Principle”, if enough people ask enough animals to predict the outcomes of enough sporting events, it becomes likely that some will eventually guess correctly a few times in a row.
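Hand's point can be put into rough numbers. The sketch below is only an illustration, not anything from his book: it assumes every guess is an independent coin flip, and the function name, the animal counts and the streak length of eight are all hypothetical.

```python
def chance_of_lucky_streak(num_animals: int, streak_length: int,
                           p_correct: float = 0.5) -> float:
    """Probability that at least one animal gets `streak_length` guesses right in a row.

    Assumption: every guess is an independent coin flip that succeeds
    with probability p_correct.
    """
    p_one_animal = p_correct ** streak_length        # one animal's chance of a perfect streak
    p_nobody = (1.0 - p_one_animal) ** num_animals   # chance that every animal misses at least once
    return 1.0 - p_nobody


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    for animals in (1, 50, 200, 1000):
        chance = chance_of_lucky_streak(animals, streak_length=8)
        print(f"{animals:>5} animals: {chance:.1%} chance of an 8-for-8 streak")
```

Under these assumptions a single animal almost never manages a perfect run, but a large enough crowd of animals makes at least one such run more likely than not.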
Something similar happens when thousands of scientists go looking for weird, headline-grabbing results — a common practice in psychology. For a while, it looked like you could make yourself happier by holding a pencil in your teeth, become more empathetic by spending three minutes reading classic literature, or gain power by assuming special “power poses”. All have gone the way of poor, discredited Paul the Octopus.
The good news is that tools to separate real results from random noise exist. The bad news is that many researchers in afflicted fields never learned to use them correctly — a conclusion echoed by reformers from within, and reflected in a statement issued last year by the American Statistical Association.
Statistical analysis falls into two major schools: frequentist and Bayesian statistics. The frequentist school of thought hinges on the idea that the probability of something happening corresponds to the fraction of times it would happen given many chances. Roll a fair die often enough and five will come up close to one-sixth of the time.
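A quick simulation shows what "given many chances" means in practice. This is a minimal sketch that assumes a fair six-sided die and uses Python's standard pseudo-random generator; the function name and the roll counts are made up for the example.

```python
import random


def frequency_of_face(face: int, num_rolls: int, seed: int = 0) -> float:
    """Roll a simulated fair die num_rolls times and return how often `face` came up."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == face)
    return hits / num_rolls


if __name__ == "__main__":
    print(f"one-sixth is about {1 / 6:.4f}")
    for rolls in (60, 6_000, 600_000):
        print(f"{rolls:>7} rolls: five came up {frequency_of_face(5, rolls):.4f} of the time")
```

With only a few rolls the observed share of fives can sit well away from one-sixth; it tends to settle near that value as the number of rolls grows, without ever being guaranteed to hit it exactly.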
Back in 1865, mathematicians Benjamin and Charles Peirce, a father-and-son team, used the concept to help settle a dispute involving Hetty Green, who would later become known as the richest woman in America. The story, told in detail by journalist Louis Menand in his book “The Metaphysical Club”, starts with the mathematicians getting called as expert witnesses to determine whether Green (then Hetty Robinson) had forged her aunt’s signature on an alternative will that would have bequeathed her a fortune of $2 million.
The signature in question was identical to a signature on the original will, suggesting that Robinson had traced it. Most authentic signatures vary a bit. What were the odds these two signatures would be identical by chance?
The Peirces noted that the signature had 30 down strokes. They found 44 other examples of the aunt’s signature, measured the down strokes, and calculated that a given down stroke matched across two signatures about one-fifth of the time. They calculated odds of one in 68 that three down strokes would match in two signatures, one in 144 that four would match, and odds of one in trillions that all 30 would match.
This calculation bears some resemblance to what scientists do to determine what they call statistical significance — which is expressed as a p-value. The technique was invented to help researchers separate real results from noise by giving them a sense of whether they should be surprised enough by their data to take a closer look.
In practice, though, psychologists and medical researchers usually use an arbitrary cutoff point of .05 (1 in 20) to define what’s statistically significant, a standard far less stringent than the one-in-trillions calculated by the Peirces. This porous filter was originally intended to flag preliminary results that deserved a second look, not to serve as a proxy for truth.
The problem with p-values goes beyond that. Gerd Gigerenzer, a psychologist and longtime science critic at the Max Planck Institute for Human Development, points to a survey published in 2002, which indicated most professors of psychology don’t know what p-values represent. They think they know, but they don’t.
And because they don’t understand p-values, they routinely calculate them incorrectly, he said, allowing the publication of lots of high-profile noise under a veneer of statistical rigour. In a 2004 paper titled “Mindless Statistics”, Gigerenzer illustrated the crux of the problem with an anecdote from the writings of physicist Richard Feynman.
If researchers comb through their data fishing for weird things, that’s fine, but calculating the statistical significance of what they find requires a separate experiment. Otherwise they’ll end up with the same problem that afflicted one of Paul’s successors, a koala named Oobi-Ooobi, who was fired last year after his sports prediction powers suddenly, and not so mysteriously, disappeared.
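The need for a separate experiment can be seen in a toy simulation. The sketch below is hypothetical throughout: it assumes every animal is a pure coin-flipper, picks whichever one scored best in a first round of games, and then checks how that same animal fares on fresh games. The function names, the 100 animals and the eight games are all invented for illustration.

```python
import random


def fish_then_retest(num_animals: int, num_games: int, rng: random.Random):
    """Return (best first-round accuracy, that animal's accuracy on fresh games).

    Assumption: every animal guesses at random, so the star of round one
    is just another coin-flipper when round two begins.
    """
    def score() -> int:
        # Number of correct guesses out of num_games fair coin flips.
        return sum(rng.random() < 0.5 for _ in range(num_games))

    best_first_round = max(score() for _ in range(num_animals))  # the "fishing" step
    second_round = score()                                       # the separate experiment
    return best_first_round / num_games, second_round / num_games


def average_over_trials(num_trials: int, num_animals: int = 100,
                        num_games: int = 8, seed: int = 0):
    """Average both accuracies over many repetitions of the whole exercise."""
    rng = random.Random(seed)
    results = [fish_then_retest(num_animals, num_games, rng) for _ in range(num_trials)]
    avg_first = sum(first for first, _ in results) / num_trials
    avg_second = sum(second for _, second in results) / num_trials
    return avg_first, avg_second


if __name__ == "__main__":
    first, second = average_over_trials(400)
    print(f"accuracy of the hand-picked winner, round one: {first:.0%}")
    print(f"accuracy of the same animal on fresh games:    {second:.0%}")
```

In this setup the first-round score looks impressive only because it was selected after the fact; on new games the chosen animal falls back to roughly even odds.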