P values and P "hacking"

wjm11

I found the following article interesting:

http://www.nature.com/news/scientific-method-statistical-errors-1.14700

Some excerpts:

"The results were “plain as day” ... “... the data provided clear support.” The P value, a common index for the strength of evidence, was 0.01, usually interpreted as 'very significant'.

"These are sticky concepts, but some statisticians have tried to provide general rule-of-thumb conversions (see 'Probable cause'). According to one widely used calculation [5], a P value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of 0.05 raises that chance to at least 29%. So Motyl's finding had a greater than one in ten chance of being a false alarm. Likewise, the probability of replicating his original result was not 99%, as most would assume, but something closer to 73%, or only 50% if he wanted another 'very significant' result [6, 7]. In other words, his inability to replicate the result was about as surprising as if he had called heads on a coin toss and it had come up tails.
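For anyone who wants to check the arithmetic in that excerpt, here is a rough Python sketch. The excerpt does not spell out the formulas, so everything below is my assumption: the false-alarm numbers are reproduced with the -e * p * ln(p) lower bound on the Bayes factor plus a 50:50 prior chance of a real effect, and the replication numbers by assuming the true effect is exactly as large as the one observed. The function names are made up for illustration.

```python
import math
from statistics import NormalDist


def min_false_alarm_probability(p, prior_null=0.5):
    """Lower bound on P(no real effect | P value).

    Assumption: uses the -e * p * ln(p) bound on the Bayes factor in
    favour of the null, which only holds for 0 < p < 1/e.
    prior_null is the assumed prior probability of no real effect.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound only valid for 0 < p < 1/e")
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must be strictly between 0 and 1")
    min_bayes_factor = -math.e * p * math.log(p)
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = min_bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def replication_probability(p_observed, alpha=0.05):
    """Chance that an identical repeat study reaches P < alpha.

    Assumption: the true effect equals the observed one, and only a
    result in the same direction counts as a replication.
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1.0 - p_observed / 2.0)
    z_required = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z_observed - z_required)


if __name__ == "__main__":
    for p in (0.05, 0.01):
        print(f"P = {p}: false alarm >= {min_false_alarm_probability(p):.2f}")
    print(f"replicate at 0.05: {replication_probability(0.01, 0.05):.2f}")
    print(f"replicate at 0.01: {replication_probability(0.01, 0.01):.2f}")
```

Under these assumptions the sketch gives roughly 29% and 11% for the false-alarm bound and roughly 73% and 50% for replication, which matches the figures quoted. A more sceptical prior (prior_null above 0.5) pushes the false-alarm probability higher still.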

"Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result.”

"Simonsohn's simulations have shown [9] that changes in a few data-analysis decisions can increase the false-positive rate in a single study to 60%. P-hacking is especially likely, he says, in today's environment of studies that chase small effects hidden in noisy data. It is tough to pin down how widespread the problem is, but Simonsohn has the sense that it is serious. In an analysis [10], he found evidence that many published psychology papers report P values that cluster suspiciously around 0.05, just as would be expected if researchers fished for significant P values until they found one."
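To get a feel for how "trying multiple things" inflates false positives, here is a toy simulation in Python. To be clear, this is not Simonsohn's simulation and it will not reproduce the 60% figure; it is a minimal sketch with assumptions of my own: two groups with no real difference, several independent outcome measures with known standard deviation 1, and a "finding" declared if any one of them gives P < 0.05. All names and default values are invented for illustration.

```python
import math
import random


def two_sided_p(group_a, group_b):
    """Two-sample z-test P value, assuming equal group sizes and known SD = 1."""
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_outcomes, n_sims=2000, n_per_group=20,
                        alpha=0.05, seed=0):
    """Fraction of null studies that report at least one P < alpha.

    Each simulated study measures n_outcomes independent outcomes and
    keeps the smallest P value, which is one simple form of P-hacking.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect.
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10):
        expected = 1.0 - 0.95 ** k
        print(f"{k} outcomes: simulated {false_positive_rate(k):.3f}, "
              f"theory {expected:.3f}")
```

With one outcome the rate should sit near the nominal 5%; with k independent outcomes it should approach 1 - 0.95^k, so about 23% for five and about 40% for ten. Real analyses have correlated outcomes and other flexible choices (stopping rules, covariates, dropped conditions), so the actual inflation will differ from this toy case.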