We believe Sir R. A. Fisher was the greatest statistician ever because he was both a statistician and a scientist, a rare combination. We are therefore puzzled as to how Dr. Jones construed our comments as anti-Fisherian. Perhaps the misunderstanding arises from the radical editing of the letter that we originally wrote to American Scientist. Hence, we offer the following elaboration.
In “Of Beauty, Sex and Power” (July-August 2009), Andrew Gelman and David Weakliem point out some abuses of statistics. Inadvertently, by universally equating statistical significance at the 0.05 level to real-life significance, the article also helps perpetuate perhaps the biggest abuse of statistics. A universal threshold such as 0.05 cannot possibly be applicable to all phenomena and all contexts.
Journals routinely require investigators to report whether or not the findings are statistically significant at the 0.05 level. In interpreting the results, the adjective “statistically” and the qualifier 0.05 are dropped, and the findings are reported to be “significant” or “nonsignificant.” The sequence of dilution continues by describing “nonsignificant” findings as “chance findings,” and “significant” findings as “definitive” findings.
It is irrational that an effect with P=0.051 is considered due to chance whereas an effect with P=0.049 is considered real. Both imply chance; only the degree is different. How much chance is acceptable is a risk-benefit issue and a contextual issue, not a statistical one. To be fully informative, studies should report the calculated P value, which is the probability, calculated from the actual data, that an effect as large as the one observed, or larger, would be observed if in fact there were no effect. The risk-benefit balance and the consequences should then decide whether or not the effect has real-life significance.
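A minimal sketch of that definition, using hypothetical numbers rather than data from any study: suppose 100 patients are treated and half would be expected to improve if the drug did nothing. The function name `one_sided_p_value` and the trial figures are assumptions made for illustration. The exact binomial tail gives the probability of a result at least as large as the one observed when there is no effect.

```python
from math import comb


def one_sided_p_value(n: int, k: int, p0: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p0).

    This is the chance of seeing k or more successes in n trials
    when the true success rate is p0 (the "no effect" assumption).
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical trial: 100 patients, 50% expected to improve with no drug effect.
for improved in (58, 59, 60):
    p = one_sided_p_value(100, improved)
    label = "significant" if p < 0.05 else "nonsignificant"
    print(f"{improved}/100 improved: P = {p:.3f} ({label} at the 0.05 level)")
```

The reported P value changes gradually as the count of improved patients rises, whereas the “significant” label flips at one particular count. Reporting the number itself leaves the risk-benefit judgment to the reader.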
Perhaps a couple of clinical scenarios would help drive the point home. Suppose a drug targeted at a rare and incurable cancer has shown positive results in a small clinical trial at P=0.07. A physician would probably prescribe the drug for a patient facing certain death from this cancer even though it was not shown to be effective at the 0.05 level. The potential benefit outweighs the risk. Conversely, suppose a drug given to living kidney donors during surgery reduced the risk of rejection by the recipient. But, in a large clinical trial, a few donors developed life-threatening complications, the effect being significant at P=0.07. Can this toxicity be ignored because P>0.05? Probably not, because the risk involves healthy volunteers and outweighs the benefit.
To our knowledge, the threshold of 0.05 to separate “real effect” from “chance” was suggested by R. A. Fisher for use in agricultural experiments. He perhaps felt that a one-in-twenty (5 percent) chance was a reasonable one to take in concluding that, for example, a fertilizer was effective when in fact it was not. To equate the consequences of erroneously concluding that a fertilizer was effective with the consequences of making a clinical decision regarding a patient with an incurable disease defies logic.
Boring in 1919 told us that mathematical (statistical) and scientific (real-life) significance were two distinct phenomena (1), a point also made distinctly in the International Encyclopedia of Statistics (2). How long will it take us to hear Boring?
Hari Dayal, Ph.D.
Adjunct Professor, Health Management and Policy, School of Public Health, U. North Texas Health Science Center, Fort Worth, TX
Alok Kalia, M.D.
Dept. of Pediatrics, U. Texas Medical Branch, Galveston (Retired, part-time)
1. Boring EG. Mathematical vs. scientific significance. Psychological Bulletin. Vol. 16, 1919: 335-338.
2. Kruskal WH, Tanur JM (editors). International Encyclopedia of Statistics (Vol. 2). The Free Press, New York, 1978: 946.
posted by Hari Dayal
December 14, 2009