LETTERS TO THE EDITORS
Statistical or Actual Significance?
To the Editors:
In “Of Beauty, Sex and Power” (July–August 2009), Andrew Gelman and David Weakliem define statistical significance as being at least two standard errors away from the null value. That essentially says that an estimated effect is statistically significant only if it is significant at the 0.05 level, a definition that inadvertently perpetuates an abuse of statistics.
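The link between "two standard errors" and the 0.05 level can be checked with a short calculation. The sketch below is illustrative only and assumes the estimate is approximately normally distributed; the function name `two_sided_p` is hypothetical and does not come from the article.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided tail probability of a standard normal beyond |z|.

    Assumes the estimate divided by its standard error is roughly
    standard normal under the null hypothesis.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# Compare the conventional 1.96 cutoff with the "two standard errors" rule.
for z in (1.96, 2.0):
    print(f"z = {z:.2f}  ->  P = {two_sided_p(z):.4f}")
```

Under that assumption, 1.96 standard errors corresponds to P of about 0.05 and two standard errors to roughly 0.046, so the two rules are close but not identical.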
It is irrational that an effect with P = 0.051 is considered due to chance whereas an effect with P = 0.049 is considered real. Both imply chance; only the degree is different. How much chance is acceptable is a risk-benefit question and a contextual one, not a statistical one.
Suppose a drug targeted at a rare and incurable cancer has shown positive results in a small clinical trial at the P = 0.07 level. A physician would probably prescribe the drug for a patient facing certain death from this cancer even though its effect is not significant at the 0.05 level. The potential benefit outweighs the risk. [This paragraph did not appear in the print edition of the magazine.]
Conversely, suppose a drug given to living kidney donors during surgery reduced the risk of rejection by the recipient. But, in a large clinical trial, a few donors developed a life-threatening complication, the effect being significant at P = 0.07. Can this toxicity be ignored because P > 0.05? Probably not, because the risk involves healthy volunteers and outweighs the benefit.
A universal threshold cannot be applicable to all phenomena or all contexts. R.A. Fisher proposed the 0.05 standard for use in agricultural experiments. To equate agricultural experiments with decisions regarding patients defies logic.
Hari Dayal and Alok Kalia
Fort Worth, TX and Galveston, TX
Dr. Gelman responds:
I agree with Drs. Dayal and Kalia that any statistical-significance threshold is arbitrary and should be adapted to circumstances. With regard to the sex-ratio research in question, our point was that, had the results not been reported as statistically significant at the standard 5 percent level, there is no way they would have been published. But the bulk of our article discusses the difficulties of statistical inference for small effects in settings where, due to sample size limitations, “statistically significant” findings will almost certainly overstate the magnitude of the effects being studied.
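The overstatement described above can be illustrated with a small simulation. This is a minimal sketch rather than the analysis from the article: the true effect, the standard error, the 1.96 cutoff and the function name `exaggeration_ratio` are all assumptions chosen for illustration.

```python
import random
import statistics

def exaggeration_ratio(true_effect, std_error, n_sims=100_000, seed=0):
    """Simulate repeated studies of one small, positive true effect.

    Returns (ratio, share): the average size of the "statistically
    significant" estimates relative to the true effect, and the share of
    simulated studies that reached significance.
    Assumes normally distributed estimates and true_effect > 0.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        # Keep only estimates that clear the conventional two-sided cutoff.
        if abs(estimate) > 1.96 * std_error:
            significant.append(abs(estimate))
    ratio = statistics.mean(significant) / true_effect
    share = len(significant) / n_sims
    return ratio, share

# Hypothetical numbers: a true effect half the size of its standard error.
ratio, share = exaggeration_ratio(true_effect=0.5, std_error=1.0)
print(f"share significant: {share:.3f}, average exaggeration: {ratio:.1f}x")
```

With these hypothetical inputs, any estimate that clears the cutoff is at least 1.96 in absolute size, so it is necessarily several times larger than the assumed true effect of 0.5.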