The Survival of the Fittists
Understanding the role of replication in research is crucial for the interpretation of scientific advances
Shrinking Effects in Science
Alas, the shrinking size of scientific results is not a phenomenon confined to scientific exotica like ESP. It manifests itself everywhere and often leads the general public to wonder how much of the scientific literature can be believed. In 2005 John Ioannidis, a prominent epidemiologist, published a paper provocatively titled “Why Most Published Research Findings Are False.” Ioannidis provides a thoughtful explanation of why research results are often not as dramatic as they were first thought to be. He then elaborates the characteristics of studies that control the extent to which their results shrink upon replication.
None of Ioannidis’ explanations came as a surprise to those familiar with statistics, which is, after all, the science of uncertainty. Larger studies with bigger sample sizes have more stable results; studies with large financial stakes are more prone to bias; when study designs are flexible, results vary more. The publication policies of scientific journals can also be a prominent source of bias.
Let me illustrate with a hypothetical example. Assume that we are doing a trial for some sort of medical treatment. Furthermore, suppose that although the treatment has no effect (perhaps it is the medical equivalent of an ESP study) it seems on its face to be a really good idea. To make this more concrete, imagine that modern scientific methods were available and trusted in the 19th century, and someone decided to use them to test the efficacy of using leeches to draw blood (which was once believed to balance the bodily humors and thence cure fevers). If a single study were done, the odds are it would find no effect. If, over a long period of time, many such studies were done, we might find that most would find no effect, a fair number would show a small negative effect and an equal number a small positive effect, all quite by chance. But chance being what it is, if enough studies were done, a few would show a substantial positive effect, balanced by a similar number that showed a complementary negative effect (see the figure on page 360).
Of course, if we were privy to such a big-picture summary, we could see immediately that the treatment has no efficacy and is merely showing random variation. But such a comprehensive view has not been possible in the past (although there is currently a push to build a database that would produce such plots for all treatments being studied—the Cochrane Collaboration). Instead what happens is that researchers who do a study and find no significant effect cannot publish it; editors want to save the scarce room in their journals for research that finds something. Thus studies with null, or small, estimates of treatment effects are either thrown away or placed in a metaphorical file drawer.
But if someone gets lucky and does a study whose results, quite by chance, fall farther out in the tail of the normal curve, they let out a whoop of success, write it up and get it published in some A-list journal, perhaps the Journal of the American Medical Association, perhaps the New England Journal of Medicine. We’ll call this the alpha study. A publication in such a prestigious journal raises the visibility of both the research and the researcher: a win-win.
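For readers who like to see the arithmetic, the hypothetical can be played out as a small simulation. What follows is only a sketch under stated assumptions: the number of studies, the 50 patients per arm, the unit-variance normal outcomes and the 1.96 cutoff are all illustrative choices, and the function names are invented for this example. The treatment is given a true effect of exactly zero, and only studies with a significantly positive result escape the file drawer.

```python
import random
import statistics

def simulate_studies(n_studies=2000, n_per_arm=50, true_effect=0.0, seed=1):
    """Simulate two-arm trials; return a list of (effect, z) pairs.

    All settings are illustrative assumptions, not values from any real trial.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        effect = statistics.mean(treated) - statistics.mean(control)
        # Standard error of a difference in means, from the sample variances.
        se = ((statistics.variance(treated) + statistics.variance(control)) / n_per_arm) ** 0.5
        results.append((effect, effect / se))
    return results

def file_drawer(results, z_cutoff=1.96):
    """Keep only 'significant' positive results; everything else stays in the drawer."""
    return [effect for effect, z in results if z > z_cutoff]

studies = simulate_studies()
all_effects = [effect for effect, _ in studies]
published = file_drawer(studies)

print(f"studies run:            {len(all_effects)}")
print(f"mean effect, all:       {statistics.mean(all_effects):+.3f}")
print(f"studies 'published':    {len(published)}")
print(f"mean effect, published: {statistics.mean(published):+.3f}")
```

Under these assumptions the average over every study sits near zero, as it should for a useless treatment, while the average over the small "published" subset is clearly positive. The filter alone, with no misconduct by anyone, is enough to manufacture an apparent effect.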
The attention generated by such a publication naturally leads to attempts to replicate; sometimes these replication studies turn out to have been done before the alpha study, lending support to the hypothesis that the alpha study might be anomalous. Typically these studies do not show an effect as large as that seen in the alpha study. Moreover, the replication studies are not published in the A-list journals, for they are not pathbreaking. They appear in more minor outlets—if they are accepted for publication at all.
So a pattern emerges. A startling and wonderful result appears in the most prestigious of journals, and news of the finding is trumpeted in the media. Subsequently, independent studies also appear, but few are seen by a significant number of readers, and fewer still are picked up by the media to diminish the impression of a breakthrough generated by the alpha study. Sometimes, though, news of diminished efficacy percolates out to the field and perhaps even the public at large. Then we start to worry, “Does any treatment really work?”
One version of this effect, delineated in a 1995 paper by Geneviève Grégoire and her colleagues at the Hôtel-Dieu de Montréal in Quebec, has come to be called the Tower of Babel bias. The authors considered meta-analyses published in eight English-language medical journals over a period of two years. The advantage of a meta-analysis is that it combines the findings of many other studies in an effort to establish a more rigorous conclusion based on the totality of what has been done. More than just a research review, it allows each study to be weighted in proportion to its validity. Grégoire and her colleagues found that a majority of the analyses excluded some studies based on language of publication, and that the analyses’ results might have been altered had they included studies published in languages other than English. More generally, it is almost a truism by now that studies whose results either do not achieve statistical significance or show only a small effect are published in local journals or not at all. Thus international estimates of treatment effects tend to have a positive bias.
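To make the weighting idea concrete, here is a minimal sketch of one common way to pool studies: the fixed-effect, inverse-variance method, in which each study counts in proportion to one over the square of its standard error. This is an assumption about how "validity" might be turned into a weight, not a description of the method used in any of the meta-analyses Grégoire and her colleagues examined. The numbers are made up purely for illustration, and the function and variable names are hypothetical.

```python
def pooled_estimate(studies):
    """Fixed-effect pooled estimate from (effect, standard_error) pairs.

    Each study is weighted by 1 / standard_error**2, so more precise
    studies count for more. Returns (estimate, standard_error_of_estimate).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    estimate = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    return estimate, (1.0 / total) ** 0.5

# Made-up (effect, standard_error) pairs, for illustration only.
english_language = [(0.40, 0.15), (0.35, 0.20), (0.30, 0.10)]
other_languages = [(0.05, 0.12), (-0.02, 0.18), (0.10, 0.14)]

restricted, restricted_se = pooled_estimate(english_language)
complete, complete_se = pooled_estimate(english_language + other_languages)

print(f"English-language studies only: {restricted:.3f} (SE {restricted_se:.3f})")
print(f"All studies:                   {complete:.3f} (SE {complete_se:.3f})")
```

With these invented figures, leaving out the smaller-effect studies pushes the pooled estimate upward. How large the shift is in practice depends entirely on which studies are missing and how precise they are.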