People Cause Replication Problems, Not Machine Learning

This branch of artificial intelligence can be a reliable tool for research, if scientists use rigorous methodology.

April 5, 2019

Macroscope Communications Computer Ethics Technology

The headlines were broad and alarming.

According to those headlines, machine learning, an important subfield of artificial intelligence that has garnered increasing attention over the past couple of years, is threatening science itself.


All these media stories referred to a talk given at the recent meeting of the American Association for the Advancement of Science (AAAS) by Genevera Allen of Rice University, on “Machine Learning: The View from Statistics.”

According to the breathless news coverage, Allen warned the scientific community that machine-learning methods threaten the very integrity of scientific research, that they contribute to a proliferation of nonreproducible and unreliable results, and that they are overall causing a crisis in science itself.

These dire reports are not even remotely accurate, nor do they describe what Allen actually spoke about.



Rather, she described several examples of published peer-reviewed research that used clustering (a kind of machine learning) to determine subtypes for certain diseases, showing that the research was not reproducible and that this fact would have been evident had the original researchers tested whether their clustering was stable under perturbations of the data.
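To make the idea of a stability check concrete, here is a minimal sketch in Python. It is not the procedure Allen or the original researchers used. The k-means routine, the 80 percent subsampling scheme, the Rand index as an agreement score, and the synthetic data sets are all assumptions chosen for illustration. The sketch clusters two random subsamples of the same data and asks how often the two clusterings agree about which shared points belong together.

```python
import random

def kmeans(points, k, rng, iters=50):
    """Lloyd's k-means on (x, y) tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            total += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / total

def stability(points, k, rng, trials=20, frac=0.8):
    """Average agreement between clusterings of two random subsamples of the data."""
    n = len(points)
    size = int(frac * n)
    scores = []
    for _ in range(trials):
        idx1 = rng.sample(range(n), size)
        idx2 = rng.sample(range(n), size)
        lab1 = dict(zip(idx1, kmeans([points[i] for i in idx1], k, rng)))
        lab2 = dict(zip(idx2, kmeans([points[i] for i in idx2], k, rng)))
        shared = sorted(set(idx1) & set(idx2))
        scores.append(rand_index([lab1[i] for i in shared],
                                 [lab2[i] for i in shared]))
    return sum(scores) / len(scores)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (illustrative only): two well-separated groups ...
blobs = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
         + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)])
# ... versus a single cloud with no real subgroups.
noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]

stable_score = stability(blobs, 2, rng)
noisy_score = stability(noise, 2, rng)
print("two real groups:", round(stable_score, 3))
print("no real groups: ", round(noisy_score, 3))
```

A score near 1 means the subsamples keep producing the same grouping. A markedly lower score suggests that the "subtypes" depend on which data points happened to be included, which is the kind of warning sign a perturbation test is meant to surface.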

What Allen reported is indeed a problem, but it is not, by any stretch of the imagination, a crisis caused by machine learning per se. The current prominence of the artificial intelligence field (and, to be sure, the high prevalence of hype around it) makes machine learning an alluring target, but it is the wrong one. The true cause of the problem is deficient reasoning and methodology. The researchers misused the tools of machine learning and failed to control their experiments and analyses properly.

This sort of thing is nothing new or unusual. The scientific literature from its very origins is replete with examples of statistical and other fallacies that have ensnared the most perspicacious of researchers, and full of influential studies that turned out to be false.


As one example, in the early 1930s, Beth Wellman and her colleagues at the University of Iowa published influential studies of the effect of preschool education on IQ, finding that it had a strong positive effect, although the effect was lower for those starting with high IQs. These studies and others that followed have had a powerful and lasting effect on government policy to this day by promoting the idea that early childhood education can significantly improve IQ and hence individual outcomes. However, as early as 1940, methodological critiques of these studies appeared, such as ones by Florence Goodenough and Katharine Maurer, who showed that the Iowa results could be explained by regression to the mean and other statistical artifacts, and so they proved nothing about how preschool could affect IQ.
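Regression to the mean is easy to demonstrate with a small simulation. The sketch below is purely illustrative and is not a reanalysis of the Iowa data. The population size, the score distribution, the amount of measurement noise, and the cutoffs of 90 and 110 are all assumed values. Each simulated child has a fixed underlying ability and is tested twice with independent measurement error, and nothing at all happens between the two tests.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10000

# Assumed model: observed score = stable ability + independent test noise.
ability = [rng.gauss(100, 12) for _ in range(n)]
test1 = [a + rng.gauss(0, 9) for a in ability]
test2 = [a + rng.gauss(0, 9) for a in ability]  # no intervention in between

def mean(xs):
    return sum(xs) / len(xs)

# Group children by their FIRST score only, as a naive study might.
low = [i for i in range(n) if test1[i] < 90]
high = [i for i in range(n) if test1[i] > 110]

low_gain = mean([test2[i] - test1[i] for i in low])
high_gain = mean([test2[i] - test1[i] for i in high])

print("average change, low initial scorers: ", round(low_gain, 1))
print("average change, high initial scorers:", round(high_gain, 1))
```

Under these assumptions, the low scorers appear to improve on retest and the high scorers appear to decline, even though no one's ability changed. A study that lacks a comparison group can mistake this pattern for a treatment effect that is weaker for those who started with high scores.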

But no one would claim that this example, even when put together with a great many similar ones, implies that “statistics is causing a crisis in science!”

The fact is that no analytic method—whether it uses machine learning, statistical hypothesis testing, or magic beans—can produce reliable results without proper control factors during an experiment. And such controls take a great deal of thought and effort to apply effectively.


As physicist Richard Feynman said in his 1974 commencement address at the California Institute of Technology, “The first principle [of science] is that you must not fool yourself, and you are the easiest person to fool.” Indeed, the history of science is, in a very real sense, the history of developing better and better methods to keep us from fooling ourselves.

It is a poor worker who blames their tools for failure. This inclination to blame tools such as machine learning for the replication crisis is the other face of the endless media hype that promotes tools such as machine learning (or deep learning, or neuroimaging, or genomic analysis) as a magic solution to all our scientific difficulties.

Among the most powerful heuristics in human thinking is the search for simple and elegant causes for complex phenomena. This heuristic has led to powerful and general scientific theories such as Newtonian mechanics and the germ theory of disease. However, as with all cognitive biases, it also easily leads us astray. Many desperately wish to believe that there must be simple causes for scientific success or failure, so we hang our collective hats on this tool or that one—whether they are novel such as machine learning, or venerable such as null-hypothesis significance testing—and don’t sufficiently attend to the long, hard slog of making rigorous and airtight arguments, checking our methods and results, and accounting for all the myriad ways that we could be fooling ourselves. And then, when these “magical” tools inevitably reveal their feet of clay, the declarations begin that the tools have caused a crisis of bad science.

Not so.

There is simply no substitute for the careful and painstaking work of searching for and then eliminating our own errors. As Euclid famously told King Ptolemy, “There is no royal road to geometry.” Nor is there a royal road to reliable scientific results—not with machine learning, and not with anything else.


