Freakonomics: What Went Wrong?

Examination of a very popular popular-statistics series reveals avoidable errors

Andrew Gelman, Kaiser Fung

On a Case-by-case Basis

In our analysis of the Freakonomics approach, we encountered a range of avoidable mistakes, from back-of-the-envelope analyses gone wrong to unexamined assumptions to an uncritical reliance on the work of Levitt’s friends and colleagues. Such mistakes turn accessibility on its head: Readers must work to discern which conclusions are fully quantitative, which are somewhat data-driven and which are purely speculative.

The case of the missing girls: Monica Das Gupta is a World Bank researcher who, along with others in her field, has attributed the abnormally high ratio of boy-to-girl births in Asian countries to a preference for sons, which manifests in selective abortion and, possibly, infanticide. As a graduate student in economics, Emily Oster (now a professor at the University of Chicago) attacked this conventional wisdom. In an essay in Slate, Dubner and Levitt praised Oster and her study, which was published in the Journal of Political Economy during Levitt’s tenure as editor:

[Oster] measured the incidence of hepatitis B in the populations of China, India, Pakistan, Egypt, Bangladesh, and other countries where mothers gave birth to an unnaturally high number of boys. Sure enough, the regions with the most hepatitis B were the regions with the most “missing” women. Except the women weren’t really missing at all, for they had never been born.

Oster’s work stirred debate for a few years in the epidemiological literature, but eventually she admitted that the subject-matter experts had been right all along. One of Das Gupta’s many convincing counterpoints was a graph showing that in Taiwan, the ratio of boys to girls was near the natural rate for first and second babies (106:100) but not for third babies (112:100); this pattern held up with or without hepatitis B.

In a follow-up blog post, Levitt applauded Oster for bravery in admitting her mistake, but he never credited Das Gupta for her superior work. Our point is not that Das Gupta had to be right and Oster wrong, but that Levitt and Dubner, in their celebration of economics and economists, suspended their critical thinking.

The risks of driving a car: In SuperFreakonomics, Levitt and Dubner use a back-of-the-envelope calculation to make the contrarian claim that driving drunk is safer than walking drunk, an oversimplified argument that was picked apart by bloggers. The problem with this argument, and others like it, lies in the assumption that the driver and the walker are the same type of person, making the same kinds of choices, except for their choice of transportation. Such all-else-equal thinking is a common statistical fallacy. In fact, driver and walker are likely to differ in many ways other than their mode of travel. What seem like natural calculations are stymied by the impracticality, in real life, of changing one variable while leaving all other variables constant.

Stars are made, not born—except when they are born: In 2006, Levitt and Dubner wrote a column for the New York Times Magazine titled “A Star Is Made,” relying on the research of Florida State University psychologist K. Anders Ericsson, who believes that experts arise from practice rather than innate talent. It begins with the startling observation that elite soccer players in Europe are much more likely to be born in the first three months of the year. The theory: Since youth soccer leagues are organized into age groups with a cutoff birth date of December 31, coaches naturally favor the older kids within each age group, who have had more playing time. So far, so good. But this leads to an eye-catching piece of wisdom: The fact that so many World Cup players have early birthdays, the authors write,

may be bad news if you are a rabid soccer mom or dad whose child was born in the wrong month. But keep practicing: a child conceived on this Sunday in early May would probably be born by next February, giving you a considerably better chance of watching the 2030 World Cup from the family section.

Perhaps readers are not meant to take these statements seriously. But when we do, we find that they violate some basic statistical concepts. Despite its implied statistical significance, the size of the birthday effect is very small. The authors acknowledge as much three years later when they revisit the subject in SuperFreakonomics. They consider the chances that a boy in the United States will make baseball’s major leagues, noting that July 31 is the cutoff birth date for most U.S. youth leagues and that a boy born in the United States in August has better chances than one born in July. But, they go on to mention, being born male is “infinitely more important than timing an August delivery date.” What’s more, having a major-league player as a father makes a boy “eight hundred times more likely to play in the majors than a random boy,” they write. If these factors are such crucial determinants of future stardom, what does this say about their theory that a star is made, not born? Practice may indeed be a more important factor than innate talent, but in opting for cute flourishes like these, the authors venture so far from the original studies that they lose the plot.

Making the majors and hitting a curveball: In the same discussion in SuperFreakonomics, Levitt and Dubner write:

A U.S.-born boy is roughly 50 percent more likely to make the majors if he is born in August instead of July. Unless you are a big, big believer in astrology, it is hard to argue that someone is 50 percent better at hitting a big-league curveball simply because he is a Leo rather than a Cancer.

But you don’t need to believe in astrology to realize that the two quantities being compared are not the same: One is the chance of making the majors, the other is skill at hitting a curveball. A .300 batting average is 50 percent better than a .200 average. In such a competitive field, the difference in batting averages between a kid who makes the majors and one who narrowly misses out is likely to be far smaller, a matter of hundredths or even thousandths of a percent. Such errors could easily be avoided.
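One way to see the gap between "50 percent more likely to make the majors" and "50 percent better at hitting" is a toy threshold model. The sketch below is our illustration, not a calculation from the book or from any study: it assumes skill follows a bell curve, that only boys above a very high cutoff reach the majors, and that August-born boys sit a tenth of a standard deviation higher on average. The cutoff of 4.0 and the shift of 0.1 are hypothetical numbers chosen only to show the effect.

```python
from statistics import NormalDist

# Hypothetical model: skill among July-born boys is a standard bell curve.
july = NormalDist(mu=0.0, sigma=1.0)
# Assumption: August-born boys average one tenth of a standard deviation higher.
august = NormalDist(mu=0.1, sigma=1.0)
# Assumption: only skill above this (very high) cutoff makes the majors.
CUTOFF = 4.0


def chance_above(dist: NormalDist, cutoff: float) -> float:
    """Probability that a draw from dist exceeds the cutoff."""
    return 1.0 - dist.cdf(cutoff)


p_july = chance_above(july, CUTOFF)
p_august = chance_above(august, CUTOFF)
ratio = p_august / p_july

# For contrast, the relative gap between a .300 and a .200 hitter,
# computed in hits per 1,000 at-bats to keep the arithmetic exact.
batting_gap = (300 - 200) / 200

print(f"Chance of clearing the cutoff, July:   {p_july:.6f}")
print(f"Chance of clearing the cutoff, August: {p_august:.6f}")
print(f"August-to-July ratio of chances:       {ratio:.2f}")
print(f"Relative gap, .300 vs .200 hitter:     {batting_gap:.0%}")
```

Under these assumed numbers, a small shift in average skill produces a gap of roughly half in the chance of clearing an extreme cutoff, which is nothing like the gap between a .300 and a .200 hitter.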

Predicting terrorists: In SuperFreakonomics, Levitt and Dubner introduce a British man, pseudonym Ian Horsley, who created an algorithm that used people’s banking activities to sniff out suspected terrorists. They rely on a napkin-simple computation to show the algorithm’s “great predictive power”:

Starting with a database of millions of bank customers, Horsley was able to generate a list of about 30 highly suspicious individuals. According to his rather conservative estimate, at least 5 of those 30 are almost certainly involved in terrorist activities. Five out of 30 isn’t perfect—the algorithm misses many terrorists and still falsely identified some innocents—but it sure beats 495 out of 500,495.

The straw man they employ, a hypothetical algorithm boasting 99-percent accuracy, would indeed, if it existed, wrongfully accuse half a million people (1 percent of the 50 million adults in the United Kingdom). So the conventional wisdom that 99-percent accuracy is sufficient for terrorist prediction is folly, as has been pointed out by others such as security expert Bruce Schneier.

But in the course of this absorbing narrative, readers may well miss the spot where Horsley’s algorithm also strikes out. The casual computation keeps under wraps the rate at which it fails at catching terrorists: With 500 terrorists at large (the authors’ supposition), the “great” algorithm finds only five of them. Levitt and Dubner acknowledge that “five out of 30 isn’t perfect,” but had they noticed the magnitude of false negatives generated by Horsley’s secret recipe, and the grave consequences of such errors, they might have stopped short of hailing his story. The maligned straw-man algorithm, by contrast, would have correctly identified 495 of 500 terrorists.

This unavoidable tradeoff between false-positive and false-negative errors is a well-known property of all statistical-prediction applications. Circling back to check all the factors involved in the problem might have helped the authors avoid this mistake.
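The arithmetic behind this comparison can be laid out in a few lines. The sketch below uses only the figures quoted above: 50 million adults, 500 terrorists, Horsley's list of 30 with 5 true hits, and a straw man that is right 99 percent of the time. Two things are our assumptions: we read "99-percent accuracy" as catching 99 percent of terrorists and clearing 99 percent of innocents, and we count false accusations as 1 percent of all 50 million adults, since that rounding is what reproduces the quoted 500,495. The class and variable names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screen:
    """Outcome of one screening method applied to the whole population."""
    name: str
    flagged: int          # people the method labels as suspects
    true_positives: int   # flagged people who really are terrorists
    terrorists: int       # terrorists at large (the authors' supposition)

    @property
    def false_positives(self) -> int:
        return self.flagged - self.true_positives

    @property
    def false_negatives(self) -> int:
        return self.terrorists - self.true_positives

    @property
    def hit_rate_among_flagged(self) -> float:
        return self.true_positives / self.flagged

    @property
    def share_of_terrorists_caught(self) -> float:
        return self.true_positives / self.terrorists


ADULTS = 50_000_000
TERRORISTS = 500

# Straw man: assumed to catch 99 percent of terrorists and to wrongly
# flag 1 percent of adults (rounded to all 50 million, as in the quote).
straw_caught = TERRORISTS * 99 // 100
straw_wrongly_flagged = ADULTS // 100
straw_man = Screen("99-percent straw man",
                   straw_caught + straw_wrongly_flagged,
                   straw_caught, TERRORISTS)

# Horsley: about 30 flagged, at least 5 of them terrorists (quoted figures).
horsley = Screen("Horsley's algorithm", 30, 5, TERRORISTS)

for s in (straw_man, horsley):
    print(f"{s.name}: flagged {s.flagged:,}, "
          f"innocents accused {s.false_positives:,}, "
          f"terrorists missed {s.false_negatives:,}, "
          f"hit rate among flagged {s.hit_rate_among_flagged:.2%}, "
          f"share of terrorists caught {s.share_of_terrorists_caught:.0%}")
```

Reading both columns together shows the tradeoff: Horsley's list accuses far fewer innocents but misses 495 of the 500 terrorists, while the straw man misses only 5 at the cost of half a million false accusations.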

The climate-change dustup: Rendering research conducted by others is much more challenging than explaining your own work, especially if the topic lies outside your domain of expertise. The climate-change chapter in SuperFreakonomics is a case in point. In it, Levitt and Dubner throw their weight behind geoengineering, a climate-remediation concept championed at the time by Nathan Myhrvold, a billionaire and former chief technology officer of Microsoft. Unfortunately, having moved outside the comfort zone of his own research, Levitt is in no better a position to evaluate Myhrvold’s proposal than we are.

When an actual expert, University of Chicago climate scientist Raymond Pierrehumbert, questioned the claims in Levitt and Dubner’s writing on climate, Levitt retorted that he enjoyed Pierrehumbert’s “intentional misreading” of the chapter. Referring to his own writings on the subject, Levitt wrote, “I’m not sure why that is blasphemy.” We’re not sure on this point either—we could not find a place where Pierrehumbert described Levitt’s writings in those terms. It is easy to be preemptively defensive of one’s own work, or of researchers whose work one has covered. Viewing alternative points of view as useful rather than threatening can help take the sting out of critiques. And if you’re covering subject matter outside your expertise, it pays to get second—and third and fourth—opinions.
