COMPUTING SCIENCE

# Fat Tails

Sometimes the average is anything but average

## Fat Tails

The little procedure I have named the factoidal function is so simple that I'm sure someone must have noticed it before. I have not found a mention of this specific process, but slightly more general models involving products of random numbers do appear in the literature. (Review articles by Mark Newman of the University of Michigan and by Michael Mitzenmacher of Harvard University are particularly helpful.)

The context of these discussions is the study of heavy-tailed or fat-tailed distributions. The familiar normal distribution is *not* in this class: It is lean-tailed. The extremes of the normal probability curve, far from the peak, fall away even faster than exponentially, so that unlikely events become *really* unlikely and are never seen. Fat-tailed distributions decay more slowly, allowing room for outliers and freaks. Human height is a normally distributed variable; most people are less than two meters tall, and nobody reaches three meters. Human wealth has a fat-tailed distribution; worldwide, median net worth is a little over $2,000, but there are also millionaires and billionaires. (If height had the same distribution as wealth, there would be people two million meters tall.)
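A quick numerical comparison makes the contrast concrete. The minimal Python sketch below, with hypothetical function names, computes the chance of exceeding a threshold *k* for a standard normal variable and for an illustrative power law. The power law is an assumption chosen for the example: a Pareto distribution with $P(X > x) = x^{-1}$ for $x \ge 1$. The exponent is arbitrary; any power law shows the same qualitative gap.

```python
import math


def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def pareto_tail(k: float, tail_exponent: float = 1.0, x_min: float = 1.0) -> float:
    """P(X > k) for a Pareto variable whose survival function is (k / x_min) ** -tail_exponent."""
    if k <= x_min:
        return 1.0
    return (k / x_min) ** -tail_exponent


if __name__ == "__main__":
    # Compare how fast the two tails shrink as the threshold k grows.
    for k in (2, 5, 10, 20):
        print(f"k = {k:>2}   normal: {normal_tail(k):.3e}   power law: {pareto_tail(k):.3e}")
```

Under these assumptions the power-law tail only halves each time *k* doubles, whereas the normal tail has already dropped below $10^{-20}$ by $k = 10$.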

The distribution of wealth was one of the subjects that first aroused interest in fat-tailed distributions, starting with the work of the Italian economist Vilfredo Pareto in the 1890s. Later it emerged that word frequencies in natural languages are also described by a fat-tailed distribution, usually called Zipf's law, after George Kingsley Zipf. The sizes of cities offer another example: If urban populations were normally distributed, we wouldn't have Mumbai or São Paulo. In the past decade or so, it seems like fat tails have been turning up everywhere: in the number of links to Web sites and citations of scientific papers, in the fluctuations of stock-market prices, in the sizes of computer files.

The classic fat-tailed distribution is one where the decay of the tails is described by a power law. The probability of observing some quantity *x* goes as $x^{-a}$, where $a$ is a constant; the smaller the value of $a$, the fatter the tails. When $a$ is 2 or less, the mean of the distribution does not exist (it is infinite), and when $a$ is 3 or less, the variance does not exist. Drawn on a graph with logarithmic scales, a power-law distribution takes the form of a straight line: taking logarithms of $x^{-a}$ gives $\log p(x) = \text{const} - a \log x$, a line of slope $-a$. Another fat-tailed distribution, called the lognormal, follows a nearly straight line over a certain range but at some point takes a sudden nosedive. The lognormal, as the name suggests, is the distribution formed by variables whose logarithms are normally distributed.
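The difference between the two shapes shows up in the local slope of each curve on log-log axes. A power law's slope is the constant $-a$. For a lognormal whose logarithm has mean $\mu$ and standard deviation $\sigma$, $\log p(x) = -\log x - (\log x - \mu)^2 / (2\sigma^2) + \text{const}$, so the slope $-1 - (\log x - \mu)/\sigma^2$ keeps steepening as *x* grows. The minimal Python sketch below estimates both slopes by finite differences; the function names and the parameters ($a = 2.5$, $\mu = 0$, $\sigma = 2$) are hypothetical choices for illustration.

```python
import math
from typing import Callable


def power_law_pdf(x: float, a: float, x_min: float = 1.0) -> float:
    """Normalized power-law density (a - 1) / x_min * (x / x_min) ** -a on [x_min, inf); needs a > 1."""
    if x < x_min:
        raise ValueError("x must be at least x_min")
    return (a - 1.0) / x_min * (x / x_min) ** -a


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a variable whose natural logarithm is normal with mean mu and std dev sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def loglog_slope(pdf: Callable[[float], float], x: float, h: float = 1e-4) -> float:
    """Central-difference estimate of d(log p)/d(log x) at x, stepping by h in log x."""
    lo, hi = x * math.exp(-h), x * math.exp(h)
    return (math.log(pdf(hi)) - math.log(pdf(lo))) / (2.0 * h)


if __name__ == "__main__":
    for x in (1e1, 1e2, 1e3, 1e4, 1e5):
        s_pl = loglog_slope(lambda t: power_law_pdf(t, a=2.5), x)
        s_ln = loglog_slope(lambda t: lognormal_pdf(t, mu=0.0, sigma=2.0), x)
        print(f"x = {x:>8.0f}   power-law slope {s_pl:+.3f}   lognormal slope {s_ln:+.3f}")
```

With these parameters the power-law slope stays at −2.5 at every scale, while the lognormal slope drifts from about −1.6 at $x = 10$ to about −3.9 at $x = 100{,}000$; over a narrower window the lognormal could easily pass for a straight line.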

What about the factoidal function: which distribution describes the *n*? values? My first guess was a lognormal, based on a vague intuition that the logarithms of the *n*? products should be normally distributed, since the logarithm of a product is the sum of the logarithms of its factors, and sums of many random terms tend toward a normal distribution. So much for *my* intuition! A log-log graph of the factoidal function shows clear evidence of power-law behavior: The graph is a straight line, with no hint of the "bended knee" to be expected in a lognormal. The calculated value of the exponent $a$ is about 1.07, well inside the range where the mean and variance cease to exist.
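Reading the exponent off the slope of a log-log graph is one way to measure $a$; this is not necessarily how the value 1.07 above was obtained. A commonly used alternative, for data assumed to follow a continuous power law above some threshold $x_{\min}$, is the maximum-likelihood estimate $\hat{a} = 1 + n / \sum_i \ln(x_i / x_{\min})$. The Python sketch below checks that estimator on synthetic power-law samples drawn by inverse-transform sampling; the function names, exponent, threshold, sample size, and seed are all hypothetical illustrative choices.

```python
import math
import random


def sample_power_law(n: int, a: float, x_min: float, rng: random.Random) -> list[float]:
    """Draw n samples from the density proportional to x ** -a on [x_min, inf); requires a > 1.

    Inverse-transform sampling: P(X > x) = (x / x_min) ** -(a - 1), so
    X = x_min * U ** (-1 / (a - 1)) with U uniform on (0, 1].
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (a - 1.0)) for _ in range(n)]


def mle_exponent(samples: list[float], x_min: float) -> float:
    """Maximum-likelihood estimate of a for a continuous power law above x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


if __name__ == "__main__":
    rng = random.Random(2007)  # arbitrary seed, for reproducibility
    data = sample_power_law(100_000, a=2.5, x_min=1.0, rng=rng)
    print(f"true a = 2.5, estimated a = {mle_exponent(data, x_min=1.0):.3f}")
```

The estimator assumes the power law holds exactly above $x_{\min}$; with real data, choosing $x_{\min}$ is itself part of the problem.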

With guidance from Newman and Mitzenmacher I eventually came to understand why the factoidal follows a power law. They pointed me to a paper by William J. Reed of the University of Victoria in Canada and Barry D. Hughes of the University of Melbourne in Australia. Reed and Hughes show that when a process of exponential growth is stopped at random times, the resulting distribution of values follows a power law. One of their examples is multiplication of random numbers with mean $m$, stopped after a random number of terms. The factoidal function is merely a special case of this process.
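The core of the mechanism fits in a few lines. In the simplest discrete version, assumed here for illustration, a quantity starts at 1 and is multiplied by a fixed growth factor $g$ at each step, and before each step the process halts with probability $q$. The number of steps $N$ then satisfies $P(N \ge k) = (1-q)^k$, so $P(X \ge g^k) = (1-q)^k = (g^k)^{-c}$ with $c = -\ln(1-q)/\ln g$, which is a power law in $x = g^k$. The Python sketch below (hypothetical names and parameters) simulates this and compares the result with the formula. It uses a constant factor rather than the random factors of the example above, so that the exact tail can be checked; it is not Reed and Hughes's own model.

```python
import math
import random


def stopped_growth(growth: float, stop_prob: float, rng: random.Random) -> float:
    """Start at 1.0; before each step halt with probability stop_prob, otherwise multiply by growth."""
    if not 0.0 < stop_prob <= 1.0:
        raise ValueError("stop_prob must be in (0, 1]")
    x = 1.0
    while rng.random() >= stop_prob:  # continue with probability 1 - stop_prob
        x *= growth
    return x


def empirical_ccdf(samples: list[float], threshold: float) -> float:
    """Fraction of samples that are >= threshold."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


if __name__ == "__main__":
    g, q = 2.0, 0.5  # hypothetical growth factor and stopping probability
    c = -math.log(1.0 - q) / math.log(g)  # predicted exponent of P(X >= x); here c = 1
    rng = random.Random(42)
    samples = [stopped_growth(g, q, rng) for _ in range(200_000)]
    for k in range(1, 7):
        x = g ** k
        print(f"P(X >= {x:>3.0f}): simulated {empirical_ccdf(samples, x):.4f}, "
              f"power law {x ** -c:.4f}")
```

The simulated frequencies should track the power-law column closely: each doubling of the threshold roughly halves the probability, which is a straight line of slope −1 on a log-log plot of $P(X \ge x)$.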

The shape of a probability distribution can have grave consequences in many areas of life. If the size and intensity of hurricanes follow a normal distribution, we can probably cope with the worst of them; if there are monster storms lurking in the tail of the distribution, the prospects are quite different. Those who make a profession of risk assessment, such as insurance underwriters and financial analysts, take a keen interest in these questions.

Could the fat-tails phenomenon clear up the Lake Wobegon mystery? Well, maybe it can teach us a new way to understand the phrase, "All the children are above average." It's not about drawing a line through the population and having each and every child above the line. The sense of "all" is not "each and every" but the totality of children. If we can believe that human talents and abilities have some distribution that escapes the bounds of means and variances, then "all" children can indeed be above average.

© Brian Hayes
