Sometimes the average is anything but average
Playing at the keyboard one day, I came up with the following factorial-like procedure:
define f?(n)
    r := random(1,n)
    if r = 1
        then return 1
        else return r * f?(n)
This might be a buggy version of a program intended to calculate factorials, but it actually does something a good deal stranger. The auxiliary procedure random(1,n), invoked in the second line, is assumed to return an integer selected at random from the range 1 through n. Thus the program chooses a series of random integers, multiplies them, and stops when random(1,n) happens to yield a 1. It's like rolling an n-sided die and keeping a running product of all the numbers seen until a 1 appears.
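The procedure is easy to try out. Here is a minimal sketch in Python (my own translation of the pseudocode, written iteratively rather than recursively so that a long run of factors doesn't exhaust the call stack; random.randint plays the role of random(1,n)):

```python
import random

def factoidal(n):
    """One run of the f?(n) process: multiply random draws from
    1..n together until a 1 appears, then return the product."""
    product = 1
    while True:
        r = random.randint(1, n)  # uniform integer in 1..n, inclusive
        if r == 1:
            return product
        product *= r
```

Because Python integers have arbitrary precision, the running product can grow without overflow; the practical limit is memory, just as the article notes.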
Here are the results of a few sample runs of the program, for n = 7:
4 × 3 × 2 × 4 × 7 × 2 × 1 = 1,344
2 × 5 × 4 × 5 × 4 × 3 × 6 × 2 × 2 × 5 × 1 = 288,000
7 × 5 × 5 × 1 = 175
4 × 7 × 6 × 2 × 6 × 5 × 6 × 3 × 5 × 3 × 3 × 3 × 7 × 4 × 3 × 3 × 7 × 2 × 5 × 1 = 432,081,216,000
Note that 7! is equal to 5,040, a value that doesn't turn up in this sample. The results that do appear are scattered over quite a wide range.
This randomized analogue of the factorial function needs a name: I shall call it the factoidal function. Also, risking the ire and ridicule of some latter-day De Morgan, I'm going to adopt the notation "n?" to suggest the nondeterministic nature of the computation.
Strictly speaking, the factoidal function isn't a function—not in the mathematical sense of that word. An essential property of a function is that applying it to the same argument always yields the same value. The function call f!(7) will produce the value 5,040 time after time. But, as we have just seen, f?(7) is likely to return a different value every time it is invoked, depending on the vagaries of the random-number generator. For that matter, the procedure might not return any value at all; the program could run forever. The f! procedure is guaranteed to terminate, because on each recursive call (or each passage through the loop) the value of n is decremented, and eventually it has to reach 1. But f? halts only when the roll of a die comes up 1, which might never happen.
In practice, of course, the program always terminates, one way or another. A back-of-the-envelope calculation shows there's about a two-thirds chance that random(1,n) will produce a 1 in n trials. And a 1 is almost certain to turn up within 5n trials. Thus if n is small (less than 1,000, say), f?(n) will almost surely succeed in calculating a value. If n is very large, the product being computed is likely to fill up all available memory, and the program will terminate with an error message.
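Those back-of-the-envelope figures are easy to check empirically. The sketch below (the function name, the sample size and the choice n = 100 are my own, picked just for illustration) estimates how often random(1,n) yields a 1 within n trials and within 5n trials:

```python
import random

def one_within(n, trials):
    """True if random(1,n) produces a 1 at least once in the given
    number of trials."""
    return any(random.randint(1, n) == 1 for _ in range(trials))

# With n = 100, the fraction of runs that see a 1 within n trials
# should hover near 1 - 1/e ≈ 0.632, and within 5n trials it should
# be very close to 1.
n, runs = 100, 20000
within_n  = sum(one_within(n, n)     for _ in range(runs)) / runs
within_5n = sum(one_within(n, 5 * n) for _ in range(runs)) / runs
```

The two-thirds figure is no accident: the chance of missing 1 on every one of n draws is ((n−1)/n)ⁿ, which tends to 1/e, leaving a success probability near 1 − 1/e ≈ 0.632.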
Because of the element of randomness in the factoidal definition, it makes no sense to ask about the value of n?. The best we can hope for is to understand the statistical distribution of values. As a rule, this would mean estimating quantities such as the average value and the variance or standard deviation. But those familiar statistical tools are problematic in this case, so let's start by asking an easier question: How do factoidals compare with factorials? Is n? usually larger than n!, or smaller? (They can also be equal, of course, but as n increases, the probability of such an exact match goes to zero.)
When I first began puzzling over this question, I convinced myself that n? would usually exceed n!. Mercifully, I have forgotten the details of the "reasoning" that led me to this conclusion; it had something to do with the idea that only finitely many possible values of n? are less than n!, whereas infinitely many are greater. There's no point in reconstructing my argument, because it was totally wrong. When I wrote a program to answer the question experimentally, it became clear that nearly two-thirds of the n? values are less than n!. (The results appear in the illustration above.)
You can get an inkling of what's going on here by looking at a small example, say n = 3, and counting the cases in which the final product is less than 3!, or 6. For n = 3 the only possible outputs of the random number generator are 1, 2 and 3, each of which appears with probability 1/3. The simplest event is that a 1 comes up on the first try, in which case the product is obviously less than 6; this happens with probability 1/3. There are two possible sequences of just two factors, namely a 2 followed by a 1 and a 3 followed by a 1; each of these outcomes has a probability of (1/3)², or 1/9, and again the products are less than 6. A pencil-and-paper enumeration shows that there is only one other sequence of factors whose product is less than 6, namely 2, 2, 1; here the probability is 1/27. The sum of these probabilities is 1/3 + 2/9 + 1/27 = 16/27, or 0.5926. This is in agreement with the experimental results.
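The pencil-and-paper enumeration can be mechanized. The sketch below (my own code; the cutoff max_len is an assumption, harmless whenever 2 raised to max_len already exceeds n!, since longer sequences of non-1 factors cannot have a small product) sums the exact probabilities using rational arithmetic:

```python
from fractions import Fraction
from itertools import product
from math import factorial

def prob_below_factorial(n, max_len=12):
    """Exact probability that a factoidal run for this n ends with a
    product strictly less than n!.  Enumerates every sequence of
    non-1 factors (each in 2..n) up to max_len draws; a sequence of
    k such factors followed by the terminating 1 occurs with
    probability (1/n)**(k+1)."""
    target = factorial(n)
    total = Fraction(0)
    for k in range(max_len + 1):
        for seq in product(range(2, n + 1), repeat=k):
            p = 1
            for f in seq:
                p *= f
            if p < target:
                total += Fraction(1, n) ** (k + 1)
    return total
```

For n = 3 this reproduces the hand count of 16/27 ≈ 0.5926 exactly.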
As n increases, the proportion of n? values less than n! converges on the value 0.6322, and the proportion that are greater tends toward 0.3678. Is there anything special about these numbers, and can we explain why this particular proportion should crop up? I think so. Note that on each iteration, the probability of getting a 1 and thereby ending the factoidal process is 1/n. That means the probability of getting anything other than a 1, and continuing the sequence of factors, is (n−1)/n. Then the probability of avoiding 1 twice in a row is ((n−1)/n)², the probability of going on to three factors is ((n−1)/n)³, and so on. To have a high likelihood that n? will exceed n!, the chain of factors has to be extended to a length of at least n. The probability of reaching this point is ((n−1)/n)ⁿ; as n increases, this expression converges to 1/e, with a numerical value of 0.3678—just what is observed.
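A quick simulation makes the convergence visible. In the sketch below (my own code; the sample size and the choice n = 20 are arbitrary), the fraction of runs in which n? falls below n! should sit near 1 − 1/e ≈ 0.632:

```python
import random
from math import factorial

def factoidal(n):
    """One run of the n? process: multiply random draws from 1..n
    together until a 1 appears."""
    product = 1
    while True:
        r = random.randint(1, n)
        if r == 1:
            return product
        product *= r

# Fraction of runs in which n? comes out below n!.
n, runs = 20, 20000
below = sum(factoidal(n) < factorial(n) for _ in range(runs)) / runs
```

Larger values of n push the estimate closer to the limiting proportion, at the cost of longer products.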
If most values of n? are less than n!, then the average of all n? values should also be less than n!, shouldn't it? We'll see. But first another digression, to look at a closely related function that shows how averaging is supposed to work.