The Quest for Randomness
Determining whether numbers truly can have no pattern has implications for quantum mechanics, not to mention the stock market and data security.
One of my favorite Dilbert cartoons features a lizardlike creature that’s billed as a “random number generator,” but that only ever seems to spit out the number nine. Dilbert asks his guide, “Are you sure that’s random?” The guide replies, “That’s the problem with randomness. You can never be sure.”
It’s funny (at least to me), but is it true? Can you ever be reasonably sure that something is random, in the same sense you can be reasonably sure something is not random (for example, because it consists of endless nines)? Even if a sequence looked random, how could you ever rule out the possibility that it had a hidden deterministic pattern? And what exactly do we mean by “random,” anyway?
These questions might sound metaphysical, but we don’t need to look far to find real-world consequences. In computer security, it’s crucial that the keys used for encryption be generated randomly—or at least, randomly enough that a potential eavesdropper can’t guess them. Day-to-day fluctuations in the stock market might look random—but to whatever extent they can be predicted, the practical implications are obvious. Casinos, lotteries, auditors, and polling firms all get challenged about whether their allegedly random choices are really random, and all might want ways to reassure skeptics that they are.
Then there’s quantum mechanics, which famously has declared for a century that “God plays dice,” that there’s irreducible randomness even in the behavior of subatomic particles. To Albert Einstein, the only logical conclusion to draw was that quantum mechanics is incomplete, and physicists should keep looking for a better theory. Today, it’s often said that quantum mechanics won the debate and Einstein lost. But on what grounds, exactly? Is it conceivable that Einstein will be vindicated after all?
Fire Up the Statistics
The obvious way to check whether a sequence is random is just to throw some statistical tests at it. For example, suppose we were interested in the randomness of the following sequence: 314159265358979323846264338327950288419716...
As a first step, we could check whether the digits 0 through 9 appeared with approximately equal frequency, among, say, the first million digits. Passing such a test is clearly necessary for randomness, in the sense that, if the digits really were drawn completely randomly, and we looked at a million of them, then with overwhelming probability we would see roughly 10 percent zeros, 10 percent ones, and so forth. On the other hand, passing this test is not sufficient for randomness—because, for example, the decidedly nonrandom sequence 012345678901234567890123456789... also has the right digit frequencies.
So for an improved test, we could check whether the 100 consecutive two-digit pairs—that is, 00 through 99—occur with frequency close to 1 percent. A truly random sequence would pass this test almost surely. And it’s also more effective than looking at single-digit frequencies alone. But again, once we know that the test involves looking at two-digit frequencies, it’s easy to fool the test. For example, consider the following sequence, which was proposed in the 1930s by David Champernowne (then an undergraduate classmate of computing pioneer Alan Turing) and which simply consists of all the positive integers written out together: 123456789101112131415161718192021222324...
You might enjoy proving that not only does Champernowne’s sequence have the same two-digit frequencies as a random sequence, it also has the same three-digit frequencies, the same four-digit frequencies, and so on. Yet clearly the sequence isn’t random.
A moment’s thought will reveal that the problem is completely general. For example, the first sequence above seems to pass virtually every statistical test for randomness ever devised—until, that is, you notice that it is the digits of pi. With that single realization, the sequence flips from completely unpredictable to completely predictable. The same would be true for, say, the digits of the square root of two, or any other random-looking sequence that we can nevertheless specify by a rule.
Indeed, let me now argue that no matter how clever you are in devising a statistical test for randomness, an adversary will always be able to find a deterministic sequence that passes your test. If nothing else, such an adversary could say the following:
Let S be the set of all sequences of, say, one million digits that are declared to be random by your test. Presumably S is nonempty, because something had better pass your test! So for my sequence, I’ll simply pick the first sequence in S, when the sequences are viewed as numbers and arranged in ascending order. By hypothesis, that sequence passes your test, yet I just specified it deterministically!
Saved by Complexity
The situation is not quite as hopeless as it sounds. In the 1960s, a beautiful field called algorithmic information theory was developed. This field almost (but not quite) lets us cut through the arguments, and gives a test for individual sequences such that, if they pass the test, then they must have been generated randomly.
The starting point is a paradox that goes back to the beginnings of probability. Consider a coin that landed heads every time you flipped it. Would you eventually conclude the coin was crooked? Of course you would—just as Dilbert suspected the random number–generating lizard that always output nine! On the other hand, you’d probably find nothing wrong if you saw the following sequence of 30 heads and tails: T H T H H H T H H H H T T H H T H H T H T H T H T H T T H T.
The paradox is that a fair coin is exactly as likely to produce this sequence as it is to produce 30 heads in a row. Both sequences have the same probability, namely, 2-30 (base 2 numbers are used because there are two options here, heads or tails).
French mathematician Pierre-Simon Laplace, considering this puzzle in the 1800s, realized that a run of 30 heads is special or meaningful in a way that another sequence is not. If we instead asked for the probability that we’d get a sequence that was just as random-looking and patternless as the one above, the probability would be almost 1.
The trouble is, how do we formalize the concept of a sequence of numbers being random-looking and patternless? This is where three mathematicians—Andrei Kolmogorov, Ray Solomonoff, and Gregory Chaitin—come in. In the 1960s, they independently proposed a measure of the inherent randomness of a sequence, which today we call Kolmogorov complexity (no, the name isn’t historically fair, but “Kolmogorov-Solomonoff-Chaitin complexity” is an unwieldy mouthful). Informally, the Kolmogorov complexity of a sequence x, abbreviated K(x), is defined to be the number of bits in the shortest computer program whose output is x. To illustrate, a sequence such as 00000000000000000000000000000000000000000000000000000000000000000000000000000000 can be described by a computer program much shorter than itself, namely:
print 80 zeros
Thus the Kolmogorov complexity of the sequence is much smaller than the number of bits in the sequence, denoted |x|. By contrast, a random sequence of 1s and 0s, such as 10110011011010011101010000100110111110111001011011100011010011000110010011111011, will almost always have a Kolmogorov complexity close to the length of the sequence itself. This complexity arises because, if you try to write a program to output the sequence, you’ll find that the program has to contain the whole sequence (or something equally long) as part of its code:
The notion of Kolmogorov complexity lets us neatly resolve the paradox from before. When should you suspect a coin of being crooked? When a sequence x of flips of the coin satisfies K(x) « |x|. In other words, you should suspect a coin if and only if there exists a pattern to the coin flips that can be described by a computer program using substantially fewer bits than the sequence of coin flips itself. And crucially, at least in cases with many coin flips, it doesn’t matter which programming language you use, because the choice of language will affect the program’s length only by an additive constant that doesn’t depend on x.
So the real problem is that the Kolmogorov complexity is impossible to compute. To clarify, you can certainly write a program that looks for the shortest one that outputs x. But how do you know when to stop? What if there’s an even shorter program out there? It turns out that we can never be sure we’ve found the shortest one.
The proof of that fact is inspired by a famous riddle called the Berry paradox, after a librarian who apparently told it to mathematician Bertrand Russell. Consider asking for “the first number that can’t be described using 12 words or fewer.” But we just described the number, and we did so using only 12 words!
In our case, suppose, contrary to assumption, that there were an algorithm that computed the Kolmogorov complexity K(x) of any string x. Also, suppose the algorithm required n bits to write down.
If we took all possible strings of bits arranged in lexicographic order (0, 1, 00, 01, 10, 11, 000, 001, etc.), I claim we could write a program to find the first string y such that K(y) ≥ 2n. The program will simply run the algorithm on each string in succession, and stop when it finds the first string y that satisfies the requirements. Furthermore, the program will require slightly more than n bits to write down: n bits for the algorithm, plus a few additional bits to control the “repeatedly calling the algorithm until we find y” part. However, this means we just described a program that computes y, and that (for large enough n) is less than 2n bits long! It follows that K(y) < 2n, which contradicts the fact that K(y) ≥ 2n. We conclude that the program to compute K(x) could never have existed in the first place.
A Random Number Complex
Even though Kolmogorov complexity can’t be computed, it still has huge conceptual applications to ensuring something is random. The starting point for those applications is yet another puzzle. How can you produce a string of numbers—even just one—that has high Kolmogorov complexity?
There’s one easy way to do it: Pick a string randomly! If you choose a string of n bits completely at random, then there’s an overwhelmingly good chance that it will have Kolmogorov complexity close to n, because of what’s known in the trade as a “counting argument.” There are 2n strings of n bits (again, base 2, because there are two options for bits, 0 or 1), but it turns out that there are at most 2k+1 – 1 possible computer programs of k bits or fewer (less, actually, because most strings don’t correspond to any valid program). By definition, each program can produce at most one n-bit string as its output.
But if k is even a little bit smaller than n (say, n – 50), then 2k+1 – 1 will be vastly smaller than 2n. So we conclude that there simply aren’t enough short programs to go around: If you choose an n-bit string x at random, then you’ll almost certainly have K(x) approximately equal n. The crucial point is that picking randomly turns out to be the only way to generate strings with high Kolmogorov complexity.
To give a real-world example of this phenomenon, consider file compression programs such as gzip. These programs make files smaller by exploiting regularities in the data to be compressed, such as the same words appearing over and over. But you can’t compress a random string, because the data are already free of computable regularities (or in other words, the string has maximal Kolmogorov complexity). As an interesting aside, in 2002, a company called ZeoSync made headlines with claims that it had a breakthrough algorithm that could compress arbitrary data. In interviews, however, the company’s founder was never able to answer questions such as: What would happen if you fed a compressed file back into the compression program, and so on ad infinitum? Would the file get compressed to nothing? The company disappeared a few months later.
Do I sense that you’re still underwhelmed by this? Does it all feel circular to you—like a complicated way of saying “if something is random, then it’s random”? Sure, we’ve replaced a “metaphysical” question (i.e., whether events are determined or random) by a purely mathematical question (i.e., whether a description of the events has high Kolmogorov complexity). But it might not be obvious that we gain anything from that trade! Let me give a concrete example that helped convince me that the trade is beneficial.
Sampling Versus Search
Over the past five years, my student Alex Arkhipov and I have been studying a rudimentary type of quantum computer, called a BosonSampler. Such a device’s entire computation consists of generating a stream of identical photons, sending the photons through fiber-optic cables and a network of devices that split the photon beam, and then measuring the photons to see where they ended up. This being quantum mechanics, where the photons land will in general be random—but if you know how the beamsplitter network was set up, then at least in principle you can calculate the probability of any given outcome (for example, “one photon at location A, three photons at location B, no photons at location C...”).
The problem is that we don’t know how to do anything useful with a BosonSampler—not even anything nefariously useful, such as breaking cryptographic codes. Indeed, the only thing BosonSamplers seem to be good for is the tautological task of simulating themselves!
So why bother to study them? The reason is that, even if BosonSamplers are useless, Arkhipov and I were able to give strong evidence that simulating them with a classical computer is hard! More precisely, we showed that under widely believed hypotheses in theoretical computer science, no efficient classical algorithm could sample from the same probability distribution that a BosonSampler does. So if we merely wanted to give direct experimental evidence that quantum computing is possible— that is, that the physical world can do something exponentially faster than our existing computers—then BosonSampling might offer a more direct path than building a full-fledged universal quantum computer. For that and other reasons, BosonSampling could be a stepping-stone toward more useful forms of quantum computing.
In the last year, the first BosonSampling experiments were actually done, by groups in Rome, Oxford, Vienna, and Brisbane. The current experiments involve only three photons, so can easily be simulated using a classical computer. To exceed the capabilities of classical computers, one would probably need to use 20 to 30 photons, but getting that many photons to arrive at the detectors simultaneously presents major experimental challenges.
What does any of this have to do with recognizing randomness? Well, BosonSampling is an unusual problem, one where the goal is not to calculate some specific answer, but to sample according to a certain probability distribution. So a question of principle arises: Given a purported BosonSampler, how could a skeptic ever know whether the machine worked?
Sure, we could always run the machine over and over, collect a massive amount of statistics, and then check whether the statistics matched the theoretical predictions. The problem is that we’d have to run the machine a number of times that grew exponentially with the number of photons. Even with just 8 or 10 photons, we would already be pushing the limits of the experimenters’ patience. So we might wonder: Is there any way to check whether a BosonSampler works using only a small number of measurements (say, 100)?
For this purpose, it would be useful to have a search problem, in which the computer’s goal is to output any string of bits with specified properties. The answers to search problems are easier to check than sampling problems. Indeed, a BosonSampler already solves the search problem of finding photon configurations that appear in the distribution with nonzero probability. However, these two problems still aren’t equivalent to each other, because a search problem only requires finding any possible solution, whereas a sampling problem needs a likely configuration as its output. But how do we formalize the concept of a photon configuration being likely or typical?
It probably won’t surprise you that this is where Kolmogorov complexity comes to the rescue. Building on work from the 1970s, in 2011 I showed that, given any sampling problem, one can define a search problem that’s computationally equivalent to it. This means that if one problem is efficiently solvable by classical computers then both of them are, if one is efficiently solvable by quantum computers then both are, and so on, for every other reasonable model of computation. The trick is the following: Given a probability distribution, we consider a search problem that amounts to “Find a string that occurs with large probability in the distribution and also has large Kolmogorov complexity.”
Using tools developed in the 1970s, one can show that, if a computer program manages to find a string that satisfies both of the above properties, then the program can only have done so by sampling from the distribution. In this way, Kolmogorov complexity lets us switch the goal from sampling a distribution to outputting a fixed string with specified properties.
Let’s return to the question raised by the Dilbert cartoon: Is it ever possible to be sure something is random? If we make no auxiliary assumptions whatsoever, the strict answer is no. At some metaphysical level, any sequence claimed to be random—no matter how patternless it looked—could have been determined (by God, the initial conditions of the universe, or whatever) to be precisely what it is. However, we can come closer to that ideal than one might have thought possible—in effect, saying “If these numbers aren’t random, then some fundamental assumption has to collapse.”
In this article, the assumption was that any regularities in nature are computable, so that they correspond to low Kolmogorov complexity. In my next column, I’ll discuss quantum mechanics and a famous theorem called the Bell inequality, and how they let us generate random numbers under the sole assumption that it’s impossible to send signals faster than light. Notably, the latter approach won’t suffer from the problem of uncomputability—so unlike Kolmogorov complexity, it will actually lead to practical methods for generating guaranteed random numbers.
Scott Aaronson will continue this discussion in the July–August issue’s Technologue column.