Random numbers have a peculiar power, even when they are only pseudo- or quasirandom
A Slight Discrepancy
Uniformity of distribution is measured in terms of its opposite, which is called discrepancy. For points in a two-dimensional square, discrepancy is calculated as follows. Consider all the rectangles with sides parallel to the coordinate axes that could possibly be drawn inside the square. For each such rectangle, count the number of points enclosed, and also calculate the number of points that would be enclosed, based on the area of the rectangle, if the distribution were perfectly uniform. The maximum difference between these numbers, taken over all possible rectangles, is a measure of the discrepancy.
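The definition can be computed directly by brute force. The Python helper below is a sketch; `extreme_discrepancy` is a name chosen here, not a library function. It relies on one observation. A rectangle with too many points can be shrunk, and one with too few can be enlarged, until its edges meet points or the sides of the square. So only those coordinates need to be tried as edges: closed rectangles give the largest surplus and open ones the largest shortfall. The cost grows roughly as N⁵, so it is only practical for a handful of points.

```python
import itertools
import random

def extreme_discrepancy(points):
    """Discrepancy, as a point count, over all axis-parallel boxes in the unit square.

    Boxes with a surplus can be shrunk, and boxes with a shortfall enlarged,
    until their edges meet points or the sides of the square, so only those
    coordinates are tried as edges. Closed boxes measure the largest surplus,
    open boxes the largest shortfall. Brute force, roughly O(n^5).
    """
    n = len(points)
    xs = sorted({0.0, 1.0, *(x for x, _ in points)})
    ys = sorted({0.0, 1.0, *(y for _, y in points)})
    worst = 0.0
    for x0, x1 in itertools.combinations_with_replacement(xs, 2):
        for y0, y1 in itertools.combinations_with_replacement(ys, 2):
            expected = n * (x1 - x0) * (y1 - y0)  # count for a perfectly uniform set
            closed = sum(x0 <= x <= x1 and y0 <= y <= y1 for x, y in points)
            opened = sum(x0 < x < x1 and y0 < y < y1 for x, y in points)
            worst = max(worst, closed - expected, expected - opened)
    return worst

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(16)]
d_pts = extreme_discrepancy(pts)
print(extreme_discrepancy([(0.5, 0.5)]))  # 1.0
print(d_pts)
```

A single point in the middle of the square already has discrepancy 1. A vanishingly small rectangle around it holds one point where a uniform distribution would put none.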
Another measure of discrepancy, called star discrepancy, looks only at the subset of axis-parallel rectangles that have one corner anchored at the origin of the unit square. And there’s no reason to consider only rectangles; versions of discrepancy can be defined for circles or triangles or other figures. The measures are not all equivalent; results vary depending on the shape. It’s interesting to note that the procedure for measuring discrepancy strongly resembles the Monte Carlo process itself.
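Restricting attention to anchored rectangles makes the computation cheaper, because each rectangle is fixed by its far corner alone. The sketch below uses a hypothetical helper, `star_discrepancy`. It follows the same logic as the unanchored case: candidate corners come from the point coordinates and 1, closed boxes catch surpluses, and open boxes catch shortfalls. It runs in roughly N³ time.

```python
def star_discrepancy(points):
    """Star discrepancy (as a point count) of 2-D points in the unit square.

    Only boxes [0, a) x [0, b) anchored at the origin are considered.
    Candidate corners (a, b) come from the point coordinates and 1; closed
    boxes measure surpluses, open boxes shortfalls. Roughly O(n^3).
    """
    n = len(points)
    xs = sorted({1.0, *(x for x, _ in points)})
    ys = sorted({1.0, *(y for _, y in points)})
    worst = 0.0
    for a in xs:
        for b in ys:
            expected = n * a * b  # count for a perfectly uniform set
            closed = sum(x <= a and y <= b for x, y in points)
            opened = sum(x < a and y < b for x, y in points)
            worst = max(worst, closed - expected, expected - opened)
    return worst

print(star_discrepancy([(0.5, 0.5)]))  # 0.75
```

For a single centered point the anchored measure gives 0.75, the surplus in the box [0, 0.5] × [0, 0.5]. Under the unanchored definition a tiny box around the point gives 1. Because anchored rectangles are a subset of all rectangles, the star value can never be the larger of the two.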
Grids and lattices fare poorly when it comes to discrepancy because the points are arranged in long rows and columns. An infinitely skinny rectangle that should enclose no points at all can take in an entire row. From such worst-case rectangles it follows that the discrepancy of a square lattice with N points is on the order of √N. Interestingly, it turns out that the discrepancy of a set of random or pseudorandom points is also on the order of √N. In other words, in an array of a million random points, there is likely to be at least one rectangle that has either 1,000 points too many or 1,000 too few.
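The worst case for a lattice is easy to exhibit. This sketch uses hypothetical helper names. It builds an m × m lattice of N = m² points at cell centers and lays a strip of negligible height along one row. The strip holds m = √N points where a uniform distribution would put essentially none.

```python
def square_lattice(m):
    """N = m*m points at the centers of an m x m grid of cells in the unit square."""
    return [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]

def strip_surplus(points, y_center, width):
    """Points inside the strip [0,1] x [y_center - width/2, y_center + width/2],
    minus the number a perfectly uniform set would put there (N * area)."""
    inside = sum(abs(y - y_center) <= width / 2 for _, y in points)
    return inside - len(points) * width

m = 32                      # N = 1024, so sqrt(N) = 32
lattice_pts = square_lattice(m)
print(strip_surplus(lattice_pts, 0.5 / m, 1e-9))  # just under 32: a whole row, almost no area
```

Here N = 1,024, and the strip captures all 32 points of the bottom row, while its area accounts for only about a millionth of a point.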
Quasirandom patterns are deliberately designed to thwart all opportunities for drawing such high-discrepancy rectangles. For quasirandom points, the discrepancy can be as low as the logarithm of N, which is much smaller than √N. For example, at N = 10⁶, √N = 1,000 but log N ≈ 20. (I am taking logarithms to the base 2.)
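The text does not name a particular construction. As one standard example, chosen here rather than taken from the article, this sketch uses the two-dimensional Hammersley point set. Its i-th point out of N is (i/N, φ₂(i)), where φ₂ mirrors the binary digits of i across the binary point. The sketch compares the star discrepancy of 128 such points with that of 128 pseudorandom points, computed by brute force over anchored boxes, and also checks the √N and log N figures quoted above.

```python
import math
import random

def radical_inverse_base2(i):
    """Mirror the binary digits of i about the binary point: 6 = 110b -> 0.011b = 0.375."""
    result, scale = 0.0, 0.5
    while i:
        result += (i & 1) * scale
        i >>= 1
        scale /= 2
    return result

def hammersley(n):
    """2-D Hammersley set: the i-th point is (i/n, radical_inverse_base2(i))."""
    return [(i / n, radical_inverse_base2(i)) for i in range(n)]

def star_discrepancy(points):
    """Brute-force star discrepancy (as a count) over boxes anchored at the origin."""
    n = len(points)
    xs = {1.0, *(x for x, _ in points)}
    ys = {1.0, *(y for _, y in points)}
    worst = 0.0
    for a in xs:
        for b in ys:
            expected = n * a * b
            closed = sum(x <= a and y <= b for x, y in points)
            opened = sum(x < a and y < b for x, y in points)
            worst = max(worst, closed - expected, expected - opened)
    return worst

n = 128
random.seed(2)
rand_pts = [(random.random(), random.random()) for _ in range(n)]
ham = star_discrepancy(hammersley(n))
rnd = star_discrepancy(rand_pts)
print("Hammersley:", round(ham, 2), " pseudorandom:", round(rnd, 2))
print("sqrt(1e6) =", math.sqrt(1e6), " log2(1e6) =", round(math.log2(1e6), 2))
```

The Hammersley set is used here because its definition is short. It is meant only as an illustration, not as the construction behind any calculation the article discusses.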
Discrepancy is a key factor in gauging the performance of Monte Carlo and quasi–Monte Carlo simulations. It determines the level of error or statistical imprecision to be expected for a given sample size. For conventional Monte Carlo, with random sampling, the expected error diminishes as 1/√N. For quasi–Monte Carlo, the corresponding “convergence rate” is (log N)ᵈ/N, where d is the dimensionality of the space. (Various constants and other details are neglected here, so the comparison is valid only for rates of growth, not for exact values.)
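The two convergence rates can be observed in the simplest setting, d = 1, where the logarithmic factor hardly matters. This sketch uses an integrand chosen for illustration, f(x) = x² on [0, 1] with exact integral 1/3. It compares plain Monte Carlo, with the error averaged over 20 runs, against the base-2 van der Corput sequence. When N is a power of 2, the first N van der Corput points are exactly the grid k/N in scrambled order. The quasi–Monte Carlo error then works out to (3N − 1)/(6N²), about 1/(2N).

```python
import random

def van_der_corput(i):
    """Base-2 radical inverse of i (its binary digits mirrored about the binary point)."""
    result, scale = 0.0, 0.5
    while i:
        result += (i & 1) * scale
        i >>= 1
        scale /= 2
    return result

def f(x):
    return x * x  # exact integral over [0, 1] is 1/3

def qmc_error(n):
    """Error of the quasi-Monte Carlo estimate from the first n van der Corput points."""
    return abs(sum(f(van_der_corput(i)) for i in range(n)) / n - 1 / 3)

def mc_error(n, rng, trials=20):
    """Mean absolute error of plain Monte Carlo over several independent runs."""
    total = 0.0
    for _ in range(trials):
        estimate = sum(f(rng.random()) for _ in range(n)) / n
        total += abs(estimate - 1 / 3)
    return total / trials

rng = random.Random(0)
for m in range(6, 13, 2):
    n = 2 ** m
    print(f"N={n:5d}  MC {mc_error(n, rng):.2e}  QMC {qmc_error(n):.2e}")
```

Each fourfold increase in N should cut the quasi–Monte Carlo error by a factor of about 4. The Monte Carlo error should fall only by about 2, with some run-to-run noise.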
The 1/√N convergence rate for random sampling can be painfully slow. Getting one more decimal place of precision requires increasing N by a factor of 100, which means that a one-hour computation suddenly takes four days. The reason for this sluggishness is the clumpiness of the random distribution. Two points that happen to lie close together waste computer effort, because the same region is sampled twice; conversely, voids between points leave some areas unsampled and thereby contribute to the error budget.
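The arithmetic behind the factor of 100 is simple. If the error shrinks as N raised to the power −r, then gaining k decimal digits requires multiplying N by 10^(k/r). The helper name below is hypothetical.

```python
def sample_growth(extra_digits, rate):
    """Factor by which N must grow to gain `extra_digits` decimal digits
    when the error shrinks like N**(-rate)."""
    return 10 ** (extra_digits / rate)

mc = sample_growth(1, 0.5)   # random sampling, error ~ 1/sqrt(N)
qmc = sample_growth(1, 1.0)  # idealized 1/N convergence, ignoring log factors
print(mc, qmc)                                     # 100.0 10.0
print(f"1 hour x {mc:.0f} = {mc / 24:.1f} days")   # 1 hour x 100 = 4.2 days
```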
Compensating for this drawback, random sampling has one very important advantage: The convergence rate does not depend on the dimension of the space. To a first approximation, the program runs as fast in 100 dimensions as in two dimensions. Quasi–Monte Carlo is different in this respect. If only we could ignore the factor of (log N)ᵈ, the performance would be dramatically superior to random sampling. We would have a 1/N convergence rate, which calls for just a tenfold increase in effort for each additional decimal place of precision. However, we can’t ignore the (log N)ᵈ part. This factor grows exponentially with increasing dimension d. The effect is small in dimension 1 or 2, but eventually it becomes enormous. In 10 dimensions, for example, (log N)¹⁰/N remains larger than 1/√N until N exceeds 10⁴³.
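The 10⁴³ figure can be checked numerically, with constants ignored as in the text and logarithms taken to base 2. Write k = log₂ N. The rates (log N)ᵈ/N and 1/√N are then equal when d·log₂ k = k/2. The function g(k) = k/2 − d·log₂ k is negative at k = 3d and increasing from there on, so bisection finds the single crossing beyond that point. The helper name is hypothetical.

```python
import math

def crossover_log10(d, tol=1e-9):
    """log10 of the N beyond which (log2 N)**d / N drops below 1/sqrt(N)."""
    def g(k):
        # k = log2 N; g(k) > 0 means (log2 N)**d < sqrt(N)
        return k / 2 - d * math.log2(k)

    lo, hi = 3.0 * d, 6.0 * d
    while g(hi) <= 0:       # widen the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:    # bisection on the increasing branch of g
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi * math.log10(2)

for d in (1, 2, 5, 10):
    print(d, round(crossover_log10(d), 1))
```

In two dimensions the crossover is at N = 2¹⁶ = 65,536. In ten dimensions it is near 1.3 × 10⁴³.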
It was this known slow convergence in high dimensions that led to surprise over the success of the Paskov and Traub financial calculation with d=360. The only plausible explanation is that the “effective dimension” of the problem is actually much lower than 360. In other words, the volume being measured doesn’t really extend through all the dimensions of the space. (In the same way, a sheet of paper lives in a three-dimensional space, but not much is lost by pretending it has no thickness.)
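The article does not define “effective dimension.” One common formalization in the quasi–Monte Carlo literature, the truncation sense, asks how many leading coordinates account for most of the integrand’s variance, say 99 percent. For an additive integrand f(x) = Σ wⱼxⱼ with x uniform in the cube, each term contributes wⱼ²/12 to the variance, so the count depends only on the weights. The weights below are hypothetical and are not taken from the Paskov and Traub problem.

```python
def truncation_dimension(weights, fraction=0.99):
    """Smallest s such that the first s coordinates carry `fraction` of the variance
    of the additive function f(x) = sum(w_j * x_j), x uniform on [0,1]^d.

    Each term contributes w_j**2 / 12 to the variance; the 1/12 cancels in
    the ratio, so only the squared weights matter.
    """
    total = sum(w * w for w in weights)
    running = 0.0
    for s, w in enumerate(weights, start=1):
        running += w * w
        if running >= fraction * total:
            return s
    return len(weights)

d = 360
decaying = [2.0 ** -j for j in range(d)]  # hypothetical: each coordinate matters half as much as the last
equal = [1.0] * d                          # every coordinate equally important
print(truncation_dimension(decaying), truncation_dimension(equal))  # 4 357
```

Both functions live in 360 dimensions. With rapidly decaying weights, however, only four coordinates carry nearly all the variation.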
This explanation may sound dismissive: The calculation succeeded not because the tool was more powerful but because the problem was easier than it looked. But note that the effective reduction in dimension works even when we don’t know which of the 360 dimensions can safely be ignored. That’s almost magical.
How commonplace is this phenomenon? Is it just a fluke, or confined to a narrow class of problems? The answer is not yet entirely clear, but a notion called “concentration of measure” offers a reason for optimism. It suggests that the high-dimensional world is mostly a rather smooth and flat place, analogous to a high-gravity planet where it costs too much to create jagged alpine landscapes.
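A minimal illustration of concentration of measure, using an example chosen here rather than taken from the article, is the average of the coordinates of a uniformly random point in the d-dimensional cube. Its standard deviation is exactly 1/√(12d). As d grows, this smooth function is nearly constant over almost all of the cube. The helper name and sample sizes are arbitrary choices.

```python
import math
import random
import statistics

def spread_of_mean(d, samples=2000, seed=0):
    """Sample standard deviation of the average coordinate of uniform points in [0,1]^d."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(d)) / d for _ in range(samples)]
    return statistics.stdev(means)

for d in (1, 10, 100, 360):
    print(d, round(spread_of_mean(d), 4), round(1 / math.sqrt(12 * d), 4))
```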