Statisticians can reuse their data to quantify the uncertainty of complex models
From its origins in the 19th century through about the 1960s, statistics was split between developing general ideas about how to draw and evaluate statistical inferences, and working out the properties of inferential procedures in tractable special cases (like the one we just went through) or under asymptotic approximations. This yoked a very broad and abstract theory of inference to very narrow and concrete practical formulas, an uneasy combination often preserved in basic statistics classes.
The arrival of (comparatively) cheap and fast computers made it feasible for scientists and statisticians to record lots of data and to fit models to them. Sometimes the models were conventional ones, including the special-case assumptions, which often enough turned out to be detectably, and consequentially, wrong. At other times, scientists wanted more complicated or flexible models, some of which had been proposed long before but now moved from being theoretical curiosities to stuff that could run overnight. In principle, asymptotics might handle either kind of problem, but convergence to the limit could be unacceptably slow, especially for more complex models.
By the 1970s statistics faced the problem of quantifying the uncertainty of inferences without using either implausibly helpful assumptions or asymptotics; all of the solutions turned out to demand even more computation. Perhaps the most successful was a proposal by Stanford University statistician Bradley Efron, in a now-famous 1977 paper, to combine estimation with simulation. Over the last three decades, Efron’s “bootstrap” has spread into all areas of statistics, sprouting endless elaborations; here I’ll stick to its most basic forms.
Remember that the key to dealing with uncertainty in parameters is the sampling distribution of estimators. Knowing what distribution we’d get for our estimates on repeating the experiment would give us quantities such as standard errors. Efron’s insight was that we can simulate replication. After all, we have already fitted a model to the data, which is a guess at the mechanism that generated the data. Running that mechanism generates simulated data that, by hypothesis, have nearly the same distribution as the real data. Feeding the simulated data through our estimator gives us one draw from the sampling distribution; repeating this many times yields the sampling distribution as a whole. Because the method gives itself its own uncertainty, Efron called this “bootstrapping”; unlike Baron von Münchhausen’s plan for getting out of a swamp by pulling on his own bootstraps, it works.
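The recipe in the last paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than anyone's canonical implementation: the function name parametric_bootstrap, its arguments, the Gaussian model, and the stand-in data are all assumptions made for the example.

```python
import numpy as np

def parametric_bootstrap(data, fit, simulate, estimator, n_boot=1000, seed=0):
    """Approximate the sampling distribution of `estimator`
    by simulating new data sets from a model fitted to `data`."""
    rng = np.random.default_rng(seed)
    params = fit(data)                   # fitted model: our guess at the mechanism
    n = len(data)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        fake = simulate(params, n, rng)  # synthetic data set, same size as the original
        draws[b] = estimator(fake)       # one draw from the sampling distribution
    return draws

# Hypothetical model for illustration: independent Gaussian observations.
def fit_gaussian(x):
    return np.mean(x), np.std(x, ddof=1)

def simulate_gaussian(params, n, rng):
    mu, sigma = params
    return rng.normal(mu, sigma, size=n)

# Stand-in data (simulated here, not real observations).
data = np.random.default_rng(42).normal(0.0, 1.0, size=200)

draws = parametric_bootstrap(data, fit_gaussian, simulate_gaussian,
                             estimator=np.mean, n_boot=2000)
boot_se = draws.std(ddof=1)              # bootstrap standard error of the mean
print(f"bootstrap standard error: {boot_se:.4f}")
```

Nothing in parametric_bootstrap depends on the model being Gaussian; swapping in a different fit and simulate pair changes the assumed mechanism while the loop stays the same.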
Let’s see how this works with the stock-index returns. Figure 2 shows the overall process: fit a model to data, use the model to calculate the parameter, then get the sampling distribution by generating new, synthetic data from the model and repeating the estimation on the simulation output. The first time I recalculate q0.01 from a simulation, I get –0.0323. Replicating this 100,000 times, I get a standard error of 0.00104 and a 95 percent confidence interval of (–0.0347, –0.0306), matching the theoretical calculations to three significant digits. This close agreement shows that I simulated properly! But the point of the bootstrap is that it doesn’t rely on the Gaussian assumption, just on our ability to simulate.
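To make that check concrete, here is a rough Python sketch of the same comparison under a Gaussian model. The returns are made up (drawn from a Gaussian with an arbitrary mean and standard deviation), so the numbers will not match those quoted above. The percentile interval and the delta-method formula for the theoretical standard error are my choices for the illustration, not necessarily the ones behind the figures in the text.

```python
import numpy as np
from statistics import NormalDist

Z01 = NormalDist().inv_cdf(0.01)          # standard Gaussian 1% quantile, about -2.326

# Made-up stand-in for daily returns; the real series is not reproduced here.
rng = np.random.default_rng(1)
n = 2000
returns = rng.normal(0.0005, 0.014, size=n)

# Fit the Gaussian model and compute the 1% quantile it implies.
mu_hat = returns.mean()
sigma_hat = returns.std(ddof=1)
q_hat = mu_hat + Z01 * sigma_hat

# Bootstrap: simulate B data sets from the fitted model, re-estimate on each.
B = 2000
sims = rng.normal(mu_hat, sigma_hat, size=(B, n))
q_boot = sims.mean(axis=1) + Z01 * sims.std(axis=1, ddof=1)

se_boot = q_boot.std(ddof=1)
ci_lo, ci_hi = np.quantile(q_boot, [0.025, 0.975])   # simple percentile interval

# Large-sample (delta-method) standard error for the Gaussian quantile estimate.
se_theory = sigma_hat * np.sqrt((1 + Z01**2 / 2) / n)

print(f"q_hat = {q_hat:.4f}")
print(f"bootstrap SE = {se_boot:.5f}, theoretical SE = {se_theory:.5f}")
print(f"95% interval = ({ci_lo:.4f}, {ci_hi:.4f})")
```

With only 2,000 replicates rather than 100,000, the two standard errors should agree to within a few percent rather than to three significant digits.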