Knowing When to Stop
How to gamble if you must—the mathematics of optimal stopping
At the opposite end from having no information about future values is having full information—that is, complete information about the probabilities and the exact values of all potential future observations. In the spirit of Cardano, Fermat and Pascal’s discoveries about probability with dice centuries ago, let’s consider a game of rolling a standard, fair, six-sided die at most five times. You may stop whenever you want and receive as a reward the number of Krugerrands corresponding to the number of dots shown on the die at the time you stop. (At the time of this writing, one Krugerrand is worth $853.87.) Unlike the no-information marriage problems, here everything is known. The values at each roll will be 1, 2, 3, 4, 5, or 6, and the probability of each number on each roll is one-sixth. The objective is to find the stopping rule that will maximize the number of Krugerrands you can expect to win on average.
If you always stop with the first roll, for example, the winnable amount is simply the expected value of a random variable that takes the values 1, 2, 3, 4, 5, and 6 with probability 1/6 each. That is, one-sixth of the time you will win 1, one-sixth of the time you will win 2, and so on, which yields the expected value 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2. Thus if you always quit on the first roll, you expect to win 3.5 Krugerrands on the average. But clearly it is not optimal to stop on the first roll if it is a 1, and it is always optimal to stop with a 6, so already you know part of the optimal stopping rule. Should you stop with a 5 on the first roll? One powerful general technique for solving this type of problem is the method of
Clearly it is optimal to stop on the first roll if the value seen on the first roll is greater than the amount expected if you do not stop—that is, if you continue to roll after rejecting the first roll. That would put you in a new game where you are only allowed four rolls, the expected value of which is also unknown at the outset. The optimal strategy in a four-roll problem, in turn, is to stop at the first roll if that value is greater than the amount you expect to win if you continue in a three-roll problem, and so on. Working down, you arrive at one strategy that you do know. In a one-roll problem there is only one strategy, namely to stop, and the expected reward is the expected value of one roll of a fair die, which we saw is 3.5. That information now yields the optimal strategy in a two-roll problem—stop on the first roll if the value is more than you expect to win if you continue, that is, more than 3.5. So now we know the optimal strategy for a two-roll problem—stop at the first roll if it is a 4, 5, or 6, and otherwise continue—and that allows us to calculate the expected reward of the strategy.
In a two-roll problem, you win 4, 5, or 6 on the very first roll, with probability 1/6 each, and stop. Otherwise (the half the time that the first roll was a 1, 2 or 3) you continue, in which case you expect to win 3.5 on the average. Thus the expected reward for the two-roll problem is 4(1/6) + 5(1/6) + 6(1/6) + (1/2)(3.5) = 4.25. This now gives you the optimal strategy for a three-roll problem—namely, stop if the first roll is a 5 or 6 (that is, more than 4.25), otherwise continue and stop only if the second roll is a 4, 5, or 6, and otherwise proceed with the final third roll. Knowing this expected reward for three rolls in turn yields the optimal strategy for a four-roll problem, and so forth. Working backwards, this yields the optimal strategy in the original five-roll problem: Stop on the first roll only if it is a 5 or 6, stop on the second roll if it is a 5 or 6, on the third roll if it is a 5 or 6, the fourth roll if it is a 4, 5 or 6, and otherwise continue to the last roll. This strategy guarantees that you will win about 5.12 Krugerrands on average, and no other strategy is better. (So, in a six-roll game you should stop with the initial roll only if it is a 6.)
The method of backward induction is very versatile, and works equally well if the process values are not independent (as they were assumed to be in repeated rolls of the die), or if the objective is to minimize some expected value such as cost. Suppose that a company must purchase its weekly energy supplies on the previous Monday, Tuesday or Wednesday and the likelihood of future prices can be estimated based on past statistics. For example, if on Monday the decision maker has the opportunity to purchase energy for the following week at a cost of 100, she may know from past experience that there is a 50-50 chance that Tuesday’s price will be 110, and otherwise it will be 90. Furthermore she knows that if it is 110 on Tuesday, then it will be 115 on Wednesday with probability 1/3, and otherwise will be 100; and that if it is 90 on Tuesday, it is equally likely to be 100 or 85 on Wednesday. Using backward induction, the optimal rule on Tuesday is seen to be not to buy if the price is 110, since 110 is larger than the expected price (1/3)(115) + (2/3)(100) = 105 if she delays buying until Wednesday. Similarly, if the price Tuesday is 90, it is optimal to buy. Working backwards to Monday, since 100 is larger than the expected cost if she continues—namely, (1/2)(105) + (1/2)(90) = 97.5—it is optimal not to buy on Monday. Putting these together yields her optimal stopping strategy.
In the full-information case, with the objective of stopping with one of the largest
values, the best possible probabilities of winning were unknown for general finite sequences of independent random variables. Using both backward and forward induction, and a class of distributions called “Bernoulli pyramids,” where each new variable is either the best or the worst seen thus far (for example, the first random variable is either +1 or -1 with certain probabilities, the second variable is +2 or -2, and so forth), Douglas Kennedy of the University of Cambridge and I discovered those optimal probabilities. We proved that for every finite sequence of independent random variables, there is always a stop rule that stops with the highest value with probability at least 1/
, and a stop rule that stops with one of the two highest values with probability at least
which is the best you can do. (The probabilities for stopping with one of the
values highest have similar formulas).
Computers were not useful for solving that problem. In fact, all of the problems described in this article were solved using traditional mathematicians’ tools—working example after example with paper and pencil; settling the case for two, three and then four unknowns; looking for patterns; waiting for the necessary
insights; and then searching for formal proofs for each step. Computers are very helpful for after-the-fact applications of many results, such as backward induction. But in theoretical probability, computers often do not significantly aid the discovery process. To better understand this, reconsider the two-card problem described above. How would one program a computer to imagine such a strategy existed, let alone to search the universe of ideas to find it?