COMPUTING SCIENCE

# New Dilemmas for the Prisoner

# Dictators and Extortionists

One mischievous strategy might be called the dictator: It unilaterally sets the other player’s long-term average score to any value between the mutual-defection payment and the mutual-cooperation payment. (For the standard payoff values 0, 1, 3, 5, that means anywhere between 1 and 3.)

Consider the strategy (4/5, 2/5, 2/5, 1/5), where the four numbers again indicate the probability of choosing to cooperate after *cc, cd, dc, dd,* respectively. If *Y* plays this strategy against *X*, then *X*’s average score per round will converge on the value 2.0 after a sufficiently long series of games, no matter what strategy *X* chooses to play. In the lower illustration on the previous page *X* responds to *Y*’s coercion with four different strategies, but in each case *X*’s average score gravitates ineluctably toward 2.0. It should be emphasized that dictating *X*’s score does not require *Y* to make any active adjustments or responses as the game proceeds. *Y* can set the four probabilities and then “go to lunch,” as Press and Dyson put it.
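The dictator's grip can be checked numerically. The sketch below is one possible way to do it, not the method behind the illustration: it tracks the probability of each of the four states round by round and averages the expected payoffs. The function name `long_run_scores`, the choice of four opponents, and the starting state are assumptions made for the example. It also assumes that each player lists the states with his or her own move first, so that *Y*'s *cd* is *X*'s *dc*.

```python
# Payoffs per round, indexed by the state (X's move, Y's move) in the
# order cc, cd, dc, dd, using the standard values 0, 1, 3, 5.
PAYOFF_X = (3, 0, 5, 1)
PAYOFF_Y = (3, 5, 0, 1)


def long_run_scores(p, q, rounds=20000):
    """Expected per-round scores (S_X, S_Y), averaged over `rounds` rounds."""
    # Assumption: each strategy lists its states with the player's own
    # move first, so Y's "cd" is X's "dc". Reindex q to X's ordering.
    q_x = (q[0], q[2], q[1], q[3])
    dist = [1.0, 0.0, 0.0, 0.0]  # assumed start: both cooperated last round
    total_x = total_y = 0.0
    for _ in range(rounds):
        total_x += sum(d * v for d, v in zip(dist, PAYOFF_X))
        total_y += sum(d * v for d, v in zip(dist, PAYOFF_Y))
        nxt = [0.0, 0.0, 0.0, 0.0]
        for state, weight in enumerate(dist):
            cx, cy = p[state], q_x[state]  # cooperation probabilities
            nxt[0] += weight * cx * cy
            nxt[1] += weight * cx * (1 - cy)
            nxt[2] += weight * (1 - cx) * cy
            nxt[3] += weight * (1 - cx) * (1 - cy)
        dist = nxt
    return total_x / rounds, total_y / rounds


DICTATOR = (0.8, 0.4, 0.4, 0.2)  # Y's strategy from the text

# Hypothetical opponents for X; any memory-one strategy could be used.
OPPONENTS = {
    "always-cooperate": (1, 1, 1, 1),
    "always-defect": (0, 0, 0, 0),
    "tit-for-tat": (1, 0, 1, 0),
    "Pavlov": (1, 0, 0, 1),
}

for name, p in OPPONENTS.items():
    s_x, s_y = long_run_scores(p, DICTATOR)
    print(f"{name:17s} S_X = {s_x:.3f}   S_Y = {s_y:.3f}")
```

Under these assumptions every opponent's *S*_{X} should come out close to 2.0, whereas *Y*'s own score differs from one opponent to the next.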

A second form of mischief manipulates the ratio between *X*’s score and *Y*’s score. If *S*_{X} and *S*_{Y} are the players’ long-term average scores, the strategy allows *Y* to enforce the linear relation *S*_{Y} = 1 + *M*(*S*_{X} – 1), where *M* is an arbitrary constant greater than 1. *X* has the option of playing an always-defect strategy, which consigns both players to the minimal payoff of one point per round. But if *X* takes any steps to improve this return, every increment to *S*_{X} will increase *S*_{Y} by *M* times as much. Press and Dyson call the technique *extortion*. As an example they cite the strategy (11/13, 1/2, 7/26, 0), which sets *M* = 3. If *Y* adopts this rule, *X* can play always-defect (or tit-for-tat) to limit both players to one point per round. When *X* chooses other strategies, however, *Y* comes out ahead. If *X* plays Pavlov, the scores are approximately *S*_{X} = 1.45 and *S*_{Y} = 2.36. To maximize his or her score, *X* must cooperate unconditionally, earning an average of 1.91 points, but then *Y* gets 3.73 points.
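The extortion relation can be verified exactly rather than by simulation. The following sketch treats the game as a four-state Markov chain and solves for its stationary distribution with rational arithmetic. The helper names (`transition_matrix`, `stationary`, `exact_scores`) are invented for this example. It assumes that each strategy lists its states with the player's own move first, and that the chain has a single stationary distribution, which is the case for the four opponents shown.

```python
from fractions import Fraction as F

# Payoffs indexed by the state (X's move, Y's move): cc, cd, dc, dd.
PAYOFF_X = (3, 0, 5, 1)
PAYOFF_Y = (3, 5, 0, 1)


def transition_matrix(p, q):
    """Row s holds the probabilities of moving from state s to cc, cd, dc, dd."""
    p = [F(v) for v in p]
    q = [F(v) for v in q]
    q_x = [q[0], q[2], q[1], q[3]]  # Y's "cd" is X's "dc" (assumed convention)
    return [[cx * cy, cx * (1 - cy), (1 - cx) * cy, (1 - cx) * (1 - cy)]
            for cx, cy in zip(p, q_x)]


def stationary(m):
    """Solve pi * m = pi with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(m)
    # Rows of (m transposed minus identity), augmented with a zero column.
    a = [[m[j][i] - (1 if i == j else 0) for j in range(n)] + [F(0)]
         for i in range(n)]
    a[n - 1] = [F(1)] * n + [F(1)]  # replace one redundant row by sum(pi) = 1
    for col in range(n):
        # Raises StopIteration if the stationary distribution is not unique.
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def exact_scores(p, q):
    """Long-run average scores (S_X, S_Y) as exact fractions."""
    pi = stationary(transition_matrix(p, q))
    s_x = sum(w * v for w, v in zip(pi, PAYOFF_X))
    s_y = sum(w * v for w, v in zip(pi, PAYOFF_Y))
    return s_x, s_y


EXTORTIONER = (F(11, 13), F(1, 2), F(7, 26), F(0))  # Y's strategy, M = 3

for name, p in [("always-defect", (0, 0, 0, 0)),
                ("tit-for-tat", (1, 0, 1, 0)),
                ("Pavlov", (1, 0, 0, 1)),
                ("always-cooperate", (1, 1, 1, 1))]:
    s_x, s_y = exact_scores(p, EXTORTIONER)
    assert s_y - 1 == 3 * (s_x - 1)  # the enforced linear relation
    print(f"{name:17s} S_X = {s_x}   S_Y = {s_y}")
```

With these conventions, always-defect and tit-for-tat both yield one point apiece, Pavlov yields 16/11 and 26/11, and unconditional cooperation yields 21/11 and 41/11. Exact fractions are used so that the linear relation can be asserted with equality rather than within a floating-point tolerance.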

The discovery of dictatorial and extortionate strategies came as a great surprise, and yet there were precedents. Aspects of the discovery were anticipated in the 1990s by Maarten C. Boerlijst, Martin A. Nowak, and Karl Sigmund. Moreover, not all of the zero-determinant strategies are exotic ideas that no one ever thought of trying. Tit-for-tat, the most famous of all IPD rules, is in fact a zero-determinant strategy: It is the limiting case of extortion with *M* = 1, which forces the two players’ scores to be equal.

Watching the coercive strategies in action (or playing against them), I can’t help feeling there is something uncanny going on. In a game whose structure is fully symmetrical, how can one contestant wield such power over the other? In the case of the dictatorial strategies, the symmetry isn’t so much broken as transformed: When I take control of your score, I lose control of my own; although there’s nothing you can do to alter your own score, you have the power to set mine.

The extortionate strategies can’t be explained away so easily. There really is an asymmetry, with one player grabbing an unfair share of the spoils, and the only defense is to retreat to the policy of universal defection that leaves everyone impoverished. IPD seems to be back in the same dreary jail cell where it all began.
