Lately I have tried implementing a few elementary redistricting algorithms, testing them on North Carolina examples. My toy geographic information system includes county-level demographic data (downloaded from the Census Bureau Web site), a polygonal outline of each county (using coordinates cadged from a commercial Postscript map), and a hand-compiled "touch list" of contiguity data. (No political information is included.) Maps drawn by this system cannot be taken seriously as redistricting proposals, in part because whole counties are too coarse a unit of population to satisfy the requirements of district equality and the Voting Rights Act. But even a crude model reveals some of the traps and snares that always await when you try to specify a process in enough detail for a machine to do it. (Source code is available.)
The simplest of the algorithms is a cake-cutting method. Imagine holding a knife over a map of North Carolina, with the blade lined up north-to-south as it moves slowly from west to east. As soon as the knife has crossed counties with enough people to make up a Congressional district, you cut. Then the knife continues westward, until it has passed over another district's worth of population, where you make a second cut, and so on. The procedure creates a series of districts as vertical slices. A variation of the method was once the subject of a proposed ballot initiative in California.
In describing an algorithm like this one, details are crucial. When is the knife blade deemed to have passed over a county-when it touches the westernmost point, the easternmost, the midpoint, the county seat? How do you adjudicate ties, when two or more counties lie at the same longitude? Exactly when does the knife blade cut off a group of counties-just before the district exceeds the ideal size, or just after, or by some other rule? Decisions on these matters may be arbitrary, but they have to be stated explicitly.
Maintaining contiguity in the districts turns out to be the trickiest part of writing a program for the cake-cutting algorithm. Just because two counties are adjacent in a list sorted in west-to-east order does not mean the counties are contiguous. I dealt with this complication as I was writing the program, but a subtler problem caught me by surprise later, when I first tested it. The procedure can run into a blind alley, leaving a county stranded among counties already allocated to other districts. In this situation the program simply fails: It cannot create the required k contiguous districts.
Using counties as indivisible units of population, the cake-cutting algorithm yields pretty ragged-looking districts, with a mean deviation from the ideal district size of almost 12 percent. If the same algorithm were applied to census blocks, the results would be much smoother (North Carolina has 100 counties, compared with almost 230,000 census blocks), but some of the districts would be very long and skinny.
Another model for a redistricting algorithm is the National Football League draft, where the team with the worst win- loss record gets to choose first from the pool of new players. In the redistricting version of the draft, the district with the smallest population chooses from the pool of unclaimed counties, always picking the highest-population county contiguous to its own territory. (In the argot of computer science this is a "greedy algorithm.") Again, details such as the handling of ties need careful attention. Also, getting the procedure started is a potential trouble spot. The obvious strategy is to begin with empty districts, so that the k largest counties will be selected as the "nuclei" of k Congressional districts. This happens to work reasonably well in North Carolina, but it is easy to imagine pathological cases where all the largest counties are in a tight cluster, producing severely deformed districts.
A glance at the output of either the cake-cutting or the greedy algorithm shows immediate opportunities for improving the map. Shifting or swapping selected counties would help to equalize populations or make the districts more compact. This suggests adding a post-processing stage to optimize the alignments. The Kaiser-Nagel algorithm is just such a procedure, and so I wrote a program based on similar principles, using population equality as the function to be optimized. Single-county moves were attempted repeatedly, in a specific sequence, until the system reached a stable state, where no further move would improve the situation. Applied to the output of the greedy algorithm, the program reduced the mean deviation from 10.1 percent to 4.4 percent.
As noted above, the Kaiser-Nagel procedure is a steepest- descent method, and therefore vulnerable to getting trapped in a local optimum. A technique called simulated annealing can overcome this drawback. The idea is to accept not only all moves that improve population equality but also a few randomly selected detrimental moves. In other words, when a county is shifted from one district to another, the move is always accepted if it yields better population balance, and in addition it may sometimes be accepted even if it makes the balance worse. The probability of accepting an unfavorable move is determined by a parameter analogous to a temperature, and the system is "annealed" by steadily reducing the temperature toward zero, where only favorable moves are possible. The effect is to jiggle the system out of premature local optima, increasing the chance that it will eventually find the global optimum.
In my first attempt at an annealing program I was again caught off-guard by contiguity problems. I had not foreseen something that now seems obvious: that giving away a county can leave a district in two or more disconnected pieces. When this bug was corrected, the program worked quite well. The best run in a series of experiments with various annealing schedules produced a mean population deviation of under 1 percent. But this improvement comes at a high price: The algorithm is no longer deterministic. Because the program has an element of randomness, repeated runs on the same input data yield different results. This is a loophole the gerrymanderer could exploit, running the program again and again until the result looks "right."