COMPUTING SCIENCE

# Sorting Out the Genome

To put your genes in order, flip them like pancakes

# Ordering Breakfast

Before getting tangled up in loopy strands of DNA, consider a warm-up exercise: flipping pancakes. You are given a stack of *n* pancakes, no two the same diameter, and asked to sort them in order of size, with the smallest on top and the largest on the bottom. At each step in the sorting process, you insert a spatula anywhere you choose within the stack, then flip over all the pancakes above the spatula. No other manipulations of the stack are allowed. How many flips are required to get the pancakes in order?

Here's one pancake-sorting algorithm. First find the largest pancake, put the spatula under it, and turn over the part of the stack above that point. Now the largest pancake is on top, so flipping the entire stack of *n* pancakes puts it in its proper position at the bottom. From here on, the bottommost pancake will never be moved again. Next, choose the second-largest pancake, make a flip to bring it to the top, and then flip *n* - 1 pancakes to place the second-largest in its permanent place atop the largest. After bringing the third-largest to the top, flip *n* - 2 pancakes, and so on until the whole stack is sorted. Placing each pancake takes two flips, and so the entire procedure requires 2*n* steps. (Or maybe 2*n* - 1. Or 2*n -* 3. Some of the final flips are not really needed.)

This algorithm provides a naive upper bound: It shows that sorting the stack can't require more than about 2*n* steps, but the possibility remains that some other method could do better. In the 1970s such an improved algorithm was devised by Christos H. Papadimitriou, then of Harvard University, and William H. Gates, an undergraduate at that institution (who soon dropped out to start a software company). The Gates-Papadimitriou algorithm sorts any stack of pancakes in (5*n*+5)/3 moves, which for large *n* is essentially equal to 5/3 *n*. Gates and Papadimitriou also established a lower bound, showing that some pancake stacks cannot be sorted in fewer than 17/16 *n* steps; the lower bound has since been raised slightly to 15/14 *n*. The range defined by these upper and lower bounds is probably narrow enough to cope with any practical problems arising at the pancake griddle. From a mathematical point of view, however, the pancake problem remains unsolved: The exact number of flips needed to sort *n* pancakes is unknown.

Gene flipping has a lot in common with the pancake problem, but there are differences as well. With pancakes, you are not allowed to reach into the middle of the stack and change the internal order; you can only reverse subsequences of pancakes at the top of the pile. In the case of genes lined up along a chromosome, there's no reason to enforce such a constraint. Any block of genes can be inverted, whether it's at either end of the chromosome or in the interior. This additional freedom ought to make permuting genes somewhat easier than flipping flapjacks.

On the other hand, the genetic problem is harder in a different way. Pancake sorting has mostly been treated as a quest for worst-case results. The quantity of primary interest is not the number of flips needed for any particular stack of *n* pancakes but rather the maximum for all possible stacks of height *n*. Solving the biological problem calls for a finer level of detail: A geneticist would like to know the reversal distance between particular gene configurations seen in nature. Knowing the maximum for all *n*-gene chromosomes would not be nearly as helpful.

EMAIL TO A FRIEND :