Sorting Out the Genome
To put your genes in order, flip them like pancakes
The idea that overturned everyone's thinking about reversal sorting came from Pavel A. Pevzner of the University of California, San Diego, working in collaboration first with Vineet Bafna, also of San Diego, and then with Sridhar Hannenhalli of the University of Pennsylvania. They presented their sorting scheme in the language of mathematical graph theory—the science of dots and lines. The illustrations[on page 389] show the construction of some of the graphs that enter into the sorting method. The process has its intricacies, including the replacement of a signed permutation of the numbers 1 through n by an unsigned permutation of 1 through 2n.
Recently Anne Bergeron of the Université du Québec à Montréal has given an alternative explanation of the same algorithm, presenting it more directly in terms of the original signed permutation. The account that follows is based on Bergeron's work.
The key concept is that of an "oriented pair": two numbers, found anywhere within the permutation, that have opposite signs but are consecutive when you ignore the signs. Bergeron observes that a reversal that brings these numbers together will also give them like signs. The process is easier to understand with an example. Consider the six-element permutation 0+3+1+6+5-2+4+7 (shown flanked by the "anchoring" elements 0 and 7, which will never move). The permutation has two oriented pairs, namely +3-2 and +1-2, and each of these can be reunited in two ways. For example, reversing the segment +6+5-2 brings the 1 and the 2 together in their correct order and leaves both with a plus sign, whereas reversing +3+1+6+5 creates the subsequence -3-2.
Bergeron derives deterministic rules for identifying oriented pairs and choosing which of the pairs to reunite next. The rules favor reversals that leave positive numbers in ascending order and negative numbers in descending order. They also give precedence to whichever reversal creates the largest number of new oriented pairs. In this case, the +3+1+6+5 reversal scores highest, generating four oriented pairs. The rules can now be applied again to choose another optimal reversal. Continuing in this way brings the original permutation to canonical order in five steps:
This is the shortest possible reversal-sorting sequence.
The procedure explained by Bergeron is guaranteed to reach a permutation in which all elements have plus signs. Usually, that permutation is the canonical one, and so the task of sorting is completed. However, there are exceptional permutations for which the algorithm fails, and further steps are needed. This is not an artefact of Bergeron's simplified version; the problem is with the underlying algorithm and can be traced to subtle features of the graphs that describe the permutations. Pevzner and his colleagues named these disruptive structures hurdles, and there are even rarer impediments known as superhurdles and fortresses. In an early version of the algorithm the hurdles could not be overcome. Hannenhalli and Pevzner later found a series of intricate maneuvers that can clear all the varieties of hurdles, and so the measurement of reversal distance is now exact.