COMPUTING SCIENCE

# Sorting Out the Genome

To put your genes in order, flip them like pancakes

# The Happy Ending

The algorithm published by Hannenhalli and Pevzner finds the optimum sequence of reversals in an amount of time proportional to *n*^{4}. Others soon improved this bound to *n* ^{3} and *n*^{2}. Eric Tannier and Marie-France Sagot of the Université Claude Bernard in France apparently hold the current record:

If you simply want to know the length of the shortest sequence of reversals (without seeing the sequence itself), there is a linear method, taking time proportional to the first power of *n*. Even the slowest of these algorithms is immeasurably more efficient than an exponential method, and they can all cope with any realistic biological data.

Several authors have implemented reversal-sorting algorithms in publicly available software. Of particular note is a Web site called GRIMM, created by Glenn Tesler of the University of California, San Diego, which calculates the optimum sequence of reversals for any pair of signed permutations. On a lighter note, the online version of this column includes an interactive illustration that will allow you to try your own hand at finding an optimum series of reversals for small random permutations.

The decade-long effort to solve the sorting-by-reversals problem stands as a model of interdisciplinary collaboration. The question that inspired the work in the first place was of genuine importance to biology, but it also turned out to hold real interest for mathematicians and computer scientists, who might well have tackled the problem even if it didn't have an appealing application. I suspect that some of the competitive fine-tuning of algorithms goes beyond the actual needs of biologists, but those whose main interests are computational deserve to have their fun too.

In any case, there are plenty of biological challenges that remain to be mastered. The abstract version of the sorting problem implicitly assumes that all segments of a chromosome are equally likely to be reversed. Biologists know that there are actually biases favoring some segments, or some junctions between segments. Furthermore, reversals of chromosomal segments are not the only mechanism for mixing up the genome. In organisms with multiple chromosomes, there are also transpositions, fissions and fusions to be taken into account. Genes are duplicated or deleted. DNA is not, in the end, just a permutation of signed numbers, and yet it's impressive how much can be gleaned from such simple models.

# Gene-Flipping Simulator

You can try gene flipping in the safety of your own home with Brian Hayes's gene-flipper program extraordinaire. To ensure that it works properly, you may need to download the latest version of the Adobe Flash Player.

© Brian Hayes

# Bibliography

- Bader, David A., Bernard M. E. Moret and Mi Yan. 2001. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study.
*Proceedings of the Seventh Workshop on Algorithms and Data Structures*(WADS '01), pp. 365-376. Berlin: Springer-Verlag. - Bafna, Vineet, and Pavel Pevzner. 1996. Genome rearrangements and sorting by reversals.
*SIAM Journal on Computing*25:272-289. - Bartolomé, Carolina, and Brian Charlesworth. 2006. Rates and patterns of chromosomal evolution in
*Drosophila pseudoobscura*and*D. miranda*.*Genetics*173:779-791. - Bergeron, Anne. 2005. A very elementary presentation of the Hannenhalli-Pevzner theory.
*Discrete Applied Mathematics*146:134-145. - Bergeron, Anne, and François Strasbourg. 2001. Experiments in computing sequences of reversals.
*Proceedings of the First International Workshop on Algorithms in Bioinformatics*, pp. 164-174. Berlin: Springer-Verlag. - Berman, Piotr, and Sridhar Hannenhalli. 1996. Fast sorting by reversal.
*Proceedings of the Seventh Symposium on Combinatorial Pattern Matching*, pp. 168-185. Berlin: Springer-Verlag. - Caprara, Alberto. 1997. Sorting by reversals is difficult.
*Proceedings of RECOMB '97: The First International Conference on Computational Molecular Biology*, pp. 75-83. New York: ACM Press. - Dobzhansky, Th., and A. H. Sturtevant. 1938. Inversions in the chromosomes of
*Drosophila pseudoobscura*.*Genetics*23:28-64. - Gates, William H., and Christos H. Papadimitriou. 1979. Bounds for sorting by prefix reversal.
*Discrete Mathematics*27:47-57. - Hannenhalli, Sridhar, and Pavel A. Pevzner. 1999. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals.
*Journal of the ACM*48:1-27. - Heydari, Mohammad H., and I. Hal Sudborough. 1997. On the diameter of the pancake network.
*Journal of Algorithms*25:67-94. - Kaplan, Haim, Ron Shamir and Robert E. Tarjan. 1997. Faster and simpler algorithm for sorting signed permutations by reversals.
*Proceedings of the Eighth ACM-SIAM Symposium on Discrete Algorithms*, pp. 344-351. New York: ACM Press. - Kaplan, Haim, and Elad Verbin. 2005. Sorting signed permutations by reversals, revisited.
*Journal of Computer and System Sciences*70:321-341. - Kececioglu, J. D., and D. Sankoff. 1995. Exact and approximation algorithms for sorting by reversal, with application to genome rearrangement.
*Algorithmica*13:180-210. - Palmer, Jeffrey D., and Laura A. Herbon. 1988. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence.
*Journal of Molecular Evolution*28:87-97. - Pevzner, Pavel A. 2000.
*Computational Molecular Biology: An Algorithmic Approach*. Cambridge, Mass.: MIT Press. - Tannier, Eric, and Marie-France Sagot. 2004. Sorting by reversals in subquadratic time.
*Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching*, pp. 1-13. Berlin: Springer-Verlag. - Tesler, Glenn. 2002. GRIMM: genome rearrangements web server.
*Bioinformatics*18:492-493. See also http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM/index.html - Watterson, G. A., W. J. Ewens, T. E. Hall and A. Morgan. 1982. The chromosome inversion problem.
*Journal of Theoretical Biology*99:1-7.

EMAIL TO A FRIEND :