MY AMERICAN SCIENTIST
SEARCH

HOME > PAST ISSUE > Article Detail

COMPUTING SCIENCE

# Sorting Out the Genome

To put your genes in order, flip them like pancakes

# Signs of Change

If a problem is too hard to solve, do you look for a simpler one? In this case the response was just the opposite. Attention turned to a more elaborate variant of the sorting-by-reversals problem. And, surprisingly, efficient solutions to this variant soon emerged. Furthermore, as a bonus, the variant problem offers a more faithful model of the biological process by which genes get jumbled.

To explain this strange twist in the plot, I need to introduce a crucial fact about genetic information that I have neglected mentioning up to now. Each gene on a chromosome has not only a position but also an orientation; think of a gene as an arrow rather than a line segment. Like the letters in a word of text, the sequence of nucleotides in a strand of DNA reads correctly in only one direction. Thus when two genomes are compared, it's necessary to match both the order of the genes along the chromosome and the direction of each gene. Early methods of mapping genes gave no information about their direction, and so the issue was ignored. With instruments that can read nucleotide sequences directly from a molecule of DNA, data on gene orientation is becoming more readily available.

A chromosome with oriented genes corresponds to a signed permutation, one where each element in the set 1 2 3...n has either a plus or a minus sign. Genes facing one way are positive, those with the opposite direction are negative. Reversing a segment of the chromosome inverts the order of the genes and also changes all their signs. For example, the signed permutation +3 -1 -2 +4 has the inverse -4 +2 +1 -3. To sort a signed permutation, you find a series of reversals that arranges the genes in ascending order and makes all the signs positive. Going back to the breakfast table again, the corresponding task is known as the burnt pancake problem: The pancakes have to be stacked in order of size with the blackened sides down.

Common sense argues that signed permutations must surely be harder to sort than unsigned ones. There are more constraints and more variables. The trouble with this reasoning is not that it gives the wrong answer but that it answers the wrong question. In general, sorting a signed permutation does take more work than sorting an unsigned one—but the efficiency of sorting is not at issue. What's wanted is not a sorting of the genome but a measure of the number of reversals needed to sort it, and that is indeed easier to calculate with a signed permutation.