MY AMERICAN SCIENTIST
SEARCH

HOME > PAST ISSUE > March-April 2007 > Article Detail

COMPUTING SCIENCE

# Trains of Thought

Computing with locomotives and box cars takes a one-track mind

Solitaire Sorting

The second phase of the freight-car sorting process—putting the cars in order for delivery—is typically done in smaller, local switching yards. These are humpless "flat yards"; the cars are moved by engines rather than gravity. The tracks can be arranged as queues, as in a hump yard, or as stacks—dead-end leads—so that cars have to be pushed in and pulled out from the same end. Suppose a train has cars numbered 1 through n, but on arrival at the yard they are scrambled in some arbitrary order; the departing train should have the cars in ascending sequence, with car 1 just behind the engine and car n at the end. How many classification tracks are needed to achieve this result? How many times do cars have to be pushed onto and pulled out of the tracks? These are questions of obvious practical importance to railroaders. They are also questions that yield to mathematical and algorithmic analysis.

Robert Tarjan of Stanford University answered some of the questions in 1972. Here are a few of his results:

If a switch yard has an internal loop, allowing cars at the output to be brought back to the input for further processing, then any sequence can be sorted. Conversely, in the absence of such loops, no finite network of stacks, queues or deques can sort all possible sequences, even if the individual storage elements are of unbounded capacity.

If the yard consists of m queues arranged in parallel, then a train can be fully sorted if and only if the longest decreasing subsequence has no more than m cars. (The cars of a decreasing subsequence don't have to be consecutive; for example, in the sequence 1635492 the longest decreasing subsequence is 6542.) For a yard with m parallel stacks, it's the longest increasing subsequence that governs. But these constraints are somewhat artificial. They apply only if cars must always move from the input to a stack or a queue and then directly to the output. Real rail yards are more flexible; cars can be pulled from one stack and pushed onto another. When moves like this are allowed, it's harder to determine which sequences can be sorted.

In 2000 Elias Dahlhaus, Peter Horak, Mirka Miller and Joseph F. Ryan showed that a version of the switch-yard problem is NP-complete (which means, roughly speaking, that there's no efficient algorithm for solving it). Specifically, they proved it is difficult to decide how many tracks are needed.

Chinese mathematicians have taken a somewhat different and more-pragmatic approach to train-sorting problems, apparently in response to a request from Chinese railroad officials. (The exact provenance of these ideas is somewhat murky. In 1976 an American delegation to China heard a lecture on the subject by Ma Chung-fan; notes on this talk were written up by Henry O. Pollak and published in a National Academy of Sciences report. A 1983 paper by Zhu Yongjin and Zhu Ruopeng covers similar ideas but does not mention Ma.)

Pollak's lecture notes present an example: Use an array of stacks to sort the 10-car sequence 6324135726. (Cars with the same number are going to the same destination and thus should be grouped together.) Here I am going to consider the same example but look at it from a different point of view.

One approach to sorting the example sequence resembles a game of solitaire, building multiple stacks of cars in nondecreasing order. Working from the rear of the train toward the front, we examine the number on each car and push the car onto a stack. Suppose the car we have just reached bears number k. If there is exactly one nonempty stack whose topmost element is greater than or equal to k, then we put the car on that stack. If there are multiple stacks with a top element greater than or equal to k, we choose the stack with the smallest top entry. If no stack qualifies to receive car k, then we have to start a new stack.

For the sequence 6324135726, we begin with the rightmost 6, which necessarily starts a new stack. We push 2 onto the same stack, but the 7 starts a second stack, which can also accept 5 and 3. The 1 then goes on the first stack, and the 4 inaugurates a third stack. Working through the rest of the sequence, we finally reach this configuration of four stacks:

Now the cars can be pulled out of the stacks in nondecreasing order; following the guidelines indicated by the colored blocks, this final assembly step will take seven "pulls." The sorted sequence, of course, is 1223345667.

In his Beijing lecture, Ma gave an alternative sorting procedure; I'm going to call it the Chinese solitaire algorithm. It partitions the sequence in a way that requires just four pulls to assemble the sorted train. Here is the final state of the four stacks:

It's easy to confirm that this configuration can be reached from the original train order, and that four pulls do indeed yield the properly sorted sequence. But by what rule were the numbers dealt into these particular groups? Both the notes on Ma's lecture and the paper by Zhu and Zhu give a rather convoluted algorithm. In trying to explain it I can do no better than quote the lecture notes:

Start at the leftmost (in this case the only) 1, put down all 1s, all 2s to the right of the last 1, 3s to the right of the last 2 if you have covered all the 2s, etc. In this case, the first subset defined in this way is 12.... The next subset takes the other 2 and the second 3...; it can't get to the first 3. The next subset takes the first 3, the 4, the 5, and second 6; the last subset is 67.

This procedure works, but there's an easier way to generate the same partitioning: Repeatedly scan from left to right, and on each pass extract the longest possible nondecreasing subsequence starting with the leftmost number. In the example considered here, the first such subsequence is 67, followed by 3456, then 23 and finally 12.

Zhu and Zhu give a proof that the Chinese solitaire algorithm allows the train to be assembled with the minimal number of pulls from the classification tracks. But the proof counts only pulls. What about "pushes"—the train movements needed to place the cars on the stacks in the first place? For the example sequence, the Chinese algorithm has the worst possible performance in this respect: Ten separate pushes are needed to stack up the 10 cars. The non-Chinese solitaire method is somewhat better, at seven pushes. Taking the sum of pushes and pulls, the two methods score a tie at 14. I don't know whether some other technique can do better.