COMPUTING SCIENCE

# Trains of Thought

Computing with locomotives and box cars takes a one-track mind

Solitaire Sorting

The second phase of the freight-car sorting process—putting
the cars in order for delivery—is typically done in smaller,
local switching yards. These are humpless "flat yards";
the cars are moved by engines rather than gravity. The tracks can be
arranged as queues, as in a hump yard, or as stacks—dead-end
leads—so that cars have to be pushed in and pulled out from
the same end. Suppose a train has cars numbered 1 through
*n*, but on arrival at the yard they are scrambled in some
arbitrary order; the departing train should have the cars in
ascending sequence, with car 1 just behind the engine and car
*n* at the end. How many classification tracks are needed to
achieve this result? How many times do cars have to be pushed onto
and pulled out of the tracks? These are questions of obvious
practical importance to railroaders. They are also questions that
yield to mathematical and algorithmic analysis.

Robert Tarjan of Stanford University answered some of the questions in 1972. Here are a few of his results:

If a switch yard has an internal loop, allowing cars at the output
to be brought back to the input for further processing, then
*any* sequence can be sorted. Conversely, in the absence of
such loops, no finite network of stacks, queues or deques can sort
all possible sequences, even if the individual storage elements are
of unbounded capacity.

If the yard consists of *m* queues arranged in parallel, then
a train can be fully sorted if and only if the longest decreasing
subsequence has no more than *m* cars. (The cars of a
decreasing subsequence don't have to be consecutive; for example, in
the sequence 1635492 the longest decreasing subsequence is 6542.)
For a yard with *m* parallel stacks, it's the longest
*increasing* subsequence that governs. But these constraints
are somewhat artificial. They apply only if cars must always move
from the input to a stack or a queue and then directly to the
output. Real rail yards are more flexible; cars can be pulled from
one stack and pushed onto another. When moves like this are allowed,
it's harder to determine which sequences can be sorted.

In 2000 Elias Dahlhaus, Peter Horak, Mirka Miller and Joseph F. Ryan showed that a version of the switch-yard problem is NP-complete (which means, roughly speaking, that there's no efficient algorithm for solving it). Specifically, they proved it is difficult to decide how many tracks are needed.

Chinese mathematicians have taken a somewhat different and more-pragmatic approach to train-sorting problems, apparently in response to a request from Chinese railroad officials. (The exact provenance of these ideas is somewhat murky. In 1976 an American delegation to China heard a lecture on the subject by Ma Chung-fan; notes on this talk were written up by Henry O. Pollak and published in a National Academy of Sciences report. A 1983 paper by Zhu Yongjin and Zhu Ruopeng covers similar ideas but does not mention Ma.)

Pollak's lecture notes present an example: Use an array of stacks to sort the 10-car sequence 6324135726. (Cars with the same number are going to the same destination and thus should be grouped together.) Here I am going to consider the same example but look at it from a different point of view.

One approach to sorting the example sequence resembles a game of
solitaire, building multiple stacks of cars in nondecreasing order.
Working from the rear of the train toward the front, we examine the
number on each car and push the car onto a stack. Suppose the car we
have just reached bears number *k*. If there is exactly one
nonempty stack whose topmost element is greater than or equal to
*k*, then we put the car on that stack. If there are multiple
stacks with a top element greater than or equal to *k*, we
choose the stack with the smallest top entry. If no stack qualifies
to receive car *k*, then we have to start a new stack.

For the sequence 6324135726, we begin with the rightmost 6, which necessarily starts a new stack. We push 2 onto the same stack, but the 7 starts a second stack, which can also accept 5 and 3. The 1 then goes on the first stack, and the 4 inaugurates a third stack. Working through the rest of the sequence, we finally reach this configuration of four stacks:

Now the cars can be pulled out of the stacks in nondecreasing order; following the guidelines indicated by the colored blocks, this final assembly step will take seven "pulls." The sorted sequence, of course, is 1223345667.

In his Beijing lecture, Ma gave an alternative sorting procedure; I'm going to call it the Chinese solitaire algorithm. It partitions the sequence in a way that requires just four pulls to assemble the sorted train. Here is the final state of the four stacks:

It's easy to confirm that this configuration can be reached from the original train order, and that four pulls do indeed yield the properly sorted sequence. But by what rule were the numbers dealt into these particular groups? Both the notes on Ma's lecture and the paper by Zhu and Zhu give a rather convoluted algorithm. In trying to explain it I can do no better than quote the lecture notes:

Start at the leftmost (in this case the only) 1, put down all 1s,
all 2s to the right of the last 1, 3s to the right of the last 2 if
you have covered *all* the 2s, etc. In this case, the first
subset defined in this way is 12.... The next subset takes the other
2 and the second 3...; it can't get to the first 3. The next subset
takes the first 3, the 4, the 5, and second 6; the last subset is 67.

This procedure works, but there's an easier way to generate the same partitioning: Repeatedly scan from left to right, and on each pass extract the longest possible nondecreasing subsequence starting with the leftmost number. In the example considered here, the first such subsequence is 67, followed by 3456, then 23 and finally 12.

Zhu and Zhu give a proof that the Chinese solitaire algorithm allows
the train to be assembled with the minimal number of pulls from the
classification tracks. But the proof counts *only* pulls.
What about "pushes"—the train movements needed to
place the cars on the stacks in the first place? For the example
sequence, the Chinese algorithm has the worst possible performance
in this respect: Ten separate pushes are needed to stack up the 10
cars. The *non*-Chinese solitaire method is somewhat better,
at seven pushes. Taking the sum of pushes and pulls, the two methods
score a tie at 14. I don't know whether some other technique can do better.

**IN THIS SECTION**

Community Guidelines: Disqus Comments

The Art and Science of Communicating Science

EMAIL TO A FRIEND :

**Of Possible Interest**

**Feature Article**: The Statistical Crisis in Science

**Computing Science**: Clarity in Climate Modeling

**Engineering**: Anonymous Design