COMPUTING SCIENCE

# Third Base

# Martha Stewart's File Cabinet

Some weeks ago, rooting around in files of old clippings and correspondence, I made a discovery of astonishing obviousness and triviality. What I found had nothing to do with the content of the files; it was about their arrangement in the drawer.

Imagine a fastidious office worker—a Martha Stewart of filing—who insists that no file folder lurk in the shadow of another. The protruding tabs on the folders must be arranged so that adjacent folders always have tabs in different positions. Achieving this staggered arrangement is easy if you're setting up a new file, but it gets messy when folders are added or deleted at random.

A drawer filled with "half-cut" folders, which have just two tab positions, might initially alternate *left-right-left-right*. The pattern is spoiled, however, as soon as you insert a folder in the middle of the drawer. No matter which type of folder you choose and no matter where you put it (except at the very ends of the sequence), every such insertion generates a conflict. Removing a folder has the same effect. Translated into a binary numeral with *left*=0 and *right*=1, the pristine file is the alternating sequence ...0101010101.... An insertion or deletion creates either a 00 or a 11—a flaw much like a dislocation in a crystal. Although in principle the flaw could be repaired—either by introducing a second flaw of the opposite polarity or by flipping all the bits between the site of the flaw and the end of the sequence—even the most maniacally tidy record-keeper is unlikely to adopt such practices in a real file drawer.

In my own files I use third-cut rather than half-cut folders; the tabs appear in three positions, *left*, *middle* and *right*. Nevertheless, I had long thought—or rather I had assumed without bothering to think—that a similar analysis would apply, and that I couldn't be sure of avoiding conflicts between adjacent folders unless I was willing to shift files to new folders after every insertion. Then came my Epiphany of the File Cabinet a few weeks ago: Suddenly I understood that going from half-cut to third-cut folders makes all the difference.

It's easy to see why; just interpret the drawerful of third-cut folders as a sequence of ternary digits. At any position in any such sequence, you can always insert a new digit that differs from both of its neighbors. Base 3 is the smallest base that has this property. Moreover, if you build up a ternary sequence by consistently inserting digits that avoid conflicts, then the choice of which symbol to insert is always a forced one; you never have to make an arbitrary selection among two or more legal possibilities. Thus, as a file drawer fills up, it is not only possible to maintain perfect Martha Stewart order; it's actually quite easy.

Deletions, regrettably, are more troublesome than insertions. There is no way to remove arbitrary elements from either a binary or a ternary sequence with a guarantee that two identical digits won't be brought together. (On the other hand, if you're fussy enough to fret about the positions of tabs on file folders, you probably never throw anything away anyhow.)

The protocol for avoiding conflicts between third-cut file folders is so obvious that I assume it must be known to file clerks everywhere. But in half a dozen textbooks on filing—admittedly a small sample of a surprisingly extensive literature—I found no clear statement of the principle.

Strangely enough, my trifling observation about arranging folders in file drawers leads to some mathematics of wider interest. Suppose you seek an arrangement of folders in which you not only avoid putting any two identical tabs next to each other, but you also avoid repeating any longer patterns. This would rule out not only 00 and 11 but also 0101 and 021021. Sequences that have no adjacent repeated patterns of any length are said to be "square free," by analogy to numbers that have no duplicated prime factors.

In binary notation, the one-digit sequences 0 and 1 are obviously square free, and so are 01 and 10 (but not 00 or 11); then among sequences three bits long there are 010 and 101, but none of the other six possibilities is square free. If you now try to create a four-digit square-free binary sequence, you'll find that you're stuck. No such sequences exist.

What about square-free ternary sequences? Try to grow one digit by digit, and you're likely to find your path blocked at some point. For example, you might stumble onto the sequence 0102010, which is square free but cannot be extended without creating a square. Many other ternary sequences also lead to such dead ends. Nevertheless, the Norwegian mathematician Axel Thue proved almost a century ago that unbounded square-free ternary sequences exist, and he gave a method for constructing one. The heart of the algorithm is a set of digit replacement rules: 0 → 12, 1 → 102, 2 → 0. At each stage in the construction of the sequence, the appropriate rule is applied to each digit, and the result becomes the starting point for the next stage. Figure 4 shows a few iterations of this process. Thue showed that if you start with a square-free sequence and keep applying the rules, the sequence will grow without bound and will never contain a square.

More recently, attention has turned to the question of how many ternary sequences are square free. Doron Zeilberger of Rutgers University, in a paper co-authored with his computer Shalosh B. Ekhad, established that among the 3^{n} *n*-digit ternary sequences at least 2^{n/17} are square free. Uwe Grimm of the Universiteit van Amsterdam has tightened this lower bound somewhat; he has also found an upper bound and has counted all the *n*-digit sequences up to *n*=110. It turns out there are 50,499,301,907,904 ways of arranging 110 ternary digits that avoid all repeated patterns. I'll have to choose one of them when I set up my square-free file drawer.

© Brian Hayes

EMAIL TO A FRIEND :