

Qwerks of History

Brian Hayes


My candidate for the most important single innovation introduced by Unix is the hierarchical file system. I also consider it the feature most desperately in need of a better idea.

In the 1950s, a computer file system was a cabinet full of magnetic tapes, tended by a poor drudge who retrieved them as needed. Disk storage allowed information to be kept on-line without manual intervention, but the early disks were small enough that organizing files was not a serious problem. Nevertheless, Ritchie and Thompson foresaw that a simple, flat list of files would soon become unwieldy. Their solution, a tree of directories nested inside directories, has held up quite well for 35 years. (Incidentally, this phylogeny was recapitulated twice in the ontogeny of microcomputers. The first versions of both MS-DOS and the Macintosh operating system had no nested subdirectories; they were flat file systems. In both cases tree-structured directories were added in the next release.)

The Unix file system has the topology of a rooted tree, a structure made up of linked nodes called parents and children. The root of the tree—which by qwerky tradition is always placed at the top—is a special node that has no parent. Directly below the root are its children, which can have children of their own, and so on to arbitrary depth. Any node can have any number of children (including zero), but every node other than the root has exactly one parent. Because of this single-parent constraint, there is always a unique path from the root to any node of the tree. In other words, you can find any directory or file by starting at the root and following some path from parent to child—and there will be only one such path.
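The single-parent rule can be made concrete with a small sketch. The `Node` class and the file names below are hypothetical illustrations, not part of any Unix implementation; the point is only that following parent links upward from any node yields exactly one route back to the root.

```python
class Node:
    """A file or directory in a rooted tree; every node but the root has one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # exactly one parent, or None for the root
        self.children = []        # any number of children, including zero
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk parent links up to the root; the reversed walk is the unique path."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


root = Node("")                       # the root is the one node with no parent
home = Node("home", root)
letters = Node("letters", home)
note = Node("2004-09.txt", letters)   # hypothetical file name
print(note.path())                    # /home/letters/2004-09.txt
```

Because each node stores only one parent, there is never a choice to make on the way up, which is another way of saying the path is unique.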

By now both the pleasures and the frustrations of directory trees are familiar to all. When you organize your correspondence, you can do it chronologically, setting up a directory for each year, with nested directories for each month. Or instead you can create a directory for each recipient. Or you can invent topical categories—love letters, crank letters, letters to the editor. The trouble is, any one such scheme precludes all the others. If you arrange the files topically, you can't also keep them in chronological sequence. (A mechanism called a symbolic link can create shortcuts between distant nodes of the tree, but it's not a practical solution to this problem.)
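For readers who have not met symbolic links, here is a minimal sketch of the shortcut mechanism, assuming a Unix-like system where Python's `os.symlink` is permitted. The directory and file names are invented for illustration. One letter is stored chronologically and then made reachable from a topical directory as well.

```python
import os
import tempfile


def file_letter_two_ways(base):
    """Store one letter chronologically, then link it into a topical directory."""
    by_date = os.path.join(base, "2004", "09")
    by_topic = os.path.join(base, "letters-to-the-editor")
    os.makedirs(by_date)
    os.makedirs(by_topic)

    original = os.path.join(by_date, "reply.txt")
    with open(original, "w") as f:
        f.write("Dear Editor, ...\n")

    # The symbolic link is a second name that points at the one real file.
    alias = os.path.join(by_topic, "reply.txt")
    os.symlink(original, alias)
    return original, alias


with tempfile.TemporaryDirectory() as base:
    original, alias = file_letter_two_ways(base)
    # Both names should resolve to the same underlying file.
    print(os.path.islink(alias), os.path.samefile(original, alias))
```

Every additional view of the collection needs one such link per file, created and maintained by hand, which may be part of why the mechanism does not scale to the problem described above.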

The great advantage of trees as a data structure is efficiency of access. Suppose you are searching for a specific document among N files. If you go from file to file through a flat, unstructured list, the effort required is proportional to N. With a reasonably balanced tree of directories, the effort falls to something proportional to the logarithm of N, a tremendous gain. But for a large enough N, even log N grows unreasonably large.
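A rough calculation shows the size of the gain. The sketch below assumes an idealized, perfectly balanced tree in which every directory holds the same number of entries (a fanout of 10, chosen arbitrarily) and in which picking the right entry within a directory counts as a single step. Real file systems are far less tidy, so the numbers are only indicative.

```python
def flat_lookups(n_files):
    """Worst case for a flat list: every file may have to be examined."""
    return n_files


def tree_depth(n_files, fanout):
    """Smallest number of levels such that fanout**levels >= n_files."""
    levels, capacity = 0, 1
    while capacity < n_files:
        capacity *= fanout
        levels += 1
    return levels


for n in (100, 13_000, 1_000_000):
    print(n, flat_lookups(n), tree_depth(n, fanout=10))
```

Under these assumptions a collection of 13,000 files sits at most five levels down, against as many as 13,000 steps through a flat list.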

I recently installed a new implementation of the TeX typesetting system (another magnificently qwerky artifact, although it does not go back quite as far as 1969). The system includes more than 13,000 files; a small part of the tree is shown in Figure 3. Note that for the program to work, it is not enough that all the files be present; they must also be in the right places. The hard work of creating this structure was done by an automated installer that goes out over the Internet, finds the necessary components, and puts them where they belong. I could never have constructed it by hand. If it breaks, I have no idea how to fix it.

I take my own helplessness in this situation as a sign that trees may be nearing their practical limit as an organizational device. For a glimpse of what could lie ahead, look at the third notable technology among the 1969 alumni—the Internet, and specifically the World Wide Web. The topology of the Web is uncannily like that of a Unix file system. Internet domain names take the form of a tree (or rather a small copse of trees, since the top-level domains .com, .org and so on are independent roots). Hyperlinks between Web pages are equivalent to symbolic links to Unix files. The only difference is that the Web is several orders of magnitude larger. So as personal file systems continue to grow, perhaps we will manage them with the same kinds of tools we now use for the Web. Indeed, it is widely assumed that the Web browser will be the model for the next generation of operating system. This is a prospect I do not find reassuring. The Web is a wonderful resource, but few of us would view it as a reliable storage-and-retrieval medium, where you can put something in and count on getting it out again. I'd say it's more like a fishing hole, where you throw your hook in and hope that something bites it. Don't be surprised if you see the error message "Go fish!"


