The Web of Words

Brian Hayes

A dictionary is more than a book of definitions; it is an index to a language, imposing an order on our inventory of words. Likewise a thesaurus is a table of contents, which takes the same stock of words but organizes them thematically rather than alphabetically. Both kinds of books reveal something about the underlying structure of the lexicon (the set of all words that make up a language). That structure is what mathematicians call a graph—a collection of "nodes" connected by "edges," usually drawn as a web of dots and lines.

When a language is viewed as a mathematical graph, the nodes are words (or sets of words), and the edges are relations between them. Any dictionary will help you to walk from node to node through the graph. For example, in defining the word elegant, the American Heritage Dictionary offers delicate as a synonym; on looking up delicate, you find dainty among the meanings listed; dainty in turn leads you to the entry for exquisite; and among the meanings of exquisite is elegant again. In this way you trace out one of many loops, or cycles, within the graph defined by this particular dictionary.
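
The walk just described can be sketched in a few lines of Python. This is only a toy illustration, not how any dictionary or lexical database actually stores its entries: the table `synonyms` and the function `find_cycle` are hypothetical names invented here, and the graph holds only the four links mentioned above, whereas a real dictionary entry would lead to many more words.

```python
# Toy synonym graph: each word points to the words its entry leads to.
# Only the four links named in the text are included (an assumption made
# for illustration; real entries list many more meanings).
synonyms = {
    "elegant": ["delicate"],
    "delicate": ["dainty"],
    "dainty": ["exquisite"],
    "exquisite": ["elegant"],
}


def find_cycle(graph, start):
    """Return a path that leaves `start` and comes back to it, or None."""
    stack = [(start, [start])]  # (current word, path taken to reach it)
    visited = set()
    while stack:
        word, path = stack.pop()
        for neighbor in graph.get(word, []):
            if neighbor == start:
                # An edge back to the starting word closes the loop.
                return path + [start]
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append((neighbor, path + [neighbor]))
    return None


cycle = find_cycle(synonyms, "elegant")
print(" -> ".join(cycle))
# prints: elegant -> delicate -> dainty -> exquisite -> elegant
```

The search is an ordinary depth-first traversal: it follows edges outward from the starting word and stops as soon as one of them leads home again. In a graph with no such loop it returns `None`.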

Exploring small regions of a lexical graph is a familiar process; you do it mentally whenever you grope for the right word. But trying to construct a graph for an entire language is another matter entirely. English has well over 100,000 words, and they are related to one another in dozens or perhaps hundreds of ways. Finding and recording all the connections is a task on the same scale as compiling a large dictionary. Furthermore, it has to be done with great precision and consistency, because the goal is to create a mathematical structure in which the relations between words are so explicit that the graph can be explored and manipulated algorithmically.

The construction of a lexical graph for English has been under way for almost 15 years in a project called WordNet, which now includes some 168,000 words and 345,000 relations among them. WordNet is the work of George A. Miller and his colleagues in the Cognitive Science Laboratory at Princeton University. (Other contributors are Christiane Fellbaum, Randee I. Tengi and the late Katherine J. Miller.) A book describing WordNet and its applications has recently been published, and the database that defines the lexical graph is available via the Internet and on a CD-ROM, along with software for browsing the graph.

