The Web of Words
WordNet is a product of hand-crafting. Essentially all the words were manually entered into lexicographer files, with simple textual markers to indicate the various relations between synsets. For example, the character @ denotes a hypernym, ~ is for hyponyms and ! marks antonyms. The entire structure of the lexical graph is implicit in these files, but the relations are not readily accessible in this form. The files are therefore compiled with a program called the Grinder, which produces a database in which each relation is encoded as a position, or offset, within a file. Long lists of numerical offsets are not very congenial for human readers, but they are easily traversed by a computer program.
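The idea of compiling marked-up entries into offsets can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: the one-line-per-word notation, the record layout and the function names (parse, compile_entries, follow) are invented, and they are far simpler than WordNet's actual file formats or the Grinder itself. Only the three marker characters come from the description above.

```python
# Toy source entries in an invented, simplified notation (NOT WordNet's real
# file syntax): "headword: pointer pointer ...", where the first character of
# each pointer names the relation.
SOURCE = """\
animal: ~dog ~cat
dog: @animal
cat: @animal
hot: !cold
cold: !hot
"""

RELATIONS = {"@": "hypernym", "~": "hyponym", "!": "antonym"}


def parse(source):
    """Return {headword: [(relation, target_word), ...]} from the toy notation."""
    entries = {}
    for line in source.splitlines():
        if not line.strip():
            continue
        head, _, rest = line.partition(":")
        entries[head.strip()] = [(RELATIONS[tok[0]], tok[1:]) for tok in rest.split()]
    return entries


def compile_entries(entries, width=8):
    """Lay the records out in one string and replace target words with offsets.

    Offsets are zero-padded to a fixed width, so the length of every record is
    known before the offsets themselves are. That permits a two-pass layout.
    """
    for head, pointers in entries.items():
        for _, target in pointers:
            if target not in entries:
                raise KeyError(f"{head} points to unknown entry {target}")

    def render(head, pointers, offsets):
        fields = [head] + [f"{rel}={offsets.get(t, 0):0{width}d}" for rel, t in pointers]
        return " ".join(fields) + "\n"

    # Pass 1: work out where each record will start.
    offsets, position = {}, 0
    for head, pointers in entries.items():
        offsets[head] = position
        position += len(render(head, pointers, {}))

    # Pass 2: emit the records with the real offsets filled in.
    blob = "".join(render(h, p, offsets) for h, p in entries.items())
    return blob, offsets


def follow(blob, offset, relation):
    """Read the record at `offset` and return the headwords it points to."""
    record = blob[offset:blob.index("\n", offset)].split()
    targets = [int(f.split("=")[1]) for f in record[1:] if f.startswith(relation + "=")]
    return [blob[t:blob.index("\n", t)].split()[0] for t in targets]


if __name__ == "__main__":
    blob, offsets = compile_entries(parse(SOURCE))
    print(blob)
    print(follow(blob, offsets["dog"], "hypernym"))
```

Following a relation in the compiled form is just a matter of jumping to a position in the string and reading the record found there, which is why such a layout suits a program better than a person.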
WordNet is supplied with a point-and-click browser interface. As with an ordinary on-line dictionary or thesaurus, you can type in a word and get a listing of synsets and an explanatory gloss for all the word's senses, but this overview is only the beginning of what the browser offers. Depending on the part of speech, you can then climb up or down in a lexical tree, search for coordinate terms (siblings at the same level of the tree), find antonyms, rank synonyms by their frequency or similarity, list meronyms, and so on.
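A few of these navigation operations are easy to sketch on a miniature hierarchy. The following Python fragment is only an illustration: the seven-word tree is hypothetical, each word is given a single hypernym as a simplification, and the function names are invented rather than taken from the browser or from any WordNet library.

```python
# Hypothetical miniature hierarchy: each word maps to its single hypernym.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "fox": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
    "carnivore": "mammal",
}


def hypernym_chain(word):
    """Climb up the tree, returning every ancestor from nearest to farthest."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Step down the tree, returning the immediate children in sorted order."""
    return sorted(w for w, parent in HYPERNYM.items() if parent == word)


def coordinate_terms(word):
    """Return the siblings of `word`: the other children of its hypernym."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


if __name__ == "__main__":
    print(hypernym_chain("dog"))
    print(coordinate_terms("dog"))
```

Coordinate terms fall out of the two basic moves: go up one level to the hypernym, then come back down to all of its hyponyms except the starting word.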
The browser serves well for casual exploration of WordNet, but more serious work with the lexical graph generally requires writing specialized software to read the database files. Several such projects have been undertaken both within the Cognitive Science Laboratory at Princeton and elsewhere. Philip Resnik's work on selection rules is mentioned above. Another major endeavor is a series of "semantic concordances"—texts linked to the lexicon in such a way that every substantive word is tagged with the appropriate sense from the WordNet database. One of the texts tagged in this way is Stephen Crane's novella The Red Badge of Courage.
Still another application, described by Ellen M. Voorhees of the National Institute of Standards and Technology, uses WordNet to improve the accuracy of text retrieval from document databases. When the content of a document is encoded in terms of WordNet synsets instead of individual words, a query can specify particular senses of words. If this approach succeeds, it might solve the problem of searching the Web for coke and finding soft drinks and illicit drugs when what you're looking for is carbon.
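The difference between indexing by word and indexing by sense can be shown in a few lines of Python. This is a minimal sketch and not a description of Voorhees's system: the three documents, the sense labels such as coke.fuel and the function build_indexes are all hypothetical, and the tokens are assumed to arrive already tagged with the correct sense, which is the hard part of the problem.

```python
from collections import defaultdict

# Hypothetical hand-tagged documents: each token is (word, sense_label).
DOCS = {
    "d1": [("coke", "coke.beverage"), ("bottle", "bottle.container")],
    "d2": [("coke", "coke.drug"), ("arrest", "arrest.seizure")],
    "d3": [("coke", "coke.fuel"), ("furnace", "furnace.heater")],
}


def build_indexes(docs):
    """Build two inverted indexes: one keyed by word, one keyed by sense."""
    by_word, by_sense = defaultdict(set), defaultdict(set)
    for doc_id, tokens in docs.items():
        for word, sense in tokens:
            by_word[word].add(doc_id)
            by_sense[sense].add(doc_id)
    return by_word, by_sense


if __name__ == "__main__":
    by_word, by_sense = build_indexes(DOCS)
    print(sorted(by_word["coke"]))        # every document that mentions the word
    print(sorted(by_sense["coke.fuel"]))  # only the document about the fuel
```

A query on the bare word retrieves all three documents, whereas a query on the fuel sense retrieves only the one about furnaces.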
In my own view, the construction of WordNet would be of interest even if the finished product had no applications at all. It is a linguistic counterpart to the Human Genome Project. Just as we all inherit DNA but cannot ordinarily peer into our own genes, we all possess some mental representation of a lexical graph but know little about its overall structure.