Backgammon. Checkers. Scrabble. And, most famously, chess. One by one, computers are gobbling up the human advantage in games we thought we'd mastered. Now one of the last strongholds of human superiority is falling: our faithful breakfast companion, the crossword puzzle.
Not long ago, after IBM's Deep Blue smoked the world's best chess player in a celebrated rematch, Will Shortz, editor of the venerable New York Times crossword puzzle, said that crosswords were a much more difficult programming challenge than chess. He thought computers were a long way from matching even an average human puzzle-solver. After all, how could a machine—no matter how many lexicons and encyclopedias were fed into it—be taught to decipher the subtleties of wordplay and language synthesis that are so important in solving crossword puzzles? The rules are vague, and language is socially defined, right?
It turns out that a group of computer scientists at Duke University had asked themselves those same questions. Their answer was Proverb, for probabilistic cruciverbalist (a cruciverbalist is a crossword-puzzle player). The brainchild of computer science assistant professor Michael Littman and graduate students Greg Keim and Noam Shazeer, Proverb has grown from a fall 1998 class project, starting from an archive of words Keim had stored to help him solve puzzles, to the subject of a talk at the recent American Association for Artificial Intelligence meeting in Orlando, Florida. It has competed admirably against humans at crossword-puzzle tournaments and even showed its stuff at AAAI.
Courtesy of Littman and Keim, American Scientist got a similar, private demonstration recently at Duke—Proverb versus a puzzle that had run in a recent midweek New York Times.
Keim, interested in how computers can make better use of language, sits in the middle of a large bank of terminals. He has fed the crossword's clues and grid into Proverb. Proverb starts by running the clues by a set of programs that behave like experts on, say, a College Bowl–style trivia panel. There are 30 such "expert modules," each a huge collection of facts from atlases, music databases, dictionaries you probably haven't heard of (such as WordList-Big) and on and on, the data donated by its compilers and downloaded from the Internet. Proverb also contains a database called Puzzle Expert, consisting of more than 5,133 previously published crossword puzzles, or in excess of 350,000 clues from about 14 years' worth of daily puzzles. Littman notes that, because common vowels help to connect grids, some words show up more than others. A case in point: "Erie," present in 5 percent of puzzles.
Proverb goes to work. A "solver" module mines data in each expert module for as many correct-length suggestions as possible. It relies both on exact matching of clues it has seen before and on fuzzy-matching methods common in information retrieval. Proverb marks each candidate answer with a confidence score. This part of the puzzle, requiring the muscle of 14 linked computers, lasts less than five minutes.
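For readers who want to see the idea in miniature, here is a rough sketch of candidate generation with confidence scores. It is not Proverb's code: the tiny clue list, the function name `candidates` and the use of Python's `difflib` string similarity as a stand-in for information-retrieval matching are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical miniature clue database: (clue, answer) pairs "seen before".
SEEN_CLUES = [
    ("Great Lake", "ERIE"),
    ("Lake near Buffalo", "ERIE"),
    ("Goose egg", "ZIP"),
    ("Goose egg", "NIL"),
    ("Grocery list abbr.", "DOZ"),
]

def candidates(clue, length, seen=SEEN_CLUES):
    """Return {answer: confidence} for stored answers of the right length."""
    scores = {}
    for old_clue, answer in seen:
        if len(answer) != length:
            continue  # only correct-length suggestions are useful
        # Similarity is 1.0 for an exact clue match, lower for a fuzzy one.
        sim = SequenceMatcher(None, clue.lower(), old_clue.lower()).ratio()
        scores[answer] = max(scores.get(answer, 0.0), sim)
    total = sum(scores.values())
    # Normalize so the confidences for this clue sum to 1.
    return {a: s / total for a, s in scores.items()} if total else {}

goose = candidates("Goose egg", 3)
lake = candidates("Great lake", 4)
```

In this toy version, "ZIP" and "NIL" tie for the top spot on "Goose egg" because both were seen with exactly that clue, while "DOZ" trails as a weak fuzzy match. Choosing between tied candidates is left to the crossing letters.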
Now another program takes over, the grid-filler. It's showtime. Using as many high-confidence words as it can, the grid-filler attempts to squeeze the solver's suggestions into the spaces available. The bane of any crossword-player's existence is making the letters match up, and this is no different for Proverb. You can see on the computer screen that, like many an uncertain human cruciverbalist, Proverb will mark squares lightly until its confidence grows that the letters in them agree with the words that cross them. Only then will it darken them in. Pink shading indicates a wrong guess. This grid-filling process lasts about 10 minutes.
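The grid-filling step can be pictured as a search for the most confident set of words whose crossing letters agree. The sketch below is a brute-force toy, not Proverb's actual grid-filler, whose internals are not described here. The slot names, the confidence numbers and the crossing positions are invented for illustration, with the words borrowed from the demonstration puzzle.

```python
from itertools import product
from math import prod

def best_fill(candidates, crossings):
    """Pick one word per slot so that all crossing squares agree.

    candidates: {slot: {word: confidence}}
    crossings: list of (slot_a, index_a, slot_b, index_b), meaning letter
               index_a of slot_a shares a square with letter index_b of slot_b.
    Returns (assignment, score), or (None, 0.0) if nothing fits.
    """
    slots = list(candidates)
    best, best_score = None, 0.0
    # Try every combination of candidate words (fine for a toy example only).
    for words in product(*(candidates[s] for s in slots)):
        fill = dict(zip(slots, words))
        if any(fill[a][i] != fill[b][j] for a, i, b, j in crossings):
            continue  # letters clash in a shared square
        score = prod(candidates[s][w] for s, w in fill.items())
        if score > best_score:
            best, best_score = fill, score
    return best, best_score

# Made-up confidences; "DOS" is deliberately the solver's first choice.
CANDIDATES = {
    "6A": {"DOS": 0.6, "DOZ": 0.4},
    "8D": {"ZIP": 0.5, "NIL": 0.3, "SIT": 0.2},
    "16A": {"SECONDCUP": 0.9, "SECONDCUT": 0.1},
}
# Assumed geometry: 8 Down starts at the last square of 6 Across and
# ends at the last square of 16 Across.
CROSSINGS = [("6A", 2, "8D", 0), ("8D", 2, "16A", 8)]

fill, score = best_fill(CANDIDATES, CROSSINGS)
```

With these invented numbers, the crossing constraints overrule the locally favored "DOS" and the search settles on "DOZ," "ZIP" and "SECONDCUP." A different set of confidences could just as easily tip the balance the other way.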
In under 15 minutes total, Proverb has nailed all but two letters and 73 of 76 words, in line with results from a recent test over 370 puzzles, from The New York Times, Los Angeles Times, USA Today, TV Guide and several other sources. Proverb averaged 95.3 percent words correct and 98.1 percent letters correct. It was perfect on nearly half the puzzles. On this day, Proverb shows that it is, after all, only nonhuman. It has trouble with 6 Across ("grocery list abbr.") and 8 Down ("goose egg"), which throws off 16 Across ("1/60-minute drink?"). For 6 Across, Proverb, perhaps exposing a computer bias, opts for "dos" over the correct "doz," rendering 8 Down ("zip") "sit" (which makes a certain sense) and the end of 16 Across ("secondcup") "cut."
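It may seem odd that two wrong letters cost three words. The short sketch below shows the bookkeeping on just the three entries involved. The square coordinates are a hypothetical layout chosen so that the entries cross the way the account above implies, and the function names are made up for this example.

```python
def place(slots, words):
    """Write each word into its squares; returns {(row, col): letter}."""
    grid = {}
    for name, squares in slots.items():
        for square, letter in zip(squares, words[name]):
            grid[square] = letter
    return grid

def score(slots, guess, answer):
    """Return (letters right, letters total, words right, words total)."""
    g, a = place(slots, guess), place(slots, answer)
    letters_ok = sum(g[sq] == a[sq] for sq in a)  # each square counted once
    words_ok = sum(guess[name] == answer[name] for name in slots)
    return letters_ok, len(a), words_ok, len(slots)

# Hypothetical coordinates: 8 Down links the ends of the two Across entries.
SLOTS = {
    "6A": [(0, c) for c in range(6, 9)],
    "8D": [(r, 8) for r in range(0, 3)],
    "16A": [(2, c) for c in range(0, 9)],
}
GUESS = {"6A": "DOS", "8D": "SIT", "16A": "SECONDCUT"}
ANSWER = {"6A": "DOZ", "8D": "ZIP", "16A": "SECONDCUP"}

fragment = score(SLOTS, GUESS, ANSWER)      # (11, 13, 0, 3)
puzzle_words = round(73 / 76 * 100, 1)      # 96.1 percent of words correct
```

In this fragment, 11 of 13 squares are right but none of the three words is, because each wrong square sits where two words cross. Over the whole puzzle, 73 of 76 words works out to about 96.1 percent, slightly above the 95.3 percent average from the 370-puzzle test.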
Although Proverb would need to perform flawlessly in faster than five minutes to be tournament competitive, its speed and accuracy would blow away most mortals. Of course, human players still retain one important advantage: If while filling in the grids you should spill coffee on yourself, you can still go to work.—William J. Cannon