COMPUTING SCIENCE
Collective Wisdom
Brian Hayes
Success Stories
Factoring. Finding the prime factors of a composite integer is a classic among computationally hard problems. The inverse process—multiplication—has an efficient algorithm we expect children to master, but factoring seems to be intractable when numbers get big. Adding three decimal digits to the length of a number doubles the effort needed to factor it.
But if factoring is hard, it is also ideally suited to
parallel computation. Splitting the work among k
computers produces an answer very nearly k times
faster.
The first Internet factoring project was organized in 1988
by Arjen K. Lenstra (now of Citibank) and Mark S. Manasse of the DEC System Research Center in Palo Alto. They and their colleagues had written software to distribute factoring tasks among workstations within the DEC laboratory, and they extended this system so that computers elsewhere could contribute to the effort. The infrastructure was simple: Factoring tasks were parceled out by electronic mail, and results came back the same way.
As early as 1989 Lenstra and Manasse had already given an
astute analysis of the economics of collective computing.
They could get equivalent performance, they estimated, from
300 workstations or 1,200 PCs or a single high-speed machine designed especially for factoring. If they had to buy all the hardware, the last option was clearly the best choice. But if the owners of workstations or PCs could be induced to donate CPU cycles free of charge, that price would be hard to beat.
By 1990 Lenstra and Manasse and about a hundred e-mail
collaborators from around the world were routinely factoring numbers of 100 decimal digits. In 1993 a larger group was assembled to take on a number known as RSA-129. Some 600 volunteers labored for eight months to factor this 129-digit number, winning a prize of $100 for their efforts. Two years later RSA-130 (a digit longer) succumbed to another army of factorers. This time some of the work was coordinated not by e-mail but through a World Wide Web interface designed by James Cowie of Cooperating Systems Corporation.
Primes. A counterpoint to factoring is the search for primes—numbers that cannot be factored. Primes of 100 or 200 decimal digits no longer attract much notice; with the right software any modern computer can find examples of such numbers in a few seconds. The frontier of knowledge for primes is now out near a million digits. Even the
mathematically jaded may marvel at a prime of such
magnitude: Think of all the smaller numbers that
might divide it, but don't.
Most of the largest known primes are Mersenne primes, named
for the 17th-century friar Marin Mersenne. A Mersenne number has the form 2n - 1, but not all such numbers are prime. In the first place, for 2n - 1 to be prime, n itself must be prime, but even that is only a necessary condition, not a sufficient one. (Try n = 11.) To find Mersenne primes, you must first calculate
2n - 1 for each prime value of n, then test the result for primality. Algorithms based on the work of Edouard Lucas and D. H. Lehmer greatly speed up the primality tests.
The two largest known primes were found by participants in
the Great Internet Mersenne Prime Search, or GIMPS. The
founder of this project is George Woltman, a computer
scientist who wrote efficient software for Lucas-Lehmer
primality tests and made it available on a Web site. (You
need not understand the mathematics of the Lucas-Lehmer test to run the software.) Some 4,000 volunteers have contributed to the search so far. In November 1996 Joel Armengaud discovered that 21,398,269 - 1 is prime. Then in August 1997 Gordon Spence proved the primality of 22,976,221 - 1. This number, which has 895,932 decimal digits, is the 36th Mersenne prime to be discovered. (It may not be 36th in the numerical sequence of Mersenne primes, however, because there are exponents less than 2,976,221 that have not yet been checked.)
Code-breaking. The collective-computing projects that have attracted the most participants have been attempts to decipher encrypted messages—but the volunteers are not snooping into anyone's confidential e-mail. RSA Data Security, a company in the secrecy business, has posted a number of cryptographic puzzles, with cash prizes for those who solve them. The company's aim is to test the security of their own products and to demonstrate the vulnerability of encryption schemes they consider inadequate. The factoring of RSA-129 and RSA-130 was part of this program. Other RSA challenges don't involve factoring but call for a direct attack on an encrypted text.
In one challenge the message was encoded with DES, the Data
Encryption Standard, a cipher developed in the 1970s under
U.S. government sponsorship. The key that unlocks a DES
message is a binary number of 56 bits. In general the only
way to crack the code is to try all possible keys, of which
there are 256, or about 7 X 1016. This task was undertaken by an Internet collaboration called DESCHALL, organized by Rocke Verser, Matt Curtin and Justin Dolske. In June of 1997 they got lucky, discovering the correct key after checking just 18 percent of the possibilities, and won the $10,000 prize.
Another RSA challenge also employed a 56-bit key, but with
an encryption algorithm called RC5. Three groups, known as
Bovine, Infinite Monkeys and Cyberian, all began recruiting
volunteers for the RC5 attack. Bovine eventually attracted
the most helpers, and it was they who found the key and
deciphered the message in October 1997, after exhausting 47
percent of the key space, or 34 quadrillion keys. Bovine was organized by Adam L. Beberg, Jeff Lawson and David McNett.
Compared with earlier distributed-computing projects, the
RC5 efforts were not only technically sophisticated but also reached a new level of promotional and motivational
slickness. For example, the Bovine software kept statistics
on the contributions of individuals and teams, adding an
element of competition within the Bovine ranks as well as
between Bovine and the other groups. In the team standings,
Macintosh militants finally prevailed over partisans of the
Linux operating system. By the end of the contest some 4,000 active teams were processing 7 billion keys per second, a rate equivalent to the work of 26,000 Pentium computers.

On completing RC5-56, the Bovine collaboration turned to
RC5-64, a cipher with a 64-bit key. The effort needed to
break this code will be 256 times greater, which suggests it could be a labor of decades. It's worth pausing to ask
whether the brute-force testing of 18,446,744,073,709,551,616 cryptographic keys is really a
better use of resources than displaying pretty fish in an
aquarium. Beberg and his colleagues are considering other
possible projects. Meanwhile, RSA has announced a new
challenge. This time the message is encoded with the same
DES algorithm broken last spring, but the contest rules are
altered to reward speed of decryption. The initial prize of
$10,000 drops to zero after 67.5 days. Bovine has taken up
the challenge.
Golomb rulers. Imagine a six-inch ruler with marks
inscribed not at the usual equal intervals but at 0, 1, 4
and 6 inches. Taking all possible pairs of marks, you can
measure six distinct distances: 1, 2, 3, 4, 5 and 6 inches.
A ruler on which no two pairs of marks measure the same
distance is called a Golomb ruler, after Solomon W. Golomb
of the University of Southern California, who described the
concept 25 years ago. The 0-1-4-6 example is a
perfect Golomb ruler, in that all integer intervals from 1 to the length of the ruler are represented. On rulers with more than four marks, perfection is not possible; the best you can do is an optimal Golomb ruler, which for a given number of marks is the shortest ruler on which no intervals are duplicated.
The world is not panting in desperate need of better Golomb
rulers, and yet these combinatorial curiosities do have
practical uses. For example, in setting up an interferometer for radio astronomy, placing the antennas on the marks of a Golomb ruler maximizes the recovery of information about the phases of the signals received.
Many good Golomb rulers have been found by trial-and-error,
but proving them optimal (or finding a better ruler if one
exists) is computationally expensive. In 1995 a dozen
workstations took four CPU-years to prove there is no 19-mark ruler shorter than 246 units. The computation was done
by Apostolos Dollas, William T. Rankin and David McCracken
of Duke University.
In 1996 David Vanderschel and Mark Garry, who had both
worked on Golomb rulers independently, merged their ideas in a program called GVANT, which turned out to be significantly more efficient than earlier programs. They quickly confirmed the known results for rulers of up to 19 marks, but even with their highly optimized algorithm a search for the best 20-mark ruler was a formidable undertaking. They therefore sought Internet collaborators. With seven helpers it took about half a year to prove that a 20-mark ruler of length 283 is optimal.
In the spring of 1997 Vanderschel and Garry turned to the
21-mark ruler, for which the shortest known arrangement is
333 units long. For this ruler a naive algorithm would have
to check more than 1030 arrangements of the marks; GVANT prunes the number of cases to about 1015. Roughly 100 volunteers have pitched in to help, but after a year's searching there is still plenty of work left for latecomers. Out of 1,200 trillion arrangements to be checked, fewer than 200 trillion have been examined so far.
Aliens. If the prospect of finding a bigger Mersenne prime or a smaller Golomb ruler won't induce you to pledge your CPU, perhaps you would like to discover an
extragalactic civilization. A proposal called SETI@home
would put thousands of computers to work sifting for signs
of life in signals recorded with the radio telescope of the
Arecibo Observatory in Puerto Rico.
A search for extraterrestrial intelligence has been going on at Arecibo for almost two decades. The telescope slowly
scans the sky, detecting emissions over a broad range of
radio wavelengths. A special-purpose signal-processing
computer applies a Fourier transform to the data to pick out narrow-bandwidth signals, which could be the alien
equivalent of "I Love Lucy." The astronomers would like to
put the data through a more thorough analysis, but computing capacity is a bottleneck. That's where SETI@home comes in.
With enough computers on the job, the Arecibo data could be
sliced into finer divisions of bandwidth and frequency.
Moreover, the analysis software could check for other kinds
of regularities, such as signals that pulsate or that
"chirp" through a sequence of frequencies. The task is well
suited to Internet computing in that only small blocks of
data need to be passed back and forth over the network, but
a great deal of computing is needed on each block.
SETI@home is the project of Dan Werthimer of the University
of California at Berkeley and several colleagues. Their
plans are ambitious; they seek a mass audience. The client
software they envision would run as a screen saver, starting up automatically whenever a machine is left idle. As the program churned away on the data analysis, it would also display a series of images related to the project, such as a representation of the data currently being examined or a map showing the progress of the sky survey.
Some 70,000 people have signed up for SETI@home.
Unfortunately, the project is on hold for lack of funding.
» Post Comment