
COMPUTING SCIENCE

Collective Wisdom

Brian Hayes

Success Stories

Factoring. Finding the prime factors of a composite integer is a classic among computationally hard problems. The inverse process—multiplication—has an efficient algorithm we expect children to master, but factoring seems to be intractable when numbers get big. Adding three decimal digits to the length of a number doubles the effort needed to factor it.
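To make the asymmetry concrete, here is the simplest factoring method, trial division, as a minimal Python sketch. This is emphatically not the algorithm the Internet projects used (they relied on far more sophisticated sieve methods, whose cost also grows steeply, though much more slowly than this); it merely shows how quickly brute force falls behind as numbers grow.

```python
def trial_division(n):
    """Factor n by testing divisors up to sqrt(n) -- fine for small
    numbers, hopeless for the 100-digit composites discussed here."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is itself prime
    return factors

print(trial_division(8051))  # [83, 97]
```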

But if factoring is hard, it is also ideally suited to parallel computation. Splitting the work among k computers produces an answer very nearly k times faster.

The first Internet factoring project was organized in 1988 by Arjen K. Lenstra (now of Citibank) and Mark S. Manasse of the DEC System Research Center in Palo Alto. They and their colleagues had written software to distribute factoring tasks among workstations within the DEC laboratory, and they extended this system so that computers elsewhere could contribute to the effort. The infrastructure was simple: Factoring tasks were parceled out by electronic mail, and results came back the same way.

As early as 1989 Lenstra and Manasse gave an astute analysis of the economics of collective computing. They could get equivalent performance, they estimated, from 300 workstations, from 1,200 PCs or from a single high-speed machine designed especially for factoring. If they had to buy all the hardware, the last option was clearly the best choice. But if the owners of workstations or PCs could be induced to donate CPU cycles free of charge, that price would be hard to beat.

By 1990 Lenstra and Manasse and about a hundred e-mail collaborators from around the world were routinely factoring numbers of 100 decimal digits. In 1993 a larger group was assembled to take on a number known as RSA-129. Some 600 volunteers labored for eight months to factor this 129-digit number, winning a prize of $100 for their efforts. Two years later RSA-130 (a digit longer) succumbed to another army of factorers. This time some of the work was coordinated not by e-mail but through a World Wide Web interface designed by James Cowie of Cooperating Systems Corporation.

Primes. A counterpoint to factoring is the search for primes—numbers that cannot be factored. Primes of 100 or 200 decimal digits no longer attract much notice; with the right software any modern computer can find examples of such numbers in a few seconds. The frontier of knowledge for primes is now out near a million digits. Even the mathematically jaded may marvel at a prime of such magnitude: Think of all the smaller numbers that might divide it, but don't.

Most of the largest known primes are Mersenne primes, named for the 17th-century friar Marin Mersenne. A Mersenne number has the form 2^n - 1, but not all such numbers are prime. In the first place, for 2^n - 1 to be prime, n itself must be prime, but even that is only a necessary condition, not a sufficient one. (Try n = 11.) To find Mersenne primes, you must first calculate 2^n - 1 for each prime value of n, then test the result for primality. Algorithms based on the work of Édouard Lucas and D. H. Lehmer greatly speed up the primality tests.
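The Lucas-Lehmer test itself is remarkably brief; the hard part, which optimized software must handle, is performing the repeated squarings quickly when the numbers run to hundreds of thousands of digits. A minimal Python rendering:

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2^p - 1 is prime.
    Lucas-Lehmer recurrence: s(0) = 4, s(i+1) = s(i)^2 - 2;
    2^p - 1 is prime iff s(p-2) == 0 (mod 2^p - 1).
    Assumes p is an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# n = 11 is prime, but 2^11 - 1 = 2047 = 23 * 89 is not:
print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])  # [3, 5, 7, 13]
```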

The two largest known primes were found by participants in the Great Internet Mersenne Prime Search, or GIMPS. The founder of this project is George Woltman, a computer scientist who wrote efficient software for Lucas-Lehmer primality tests and made it available on a Web site. (You need not understand the mathematics of the Lucas-Lehmer test to run the software.) Some 4,000 volunteers have contributed to the search so far. In November 1996 Joel Armengaud discovered that 2^1,398,269 - 1 is prime. Then in August 1997 Gordon Spence proved the primality of 2^2,976,221 - 1. This number, which has 895,932 decimal digits, is the 36th Mersenne prime to be discovered. (It may not be 36th in the numerical sequence of Mersenne primes, however, because there are exponents less than 2,976,221 that have not yet been checked.)

Code-breaking. The collective-computing projects that have attracted the most participants have been attempts to decipher encrypted messages—but the volunteers are not snooping into anyone's confidential e-mail. RSA Data Security, a company in the secrecy business, has posted a number of cryptographic puzzles, with cash prizes for those who solve them. The company's aim is to test the security of its own products and to demonstrate the vulnerability of encryption schemes it considers inadequate. The factoring of RSA-129 and RSA-130 was part of this program. Other RSA challenges don't involve factoring but call for a direct attack on an encrypted text.

In one challenge the message was encoded with DES, the Data Encryption Standard, a cipher developed in the 1970s under U.S. government sponsorship. The key that unlocks a DES message is a binary number of 56 bits. In general the only way to crack the code is to try all possible keys, of which there are 2^56, or about 7 × 10^16. This task was undertaken by an Internet collaboration called DESCHALL, organized by Rocke Verser, Matt Curtin and Justin Dolske. In June of 1997 they got lucky, discovering the correct key after checking just 18 percent of the possibilities, and won the $10,000 prize.
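The structure of such a search is about as simple as distributed computing gets. The schematic below is not DESCHALL's actual client or protocol; in particular, matches is a hypothetical stand-in for decrypting the challenge text under a candidate DES key and checking whether the result reads as plausible plaintext, and the work-unit size is invented.

```python
KEY_SPACE = 1 << 56   # 2^56, about 7 x 10^16 candidate keys
UNIT = 1 << 28        # keys per work unit (an invented size)

def assign_units():
    """Coordinator side: hand out contiguous slices of the key space."""
    for start in range(0, KEY_SPACE, UNIT):
        yield start, start + UNIT

def search_unit(start, end, matches):
    """Volunteer side: test every key in the assigned slice.
    matches(key) stands in for a real DES decryption plus a
    plausible-plaintext check."""
    for key in range(start, end):
        if matches(key):
            return key    # report the winning key to the coordinator
    return None           # slice exhausted; ask for another
```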

Another RSA challenge also employed a 56-bit key, but with an encryption algorithm called RC5. Three groups, known as Bovine, Infinite Monkeys and Cyberian, all began recruiting volunteers for the RC5 attack. Bovine eventually attracted the most helpers, and it was they who found the key and deciphered the message in October 1997, after exhausting 47 percent of the key space, or 34 quadrillion keys. Bovine was organized by Adam L. Beberg, Jeff Lawson and David McNett.

The RC5 efforts were not only more technically sophisticated than earlier distributed-computing projects; they also reached a new level of promotional and motivational slickness. For example, the Bovine software kept statistics on the contributions of individuals and teams, adding an element of competition within the Bovine ranks as well as between Bovine and the other groups. In the team standings, Macintosh militants finally prevailed over partisans of the Linux operating system. By the end of the contest some 4,000 active teams were processing 7 billion keys per second, a rate equivalent to the work of 26,000 Pentium computers.

Figure 1. Optimal five-mark Golomb ruler.

On completing RC5-56, the Bovine collaboration turned to RC5-64, a cipher with a 64-bit key. Since each additional bit doubles the key space, the effort needed to break this code will be 2^8, or 256, times greater, which suggests it could be a labor of decades. It's worth pausing to ask whether the brute-force testing of 18,446,744,073,709,551,616 cryptographic keys is really a better use of resources than displaying pretty fish in an aquarium. Beberg and his colleagues are considering other possible projects. Meanwhile, RSA has announced a new challenge. This time the message is encoded with the same DES algorithm broken last spring, but the contest rules are altered to reward speed of decryption. The initial prize of $10,000 drops to zero after 67.5 days. Bovine has taken up the challenge.

Golomb rulers. Imagine a six-inch ruler with marks inscribed not at the usual equal intervals but at 0, 1, 4 and 6 inches. Taking all possible pairs of marks, you can measure six distinct distances: 1, 2, 3, 4, 5 and 6 inches. A ruler on which no two pairs of marks measure the same distance is called a Golomb ruler, after Solomon W. Golomb of the University of Southern California, who described the concept 25 years ago. The 0-1-4-6 example is a perfect Golomb ruler, in that all integer intervals from 1 to the length of the ruler are represented. On rulers with more than four marks, perfection is not possible; the best you can do is an optimal Golomb ruler, which for a given number of marks is the shortest ruler on which no intervals are duplicated.
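The defining property is easy to check mechanically. A minimal Python test of the no-repeated-distances condition, tried on the perfect 0-1-4-6 ruler from the text and on a four-mark ruler that fails:

```python
from itertools import combinations

def is_golomb(marks):
    """A ruler is Golomb if all pairwise mark-to-mark distances differ."""
    dists = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(dists) == len(set(dists))

print(is_golomb([0, 1, 4, 6]))  # True: distances 1, 2, 3, 4, 5, 6
print(is_golomb([0, 1, 3, 4]))  # False: the distance 1 occurs twice
```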

The world is not panting in desperate need of better Golomb rulers, and yet these combinatorial curiosities do have practical uses. For example, in setting up an interferometer for radio astronomy, placing the antennas on the marks of a Golomb ruler maximizes the recovery of information about the phases of the signals received.

Many good Golomb rulers have been found by trial and error, but proving them optimal (or finding a better ruler if one exists) is computationally expensive. In 1995 it took a dozen workstations four CPU-years to prove there is no 19-mark ruler shorter than 246 units. The computation was done by Apostolos Dollas, William T. Rankin and David McCracken of Duke University.

In 1996 David Vanderschel and Mark Garry, who had both worked on Golomb rulers independently, merged their ideas in a program called GVANT, which turned out to be significantly more efficient than earlier programs. They quickly confirmed the known results for rulers of up to 19 marks, but even with their highly optimized algorithm a search for the best 20-mark ruler was a formidable undertaking. They therefore sought Internet collaborators. With seven helpers it took about half a year to prove that a 20-mark ruler of length 283 is optimal.

In the spring of 1997 Vanderschel and Garry turned to the 21-mark ruler, for which the shortest known arrangement is 333 units long. For this ruler a naive algorithm would have to check more than 10^30 arrangements of the marks; GVANT prunes the number of cases to about 10^15. Roughly 100 volunteers have pitched in to help, but after a year's searching there is still plenty of work left for latecomers. Out of 1,200 trillion arrangements to be checked, fewer than 200 trillion have been examined so far.
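GVANT's pruning is far more elaborate than anything that fits here, but a simple backtracking search shows the principle: place marks left to right and abandon a partial ruler the instant any distance repeats. The sketch below finds an optimal five-mark ruler (length 11, as in Figure 1) almost instantly; the same idea, much refined, is what makes the 21-mark search merely enormous instead of impossible.

```python
def find_ruler(nmarks, length):
    """Find a Golomb ruler with nmarks marks fitting in [0, length],
    or return None.  Prunes a branch as soon as a distance repeats."""
    def extend(marks, dists):
        if len(marks) == nmarks:
            return marks
        for m in range(marks[-1] + 1, length + 1):
            new = {m - x for x in marks}      # distances the new mark adds
            if len(new) == len(marks) and not (new & dists):
                found = extend(marks + [m], dists | new)
                if found:
                    return found
        return None                           # dead end: backtrack
    return extend([0], set())

def optimal_ruler(nmarks):
    """Try lengths in increasing order; the first hit is optimal."""
    length = nmarks - 1
    while True:
        ruler = find_ruler(nmarks, length)
        if ruler:
            return ruler
        length += 1

print(optimal_ruler(5))  # [0, 1, 4, 9, 11]
```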

Aliens. If the prospect of finding a bigger Mersenne prime or a smaller Golomb ruler won't induce you to pledge your CPU, perhaps you would like to discover an extragalactic civilization. A proposal called SETI@home would put thousands of computers to work sifting for signs of life in signals recorded with the radio telescope of the Arecibo Observatory in Puerto Rico.

A search for extraterrestrial intelligence has been going on at Arecibo for almost two decades. The telescope slowly scans the sky, detecting emissions over a broad range of radio wavelengths. A special-purpose signal-processing computer applies a Fourier transform to the data to pick out narrow-bandwidth signals, which could be the alien equivalent of "I Love Lucy." The astronomers would like to put the data through a more thorough analysis, but computing capacity is a bottleneck. That's where SETI@home comes in.
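The heart of that processing can be sketched in a few lines. In this toy example (every parameter is invented, and it bears no relation to the real Arecibo data formats) a faint sine wave is buried in noise and then recovered with NumPy's FFT: the narrow-band tone piles its power into a single spectral bin, far above the noise floor.

```python
import numpy as np

# Toy narrow-band detection; all parameters are invented.
rng = np.random.default_rng(1)
rate = 1024                               # samples per second
t = np.arange(8 * rate) / rate            # eight seconds of "observation"
tone = 0.2 * np.sin(2 * np.pi * 237 * t)  # faint tone, invisible in raw data
data = tone + rng.normal(size=t.size)     # buried in unit-variance noise

power = np.abs(np.fft.rfft(data)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1 / rate)

# Noise spreads its power over all bins; a narrow-band signal piles
# up in one.  Flag any bin far above the typical (median) level.
hits = freqs[power > 30 * np.median(power)]
print(hits)                               # reports a frequency near 237 Hz
```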

With enough computers on the job, the Arecibo data could be sliced into finer divisions of bandwidth and frequency. Moreover, the analysis software could check for other kinds of regularities, such as signals that pulsate or that "chirp" through a sequence of frequencies. The task is well suited to Internet computing in that only small blocks of data need to be passed back and forth over the network, but a great deal of computing is needed on each block.

SETI@home is the project of Dan Werthimer of the University of California at Berkeley and several colleagues. Their plans are ambitious; they seek a mass audience. The client software they envision would run as a screen saver, starting up automatically whenever a machine is left idle. As the program churned away on the data analysis, it would also display a series of images related to the project, such as a representation of the data currently being examined or a map showing the progress of the sky survey.

Some 70,000 people have signed up for SETI@home. Unfortunately, the project is on hold for lack of funding.