COMPUTING SCIENCE

# The Best Bits

A new technology called compressive sensing slims down data at the source

When you take a photograph with a digital camera, the sensor behind the lens has just a few milliseconds to gather in a huge array of data. A 10-megapixel camera captures some 30 megabytes—one byte each for the red, green and blue channels in each of the 10 million pixels. Yet the image you download from the camera is often only about 3 megabytes. A compression algorithm based on the JPEG standard squeezes the file down to a tenth of its original size. This saving of storage space is welcome, but it provokes a question: Why go to the trouble of capturing 30 megabytes of data if you’re going to throw away 90 percent of it before anyone even sees the picture? Why not design the sensor to select and retain just the 3 megabytes that are worth keeping?

It’s the same story with audio recording. Music is usually digitized at a rate that works out to roughly 32 megabytes for a three-minute song. But the MP3 file on your iPod is probably only 3 megabytes. Again, 90 percent of the data has been discarded in a compression step. Wouldn’t it make more sense to record only the parts of the signal that will eventually reach the ear?

Until a few years ago, these questions had a simple answer, backed up both by common sense and by theoretical precept. Sifting out the best bits without first recording the whole signal was deemed impossible because you couldn’t know which bits to keep until you’d seen them all. That conclusion now seems unduly pessimistic. A suite of new signal-processing methods known as compressed or compressive sensing can extract the most essential elements “on the fly,” without even bothering to store the rest. It’s like a magical diet: You get to eat the whole meal, but you only digest the nutritious parts.

# Signals and Samples

A one-dimensional signal, such as a sound, is easy to visualize as a wiggly line, representing amplitude as a function of time. You can digitize this waveform by measuring the amplitude at frequent intervals. How frequent? The answer comes from the Nyquist-Shannon sampling criterion, named for Harry Nyquist and Claude Shannon (who both did their work at Bell Laboratories, though 30 years apart). A rough-and-ready version of the rule says: “Always take samples at more than twice the highest frequency present in the signal.”

In the case of audio signals meant for human ears, with a highest frequency of roughly 20 kilohertz, a common sampling rate is 44.1 kilohertz—safely above the cutoff. Thus one second of digitized audio is encoded in a sequence of 44,100 numbers.

The Nyquist-Shannon rule has an intuitive rationale: Each cycle of an undulating signal has an upward and a downward wiggle, and you need at least one sample in each wiggle. Shannon proved that this is *all* the information you need: With samples taken at twice the highest frequency, you can reconstruct the original waveform exactly. On the other hand, failing to heed the rule has ugly consequences. When the signal is undersampled, some features are missing from the digitized version; worse yet, spurious components called aliasing artifacts are added.

The Nyquist-Shannon rule might seem to forbid all forms of data compression, either during or after the sensing process, but that’s not the case. The limit applies only when signals are encoded in a particular way—namely as a stream of evenly spaced samples. Other ways of representing a signal can be much more compact.

Consider the simplest audio signal: a pure sine wave at a single frequency. Rather than compile a voluminous (and highly repetitive) sequence of samples, you could just write down the frequency, the duration and the intensity of the sound. (Ordinary musical notation would almost suffice for this purpose.) This transformation—going from the *time domain* to the *frequency domain*—yields a description of the sine wave in just a few bits of information.

Of course most sounds are not pure sine waves; even a single note played on a piano has many overtones and transient effects. Thus it’s not immediately obvious that converting to the frequency domain will save space; you might trade a long list of amplitude samples for an equally long list of frequency components. But it turns out that the sounds people are interested in recording tend to be *sparse* in the frequency domain: They concentrate the bulk of their energy at just a few frequencies. In other words, the graph of energy as a function of frequency has a few strong peaks, with a low background level everywhere else. The MP3 algorithm for compressing audio signals filters snippets of sound into 32 frequency bands and then discards bands that are too faint to be audible.

Images, too, can be encoded and compressed in the frequency domain, although the process is more complex because the data are two-dimensional. Spatial frequencies define variations in brightness or color across the image plane. In a photograph of sunrise over the ocean, broad areas of sea and sky define a low-frequency signal, whereas details of waves or clouds are encoded in higher spatial frequencies. The Nyquist-Shannon limit determines the resolution of the image. (If you want to photograph a picket fence, you need at least two pixels per picket.) The JPEG compression algorithm works by dividing an image into square blocks of 64 pixels, identifying the spatial frequencies within each block, and retaining only the most prominent frequencies.

Frequency analysis is not always the key to data compression; sometimes another domain works better. But every kind of signal that people find meaningful has a sparse representation in *some* domain. This is really just another way of saying that a meaningful signal must have some structure or regularity; it’s not a mere jumble of random bits. And this observation leads to a general strategy for compression: Find a domain in which the signal is sparse; measure *N* components in this domain; sort the results from largest to smallest; finally, discard all but the top *k* components. Typically, *N* is quite a large number (perhaps a million) and *k* is much smaller (several thousand).

Compressive sensing also works by choosing a domain in which the signal is sparse and then retaining only *k* out of *N* components. But the challenge is to find the *k* largest values without making all *N* measurements.

# Vectors and Coins

In my efforts to understand recent work on compressive sensing, I have found it helpful to set aside all the messy details of sound waves and photons, reducing the process to its mathematical essentials. Imagine that the signal arrives at the sensor already in numerical form; it is a vector, a sequence of *N* numbers. We are assured that this signal vector is sparse: Only *k* of the elements are nonzero. But we know nothing about *where* those *k* elements lie or what their values are. Our task is to determine the positions and values of the nonzero elements with as few measurements as possible.

In this abstract and simplified version of the sensing process, “making a measurement” means selecting some subset of the signal elements and determining their total value. If the subset happens to be a single element, then the measurement yields that element’s value directly. If we choose multiple elements, we can measure all of them at once, but we learn only the sum of their values, with no information about how the sum is apportioned among the individual elements.

The classical sensing protocol measures each element individually. In this way we are sure to learn the whereabouts of all *k* nonzero elements, but the process takes *N* operations. If there is to be any hope of solving the problem with fewer measurements, we’ll somehow have to get more information out of each measurement.

By ganging together pairs of adjacent elements we can cover the entire *N*-element vector with just *N*/2 measurements. However, all we learn about each pair of values is their sum. We have reduced the resolution without gaining any information. Lumping together more than two elements allows even fewer measurements but makes the ambiguity still worse. The strategy doesn’t look promising.

But wait! We are ignoring a vital fact: The vector is known to be sparse, with most of the entries equal to zero. Perhaps by exploiting this knowledge we can make further progress.

Suppose the vector is the ultimate in sparseness—a long string of zeros with just one nonzero number lost somewhere among them. Then the process of measurement becomes a search for the one distinguished element. For this problem there are well-known strategies much better than checking the elements one by one.

The task brings to mind a genre of mathematical puzzles about counterfeit coins. In one such puzzle you are given a sack of *N* coins and told that one of them is underweight. Using a scale calibrated in units of a genuine coin’s weight, how many weighings does it take to identify the bogus coin? Putting coins on the scale in batches rather than individually can reduce the worst-case number of weighings from *N* to the logarithm of *N*. For a sack of eight coins, here is a recipe for solving the problem in three weighings: First weigh the set {1,2,3,4}, then {1,2,5,6} and finally {1,3,5,7}. The results of the three trials surely identify the light coin. For example, if the first and second sets are both light but the third set is correct, then coin 2 must be the culprit; if *no* set is light, coin 8 is implicated. (Going from eight weighings to three may not seem all that impressive, but with a million coins the same trick would reduce the number of weighings from 1,000,000 to about 20.)

An algorithm based on this kind of reasoning could solve the compressive-sensing problem for the special case of *k*=1. Unfortunately, this success is of no practical importance because signals with a single significant component—such as the pure sine wave mentioned earlier—are vanishingly rare in the real world. But a variation of the method can be made to work for larger *k*.

# X’s and 0’s

A simple version of compressive sensing works something like the coin-puzzle algorithm, “weighing” various subsets of the *N*-element signal vector. But instead of relying on carefully designed, pre-specified subsets, we choose the subsets at random. For each subset we generate a vector of *N* random bits, which take on the values 0 or 1 with equal probability; then an element of the signal vector is included in the subset only if the corresponding random bit is a 1. “Weighing” the subset consists in adding up the values of all the signal elements selected in this way. We then start over with another random bit vector, repeating the process some designated number of times. Call this number *m*. After *m* repetitions, we have *m* subsets and *m* sums, from which we can try to deduce the values of the *N* signal elements.

There’s a lot going on behind the scenes in this procedure. In the vocabulary of linear algebra, the “weighing” of a random subset would be described as calculating the inner product of the signal vector and a random bit vector. (An inner product multiplies the corresponding elements and takes the sum.) On a larger scale, the entire series of *m* weighings amounts to multiplying the signal vector by an *m* × *N* matrix of 1’s and 0’s, yielding a vector of *m* sums as the result. Finally, the process can also be described as a system of linear equations. A typical equation might look like this:

0x_{1}+ 1x_{2}+ 1x_{3}+ ... + 0x=_{N}S.

The variables *x*_{1} through *x _{N}* are the

*N*elements of the signal vector, and the 0 and 1 coefficients are the elements of a random bit vector;

*S*is the sum of this particular subset. A solution to such a system of equations assigns values to all the

*x*’s—which is exactly what we’re seeking.

If we have *N* equations in *N* unknowns (and if the equations are all distinct) the system is certain to have a unique solution. But we need an algorithm that works when there are fewer equations—when *m* is less than *N*. Under these conditions the problem seems to be underspecified: There are too many variables and not enough constraints. When the subset process maps *N* numbers of the original signal into *m* sums, information is irretrievably lost. It’s like adding up a column of digits: Given the digits, you can calculate a unique sum, but the sum doesn’t tell you what digits it came from.

Again we seem to have reached an impasse, but again we have neglected the role of sparseness. The constraints on the solution include not only the *m* equations but also the knowledge that most of the *x*’s must be equal to zero.

That extra constraint is crucial. For values of *m* that remain well below *N*, it becomes overwhelmingly likely that only one solution is consistent with both the equations and the sparseness constraint. The lower limit on *m* depends on the details of the sensing process, but a good approximation is *k* log_{2} *N*. For the case of *N*=1,000,000 and *k*=1,000, you get the equivalent of a megapixel picture with only about 20,000 measurements.

# Origins

The ideas behind compressive sensing came together about five years ago when Emmanuel J. Candès, a mathematician at Caltech, was working on a problem in magnetic resonance imaging. He was surprised to discover that he could reconstruct a test image exactly even though the available data seemed insufficient according to the Nyquist-Shannon criterion.

Working with Justin Romberg (now at Georgia Tech), Candès showed that this result was not a fluke and worked out much of the underlying theory. Later Candès began collaborating with Terence Tao of UCLA (who was about to win the Fields Medal, the major prize in mathematics). In 2006 Candès, Romberg and Tao published a paper that set forth the basic principles of compressive sensing. They showed that the method achieves an efficiency close to the theoretical optimum (so we should not expect even better methods to come along soon).

Even before the Candès-Romberg-Tao paper was formally published, the results began to attract wide attention, and a number of other workers and departments launched related projects. David Donoho of Stanford University (who was Candès’s Ph.D. advisor) has made notable contributions both to theory and to applications. At Rice University Richard Baraniuk leads a large and active group; Romberg continues at Georgia Tech. By now there is a worldwide community, with workshops, conferences, web sites, special issues of journals and all the other apparatus of a rapidly expanding research area. (On the other hand, there is no consensus yet on what to call the field; many variations on compressed/compressive/condensed sensing/sampling remain in circulation.)

A prehistory of compressive sensing has also come to light, particularly in the earth sciences. In the 1970s seismologists learned to construct images of reflective layers within the earth based on data streams that did not seem to satisfy the Nyquist-Shannon criterion. The techniques can now be seen as prefiguring compressive sensing.

# Decompressing

The last big challenge of compressive sensing is decompressing! Given a vector of *m* sums and an *m* × *N* matrix of random bits, how do you recover the original signal vector of *N* elements? It’s one thing to know that a unique solution exists, another to find it.

The approach devised by Candès and his colleagues treats the decoding of the compressed signal as an optimization problem. The aim is to find, among the infinite set of solutions to the *m* equations, the solution that optimizes some measure of sparseness. The obvious thing to optimize is sparseness itself; in other words, look for the solution that minimizes the number of nonzero signal elements. An algorithm for conducting such a search is straightforward; unfortunately, it is also utterly impractical, requiring a blind search among all possible arrangements of the *k* nonzero elements. A camera based on this technology would produce pictures you could view only by solving a certifiably hard computational problem.

A more familiar mode of optimization is the method of least squares, known for more than 200 years. This prescription calls for minimizing the sum of the squares of the vector elements. There are efficient methods of finding a least-squares solution, so practicality is not an issue. However, there’s another show-stopper: In compressive sensing the least-squares solution is seldom the correct one. Thus a camera using this algorithm gives you a picture you can see quickly, but the image is badly garbled.

These two optimization methods seem quite different on the surface, but they have a hidden connection. For each vector element *x*, the least-squares rule calculates *x*^{2} and then sums all the results. The search for a sparsest vector can be framed in similar terms, the only change being that *x*^{2} is replaced by *x*^{0}. The zeroth power of 0 is 0, but for any other value of *x*, *x*^{0} is equal to 1. Thus the sum of the zeroth powers counts the number of nonzero elements in the vector. This is just the result we want, but there is no efficient algorithm for finding it.

At this point, having found that *x*^{2} doesn’t work and neither does *x*^{0}, the Goldilocks alternative is irresistible. How about *x*^{1}? Of course *x*^{1} is simply *x*, and so the optimization strategy derived from this rule amounts to a search for the solution vector whose elements have the smallest sum. There are efficient algorithms for performing this minimization. Equally important, the method produces correct results. Given a sparse input (all but *k* elements exactly zero), the compression-decompression cycle almost always reconstructs the input exactly. For an approximately sparse input (all but *k* elements near zero), the reconstruction is a good approximation.

The version of compressive sensing I have presented here is something of a caricature. In particular, the pretense that a signal arrives at the sensor preformatted as a sparse vector of numbers glosses over a great deal of real-world complexity. In practice some transformation (such as conversion from the time domain to the frequency domain) is usually needed. And the random vectors that govern the sampling of the signal may have elements more complicated than 0’s and 1’s. But the basic scheme remains intact. Here is the compressed version of compressive sensing: Find a sparse domain. Sum random subsets of the signal. Decompress by finding the solution that minimizes the sum of the signal elements.

# Spears and Nets

Can you expect to find a compressive-sensing device in your next digital camera or audio recorder? Probably not, but several research groups are actively exploring the prospects.

Kevin F. Kelly and his colleagues at Rice University have built a series of compressive-sensing cameras. At the heart of these devices is an array of micromirrors that can be independently switched between two positions. By swivelling the mirrors, light from any subset of pixels can be focused on a single photodetector, which thereby sums up the luminosity of the entire subset. The mirror array has an effective resolution of 65,536 pixels; the camera can produce images of equivalent resolution with a few thousand random-sample measurements.

For conventional photography, a mirror-array camera may never compete with the technology that puts 10 million photosensors on a single chip. But the compressive-sensing camera may find a niche elsewhere, such as imaging at wavelengths outside the visible range.

Another area where compressive sensing has promise is magnetic resonance imaging—where the whole story began. The imperative in MRI is not so much compressing data for storage but acquiring it quickly, because the patient must hold still while an image is formed. Ordinary after-the-fact compression is no help in this respect, but compressive sensing offers hope of faster scanning without loss of resolution or contrast.

Other likely areas of application are radio astronomy, where long-baseline interferometers operate at the limit of spatial resolution, and perhaps even genetic screening and analysis of gene activation, where hundreds or thousands of signals are difficult to extract from a noisy background.

Candès and Tao argue that compressive sensing is based on a kind of uncertainty principle, where the spectrum of the signal and that of the measuring instrument have complementary roles. The traditional sensing strategy takes sharply focused samples; it’s like hunting with a spear. This works well when the signal is spread out over a broad domain. But when the signal itself is highly structured and narrowly focused, the better plan is to spread the measurements out over the domain—to hunt with a net rather than a spear.

© Brian Hayes

# Bibliography

- Baraniuk, Richard G., Emmanuel Candès, Robert Nowak and Martin Vetterli (editors). 2008. Special issue on compressive sampling.
*IEEE Signal Processing Magazine*25(2):12–13. - Candès, Emmanuel J. 2006. Compressive sampling. In
*Proceedings of the International Congress of Mathematicians*, Madrid, Spain, pp. 1433–1452. - Candès, Emmanuel J., Justin Romberg and Terence Tao. 2006. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information.
*IEEE Transactions on Information Theory*52:489–509. - Candès, Emmanuel, and Justin Romberg. 2007. Sparsity and incoherence in compressive sampling.
*Inverse Problems*23:969–985. - Candès, Emmanuel J., and Michael B. Wakin. 2008. An introduction to compressive sampling.
*IEEE Signal Processing Magazine*25(2):21–30. - Donoho, David. 2006. Compressed sensing.
*IEEE Transactions on Information Theory*52:1289–1306. - Lustig, Michael. 2008. Sparse MRI. Doctoral thesis, Stanford University. http://www-mrsrl.stanford.edu/~mlustig/thesis.pdf
- MacKenzie, Dana. 2009. Compressed sensing makes every pixel count. In
*What’s Happening in the Mathematical Sciences*, Vol. 7. Providence, R.I.: American Mathematical Society. - Romberg, Justin. 2008. Imaging via compressive sampling.
*IEEE Signal Processing Magazine*25(2)14–20. - Takhar, Dharmpal, Jason N. Laska, Michael B. Wakin, Marco F. Duarte, Dror Baron, Shriram Sarvotham, Kevin F. Kelly and Richard G. Baraniuk. 2006. A new compressive imaging camera architecture using optical-domain compression. In
*Proceedings of the 2006 SPIE Conference on Computational Imaging*, Vol. 6065.

EMAIL TO A FRIEND :