The Best Bits
A new technology called compressive sensing slims down data at the source
Signals and Samples
A one-dimensional signal, such as a sound, is easy to visualize as a wiggly line, representing amplitude as a function of time. You can digitize this waveform by measuring the amplitude at frequent intervals. How frequent? The answer comes from the Nyquist-Shannon sampling criterion, named for Harry Nyquist and Claude Shannon (who both did their work at Bell Laboratories, though 30 years apart). A rough-and-ready version of the rule says: “Always take samples at more than twice the highest frequency present in the signal.”
In the case of audio signals meant for human ears, with a highest frequency of roughly 20 kilohertz, a common sampling rate is 44.1 kilohertz—safely above the cutoff. Thus one second of digitized audio is encoded in a sequence of 44,100 numbers.
The Nyquist-Shannon rule has an intuitive rationale: Each cycle of an undulating signal has an upward and a downward wiggle, and you need at least one sample in each wiggle. Shannon proved that this is all the information you need: With samples taken at twice the highest frequency, you can reconstruct the original waveform exactly. On the other hand, failing to heed the rule has ugly consequences. When the signal is undersampled, some features are missing from the digitized version; worse yet, spurious components called aliasing artifacts are added.
The Nyquist-Shannon rule might seem to forbid all forms of data compression, either during or after the sensing process, but that’s not the case. The limit applies only when signals are encoded in a particular way—namely as a stream of evenly spaced samples. Other ways of representing a signal can be much more compact.
Consider the simplest audio signal: a pure sine wave at a single frequency. Rather than compile a voluminous (and highly repetitive) sequence of samples, you could just write down the frequency, the duration and the intensity of the sound. (Ordinary musical notation would almost suffice for this purpose.) This transformation—going from the time domain to the frequency domain—yields a description of the sine wave in just a few bits of information.
Of course most sounds are not pure sine waves; even a single note played on a piano has many overtones and transient effects. Thus it’s not immediately obvious that converting to the frequency domain will save space; you might trade a long list of amplitude samples for an equally long list of frequency components. But it turns out that the sounds people are interested in recording tend to be sparse in the frequency domain: They concentrate the bulk of their energy at just a few frequencies. In other words, the graph of energy as a function of frequency has a few strong peaks, with a low background level everywhere else. The MP3 algorithm for compressing audio signals filters snippets of sound into 32 frequency bands and then discards bands that are too faint to be audible.
Images, too, can be encoded and compressed in the frequency domain, although the process is more complex because the data are two-dimensional. Spatial frequencies define variations in brightness or color across the image plane. In a photograph of sunrise over the ocean, broad areas of sea and sky define a low-frequency signal, whereas details of waves or clouds are encoded in higher spatial frequencies. The Nyquist-Shannon limit determines the resolution of the image. (If you want to photograph a picket fence, you need at least two pixels per picket.) The JPEG compression algorithm works by dividing an image into square blocks of 64 pixels, identifying the spatial frequencies within each block, and retaining only the most prominent frequencies.
Frequency analysis is not always the key to data compression; sometimes another domain works better. But every kind of signal that people find meaningful has a sparse representation in some domain. This is really just another way of saying that a meaningful signal must have some structure or regularity; it’s not a mere jumble of random bits. And this observation leads to a general strategy for compression: Find a domain in which the signal is sparse; measure N components in this domain; sort the results from largest to smallest; finally, discard all but the top k components. Typically, N is quite a large number (perhaps a million) and k is much smaller (several thousand).
Compressive sensing also works by choosing a domain in which the signal is sparse and then retaining only k out of N components. But the challenge is to find the k largest values without making all N measurements.
» Post Comment