The Best Bits
A new technology called compressive sensing slims down data at the source
X’s and 0’s
A simple version of compressive sensing works something like the coin-puzzle algorithm, “weighing” various subsets of the N-element signal vector. But instead of relying on carefully designed, pre-specified subsets, we choose the subsets at random. For each subset we generate a vector of N random bits, which take on the values 0 or 1 with equal probability; then an element of the signal vector is included in the subset only if the corresponding random bit is a 1. “Weighing” the subset consists in adding up the values of all the signal elements selected in this way. We then start over with another random bit vector, repeating the process some designated number of times. Call this number m. After m repetitions, we have m subsets and m sums, from which we can try to deduce the values of the N signal elements.
There’s a lot going on behind the scenes in this procedure. In the vocabulary of linear algebra, the “weighing” of a random subset would be described as calculating the inner product of the signal vector and a random bit vector. (An inner product multiplies the corresponding elements and takes the sum.) On a larger scale, the entire series of m weighings amounts to multiplying the signal vector by an m × N matrix of 1’s and 0’s, yielding a vector of m sums as the result. Finally, the process can also be described as a system of linear equations. A typical equation might look like this:
0x1 + 1x2 + 1x3 + ... + 0xN = S.
The variables x1 through xN are the N elements of the signal vector, and the 0 and 1 coefficients are the elements of a random bit vector; S is the sum of this particular subset. A solution to such a system of equations assigns values to all the x’s—which is exactly what we’re seeking.
If we have N equations in N unknowns (and if the equations are all distinct) the system is certain to have a unique solution. But we need an algorithm that works when there are fewer equations—when m is less than N. Under these conditions the problem seems to be underspecified: There are too many variables and not enough constraints. When the subset process maps N numbers of the original signal into m sums, information is irretrievably lost. It’s like adding up a column of digits: Given the digits, you can calculate a unique sum, but the sum doesn’t tell you what digits it came from.
Again we seem to have reached an impasse, but again we have neglected the role of sparseness. The constraints on the solution include not only the m equations but also the knowledge that most of the x’s must be equal to zero.
That extra constraint is crucial. For values of m that remain well below N, it becomes overwhelmingly likely that only one solution is consistent with both the equations and the sparseness constraint. The lower limit on m depends on the details of the sensing process, but a good approximation is k log2 N. For the case of N=1,000,000 and k=1,000, you get the equivalent of a megapixel picture with only about 20,000 measurements.