Seeing between the Pixels
Squeezing the Pixels
Both the strengths and the weaknesses of pixel-based graphic formats lie in the absence of hierarchical structure in the image. On the one hand, having only a flat array of pixels to worry about facilitates many useful kinds of image processing, such as filtering to sharpen edges, or adjusting contrast and color balance. On the other hand, pixels are maddeningly oblivious to what they are portraying. You might look at an image and see parasols, rowboats, picnic baskets and top hats, but none of those objects exist in the computer file. There are only pixels there, ignorant of the patterns they form.
Extracting meaningful structures from image data is a notoriously difficult problem; it's called vision. But there are simpler ways of imposing structure on an array of pixels. These methods cannot pick out top hats or parasols, but they do identify certain patterns and regularities. Much of the work on such pixel patterns derives from efforts to compress images into a smaller space.
Image compression is possible only because scenes of the natural world make up a small and rather special subset of all possible arrays of pixels; they are not random data but have inherent redundancies and correlations. The simplest compression scheme is run-length encoding. If you list pixel values in raster-scan order, you often discover several pixels in a row with the same color. You can then save space by writing down the color only once, along with the number of times it should be repeated. An image compressed in this way is really no longer an array of pixels; it is a collection of lines of various lengths, which are fitted together like planks in a hardwood floor to reconstruct the complete image.
Run-length encoding exploits correlations only along a single dimension. Another way of breaking down and reassembling a picture, called a quadtree, takes advantage of both vertical and horizontal regularities. To build a quadtree, take a square image and average all the pixel values, so that the entire area is a flat expanse of color expressed by a single number; this is the first and crudest level of the quadtree. Now divide the original image into four square quadrants, and average the pixels within each of these regions. The four averages become the four branches at the next level of the quadtree. Continue in the same way, dividing each quadrant into subquadrants, until eventually you reach the level of single pixels. (The construction is easiest when the image is a square whose side is a power of 2, but other shapes and sizes can be accommodated with a little extra effort.)
Quadtrees offer a multiresolution view of pixel data. Looking at just the first few levels of the tree gives a rough impression of the image content; continuing to further levels adds progressively more detail. Some images posted on the Web, called "progressive JPEGs," work in just this way, starting out as blocky approximations and growing sharper through gradual refinement.
A fundamental property of quadtrees is that pixels stored in the same node of the tree are also close together in the image. As a result, sibling nodes tend to be highly correlated, creating an opportunity for image compression. After averaging the color in a quadrant, you can record not the average itself but the difference between the average and the color of the quadrant's parent node. These parent-child differences tend to be small, and so they can be given a very compact encoding.
Surely the most unusual image-compression technique is one based on the idea of fractals, or self-similar patterns. Certain features of the natural landscape are famously fractal—coastlines, mountain ranges and ferns being the favorite examples. An idealized black spleenwort fern has become the mascot of the fractal industry. Given an image of the entire fern, you can make a number of photocopies at reduced size, then rotate and slide the smaller images until they exactly cover the original. By applying the same transformation to any one of these frond images, you create even smaller frondlet images that exactly cover the frond. In nature this process comes to an end after just a few iterations, but mathematically it can be continued indefinitely. It suggests a curious but compact way of representing the fern image—just store a list of instructions for scaling, rotating and translating photocopies. There's no need to keep any form of the original image. Starting with an arbitrary image and repeating the list of transformations enough times, you can generate an image that approximates the fern as closely as you wish.
This is wonderful news for fern-fanciers, but what about images of other subjects? As it turns out, patterns amenable to this kind of treatment are more common than you might guess. Any image can be represented as a "collage" of fractal parts. The mathematical ideas behind this process were first explored by John E. Hutchinson of the Australian National University and were further refined and applied to image compression by Michael F. Barnsley of the Georgia Institute of Technology. The basic idea is that certain geometric transformations—combinations of scaling, rotating and translating—eventually reach a fixed point, where further operations cause no further change. If a section of an image is similar to such a fixed-point pattern, the corresponding list of transformations can serve as an approximate encoding of the section. The trick is dividing the image into the right sections and finding the right transformations.
The fractal description of an image certainly satisfies the desire for a hierarchical structure; indeed, there is no limit to the levels in the hierarchy. You can even zoom in on details—microfronds and nanofronds—that were not present in the original.