MY AMERICAN SCIENTIST
LOG IN! REGISTER!
SEARCH
 
RSS
Logo

FEATURE ARTICLE

Knowledge Discovery and Data Mining

Computers taught to discern patterns, detect anomalies and apply decision algorithms can help secure computer systems and find volcanoes on Venus

Carla Brodley, Terran Lane, Timothy Stough

Searching for Volcanoes

As tectonic plates are to earth and craters are to the moon, so volcanoes are to Venus—the geological feature that, more than any other, influences the planet's history. On earth, tectonic-plate movement provides a fairly smooth way for heat to escape from within the planet, and huge amounts of energy are released into the sea at the plate boundaries. On Venus, there is no plate structure and no ocean to act as a heat sink. Thus the surface is peppered with volcanoes. By studying them, planetary scientists hope to answer such questions as how old the surface is (relatively new, judging from the paucity of craters) and how much heat is escaping from the planet's center.

Figure 6. Magellan radar imagesClick to Enlarge Image

Between 1990 and 1994, NASA's Magellan probe mapped 98 percent of Venus's surface, returning 30,000 synthetic-aperture-radar images. Each of these is a black-and-white, 1024 x 1024 grid of pixels. Volcanoes take on many different appearances: Some are obviously conical (like Mount Rainier), whereas others look more like arcs of white cutting through a cone (like Mount St. Helens after the eruption blew out its side).

Spotting large volcanoes is easy. But cataloguing the small ones, which may be less than a mile—or about 20 pixels—in diameter, is a little bit like counting pieces of hay in a haystack. To automate the process, scientists at NASA's Jet Propulsion Laboratory (JPL) in Pasadena, California, turned to a multi-tiered data-mining algorithm. Each tier processes the volcano candidates from the previous tier with increasing computational power. In this way, the algorithm does not waste time processing regions with no volcanoes, and spends more time on the regions that are more likely to contain a volcano.

Although this method proved very successful at finding volcanoes, JPL's baseline system had one drawback: It used a single volcano template, constructed by averaging the training data. In effect, it assumed that all volcanoes look roughly the same. We were able to improve the true detection rate in the first tier of JPL's algorithm by applying two techniques known as spoiling and clustering.

Figure 7. To improve the ability of JPL's algorithmsClick to Enlarge Image

In JPL's training data, the Magellan images are cut into image patches 15 pixels wide by 15 pixels high. Planetary scientists have examined each of these patches to produce a consensus on which features represent volcanoes. The original JPL algorithm produced a composite image by averaging the brightness of each pixel in all the volcanoes (after centering each volcano in its frame, and adjusting for the brightness of the background). Our clustering algorithm separated the volcanoes into six groups—a number determined by the automated procedure—and then traded volcanoes among groups in such a way as to maximize the similarity among members of a given group and minimize the similarity among different groups.

The clustering algorithm would have been computationally infeasible, however, if we had treated every pixel in the 15 x 15 image patch as a significant feature. In the most extreme case, none of the volcanoes in our training set would have been similar enough to cluster together. This problem could have been solved by making the training set sufficiently large to allow meaningful groups to form. But training data are a limited resource, requiring a great deal of human time and expertise to produce. Therefore it made more sense to apply a data-reduction method that recognizes that not every pixel, even in a 15 x 15 picture, is meaningful.

Our approach radically scaled down or "spoiled" the image before clustering. Specifically, we averaged groups of pixels together into a 2 x 3-pixel image, and then used those pixel values as the features for clustering. We chose to reduce the image more in the vertical direction than the horizontal because the volcanos were illuminated from the horizontal direction, so that direction contained more intensity information.

This procedure is called spoiling because it is irreversible—a 15 x 15 image cannot be reconstructed from a 2 x 3 one. But after we grouped the 2 x 3 images into clusters, it was easy enough to retrieve all the original 15 x 15 image patches for each cluster. We averaged the images in each cluster together to form prototype volcanoes (called "matched filters"). These prototypes, not the reduced 2 x 3 versions, were then used to detect volcanoes in the tens of thousands of Magellan images. During the detection phase, each matched filter was compared to each 15 x 15 patch in the image being searched. When the correlation between an image patch and a matched filter exceeded a certain threshold (determined empirically from the training images), the patch was identified as a candidate volcano and passed along to the second tier of the JPL algorithm.

The goal of JPL's first tier was to detect every volcano that is actually present on the planet, but with as few bogus detections as possible. To evaluate our approach, we applied a testing method called cross-validation, which is often used when the amount of ground truth available for testing is small. With 36 images labeled by experts (each containing approximately 16 volcanoes), we trained each algorithm on 30 of the images and then tested it on the other six; we repeated the procedure six times, dividing up the same 36 images into a different set of 30 training and 6 test images. In the cross-validation tests, the multiple matched filters achieved a 100 percent volcano detection rate with fewer false positives than a single filter, and in some cases achieved a 100 percent detection rate when the single filter did not.

This application of KDD to scientific image analysis illustrates two of the key challenges in applying data-mining techniques. The first is to ensure that a method is computationally feasible. The second is to define the relevant features that the data-mining algorithm will consider.





» Post Comment

 

EMAIL TO A FRIEND :

Of Possible Interest

Computing Science: A Box of Universe

Feature Article: Empirical Software Engineering

Computing Science: An Adventure in the Nth Dimension

Subscribe to American Scientist

Sites of Interest

Duxbury Ventures Website Investments

Social Justice

Find Websites Worth

München Fair Hotels

ABC Fundraising

Promotional Products

Business Cards

Car Hire

Get a Gold Ira at Regal Assets.

Online Shopping