Deep Learning and Galaxy Classification

Algorithms like those used in facial recognition are teasing out features of galaxies not apparent to human observers.

This Article From Issue

September-October 2018

Volume 106, Number 5
Page 317

DOI: 10.1511/2018.106.5.317

Recognizing an image used to be an overwhelming challenge for a computer. But since 2012, so-called deep learning algorithms have achieved impressive success rates at image recognition tasks, such as distinguishing images of cats and dogs. Such programs are also referred to as neural networks or artificial intelligence because of the methods used to achieve their results. Programmers do not feed the program the different characteristics of, say, cats and dogs, but rather show it lots of examples of each animal until it can reliably tell the difference. The features the program uses to distinguish the creatures are not preprogrammed, and can even remain unclear to the programmer.
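The idea of learning from labeled examples rather than from hand-written rules can be illustrated with a toy sketch. The code below is not a deep network: it is a single-layer logistic classifier, swapped in here because it fits in a few lines. The two-number "features" and the function names `train_logistic` and `predict` are hypothetical, chosen only for illustration.

```python
import math

def train_logistic(examples, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by repeatedly nudging them toward the labels."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - y                      # how far off the prediction is
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the learned score is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

# Hypothetical toy data: two made-up measurements per image, label 0 or 1.
examples = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(examples, labels)
new_low = predict(weights, bias, (0.15, 0.15))
new_high = predict(weights, bias, (0.85, 0.85))
```

No rule for separating the two groups is written anywhere in the code; the dividing line emerges from the examples. Deep networks apply the same principle with many stacked layers and with raw pixels as input.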

Images are a central element in astronomy as well. Telescopes capture photons from sources in outer space, and these photons are transformed into images or spectra, which are then analyzed. One of the major providers of images over the past 25 years has been the Hubble Space Telescope, which has delivered the most distant and best resolved images of galaxies to date. Astronomers wish to decode the information available in these images to unveil the formation history of the observed galaxies, and deep learning techniques recently have been adapted as tools for this purpose.

Astronomers classify galaxies by their shapes, or morphologies. Some are egg-shaped; others, such as the Milky Way, are almost flat disks. In the early universe, galaxies seem to start out as more “pickle shaped.” These shapes carry information about the formation history of the galaxies. Since the 1930s, morphologies have been determined by visual inspection, partly because no algorithm performed better than the human eye. But with the rapid increase in incoming astronomical data, this already time-consuming task will become impossible to carry out by visual inspection alone. In 2015, we were able to show that deep learning algorithms could achieve unprecedented accuracy in determining the morphologies of distant galaxies observed with the Hubble Space Telescope, reaching an agreement with human-based classifications close to 95 percent. Such usage solves a 100-year-old problem, allowing astronomers to better keep up with the flood of galaxy data.

But we felt that the ability of deep learning algorithms to automatically extract features and find correlations among images could be used even more powerfully than just in classifying galaxies. Right now, when visually inspecting galaxies, astronomers look for features that give clues about the underlying physics. But the data are often full of “noise,” making subtle signs of physical processes difficult to measure. The data are also typically multidimensional, meaning that, for instance, galaxies have to be observed in multiple wavebands. So we wanted to find out whether deep learning algorithms are able to capture subtle correlations in complex data and link those to the physics of galaxy evolution.

To find out, we used supercomputers to create simulations of galaxies that included all current knowledge of the physics of galaxy evolution. By using a simulation, we can ensure that we know the entire history of the galaxy being examined, whereas with observations of a real galaxy, we only have a snapshot of one moment in its lifetime. We provided the deep learning algorithms with single views of different stages of evolution of our simulated galaxies, and asked whether they could identify a given evolutionary stage.

We tested this idea with what we call “the blue nugget phase,” in which galaxies are particularly active in forming stars at their centers. We used 35 simulated galaxies and generated mock-observed images in the same format as those from the Hubble Space Telescope. We labeled every image according to its evolutionary stage from the simulation (before, during, or after the blue-nugget phase).
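The labeling step can be sketched as follows. This is a minimal illustration under assumptions, not the procedure actually used: the article does not say how the start and end of the blue-nugget phase are defined, so the times `phase_start` and `phase_end`, the snapshot times, and the function name `label_snapshot` are all hypothetical.

```python
def label_snapshot(time, phase_start, phase_end):
    """Label a simulation snapshot relative to the blue-nugget phase.

    All times are in the same (arbitrary) units; the window boundaries
    are assumed to be known from the simulation's full history.
    """
    if time < phase_start:
        return "before"
    if time <= phase_end:
        return "during"
    return "after"

# Hypothetical snapshot times for one simulated galaxy.
snapshot_times = [1.0, 1.5, 2.0, 2.5, 3.0]
snapshot_labels = [
    label_snapshot(t, phase_start=1.8, phase_end=2.6) for t in snapshot_times
]
```

Because the simulation records the whole history of each galaxy, every mock image inherits an unambiguous label, which is what a real observation cannot provide.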

The deep neural networks were able to retrieve the galaxy phase with nearly 80 percent accuracy, even though it was very difficult for human observers to identify the phase just by looking at the images. This result implies that the neural networks were able to automatically find subtle traces of evolutionary phase in the data that are not obvious to astronomers. Thus deep learning techniques are not only a faster way to do what we already knew how to do, but also a powerful tool to help astronomers analyze the data and find hidden correlations.
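An accuracy figure of this kind is the fraction of images for which the network's answer matches the label from the simulation. The sketch below shows the arithmetic on ten made-up predictions; the label lists and the helper names `accuracy` and `confusion_counts` are hypothetical and do not reproduce the actual results.

```python
from collections import Counter

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

# Made-up example: 10 images, 8 classified correctly.
true_labels = ["before", "before", "during", "during", "after",
               "after", "during", "after", "before", "during"]
predicted = ["before", "during", "during", "during", "after",
             "after", "before", "after", "before", "during"]

score = accuracy(true_labels, predicted)
errors = confusion_counts(true_labels, predicted)
```

With three phases, random guessing on equally populated classes would be right about one time in three, which is the baseline against which such a score should be read.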

Simulations by Daniel Ceverino and Joel Primack; simulated images by Greg Snyder and Marc Huertas-Company; Hubble Space Telescope observation CANDELS

There is a fascinating future, and uncountable things to learn, by transferring artificial intelligence techniques to astronomy. The field advances so fast that it is difficult even to keep track of all the new tools with potential applications. But there are also potential dangers, as is usual when a new technology is introduced. Deep learning networks still behave like black boxes; they should not be used blindly, because how they make their decisions is still an open question. Estimating the uncertainties in their decisions remains a problem until their inner workings are better known. Researchers are working to combine probabilistic approaches with deep learning to define these uncertainties, and the results are promising. Outstanding issues aside, these algorithms still open up a large number of possible future applications, especially now that the volume, quality, and complexity of data in astronomy are rapidly increasing.
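One generic way to attach an uncertainty to a classifier's decision is sketched below. The article does not say which probabilistic approach is being explored, so this is an assumption: the network is imagined to produce several slightly different probability estimates for the same image (for example from an ensemble), which are averaged and summarized by their entropy. The probability values and the function names are hypothetical.

```python
import math

def mean_probabilities(samples):
    """Average several probability vectors for the same image."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]

def predictive_entropy(probs):
    """Entropy of a probability vector: 0 means certain, log(K) means clueless."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical outputs for the classes (before, during, after).
confident_runs = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.95, 0.03, 0.02]]
uncertain_runs = [[0.60, 0.30, 0.10], [0.20, 0.50, 0.30], [0.30, 0.30, 0.40]]

confident_entropy = predictive_entropy(mean_probabilities(confident_runs))
uncertain_entropy = predictive_entropy(mean_probabilities(uncertain_runs))
```

An image whose repeated predictions disagree ends up with an entropy near the maximum of log 3 for three classes, flagging it as one that should not be trusted blindly.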