Delving into Deep Learning
The latest neural networks learn to see and hear, and maybe even dream.
Endowing a computer with human perceptual skills, such as understanding spoken language or recognizing faces, has been on the agenda of computer science since the era of vacuum tubes and punch cards. For decades, progress was slow and successes were few. But now you can have a conversation with your cell phone about tomorrow’s weather or last night’s baseball scores. And Facebook and Google+ recognize faces well enough to suggest that you “tag” your friends in photos.
What accounts for these sudden breakthroughs in pattern recognition by machines? There is no single answer. A variety of algorithmic and statistical ideas have played a part, along with more powerful hardware. But one suite of techniques merits special mention. A scheme called deep learning has achieved impressive performance on several pattern-analysis tasks. Programs for deep learning are also known as deep neural networks, because they are constructed as multiple layers of processing elements analogous to the neurons of the nervous system.
The deep methods have a role in some well-known speech-recognition systems, such as Apple’s Siri and the Google search-by-voice service. They are also leading the way in aspects of computer vision. Apart from perceptual tasks, deep learning is proving adept at data mining: extracting meaningful patterns from large data sets.
How do the deep programs work? Oddly enough, no one can answer this question in full detail. A distinctive feature of neural networks is that the designer or programmer does not directly specify all the particulars of a computation. Instead, the neural net is “trained” by exposure to thousands of examples, and it adjusts its internal parameters to maximize its own success. When the training is complete, we have a machine that can answer questions, but we don’t necessarily know how it computes the answers. I find this situation mildly frustrating. On the other hand, it’s a predicament I am familiar with at the most intimate level. I, too, understand speech and recognize faces—and I can’t explain how I do it.