Empirical Software Engineering
As researchers investigate how software gets made, a new empire for empirical research opens up
Where We Are
Broadly speaking, people who study programming empirically come at the problem from one of two angles. To some, the phrase software engineering has always had a false ring. In practice, very few programmers analyze software mathematically the way that “real” engineers analyze the strength of bridges or the resonant frequency of an electrical circuit. Instead, programming is a skilled craft, more akin to architecture, which makes the human element an important (some would say the important) focus of study. Hollywood may think that programmers are all solitary 20-something males hacking in their parents’ basement in the wee hours of the morning, but most real programmers work in groups subject to distinctly human patterns of behavior and interaction. Those patterns can and should be examined using the empirical, often qualitative tools developed by the social and psychological sciences.
The other camp typically focuses on the “what” rather than the “who.” Along with programs themselves, programmers produce a wealth of other digital artifacts: bug reports, email messages, design sketches and so on. Employing the same kinds of data-mining techniques that Amazon uses to recommend books and that astronomers use to find clusters of galaxies, software engineering researchers inspect these artifacts for patterns. Does the number of changes made to a program correlate with the number of bugs found in it? Does having more people work on a program make it better (because more people have a chance to spot problems) or worse (because of communication stumbles)? One sign of how quickly these approaches are maturing is the number of data repositories that have sprung up, including the University of Nebraska’s Software Artifact Infrastructure Repository, the archives of NASA’s highly influential Software Engineering Laboratory and the National Science Foundation–funded CeBASE, which organizes project data and lessons learned. All are designed to facilitate data sharing, amplifying the power of individual researchers.
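The kind of correlational question posed above, whether modules that change more often also accumulate more bugs, reduces to a simple statistical check once the artifact data are in hand. The sketch below is purely illustrative: the module counts are invented, not drawn from any real repository, and real studies control for confounds such as module size.

```python
# Hypothetical sketch of artifact mining: does code churn correlate
# with defect counts? The data below are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# For six imaginary modules: changes committed, and bugs later reported.
changes = [3, 12, 7, 25, 4, 18]
bugs    = [1,  5, 2, 11, 0,  7]

r = pearson(changes, bugs)
print(f"correlation between churn and defects: r = {r:.2f}")
```

A strong positive r in data like these would suggest that heavily churned modules deserve extra review and testing attention, though, as the researchers themselves stress, correlation alone cannot establish why the pattern holds.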
The questions we and our colleagues seek to answer are as wide-ranging as those an anthropologist might ask during first contact with a previously unknown culture. How do people learn to program? Can the future success of a programmer be predicted by personality tests? Does the choice of programming language affect productivity? Can the quality of code be measured? Can data mining predict the location of software bugs? Is it more effective to design code in detail up front or to evolve a design week by week in response to the accretion of earlier code? Convincing data on all of these questions are now in hand, and we are learning how to tackle many others.
Along the way, our field is grappling with the fundamental issues that define any new science. How do we determine the validity of data? When can conclusions from one context—one programming team, or one vast project, like the development of the Windows Vista operating system—be applied elsewhere? And crucially, which techniques are most appropriate for answering different kinds of questions?
Some of the most exciting discoveries are described in a recent book called Making Software: What Really Works, and Why We Believe It, edited by Andy Oram and Greg Wilson (O’Reilly Media, 2011), in which more than 40 researchers present the key results of their work and the work of others. We’ll visit some of that research to give an overview of progress in this field and to demonstrate the ways in which it is unique terrain for empirical investigation.