Empirical Software Engineering

As researchers investigate how software gets made, a new empire for empirical research opens up

Greg Wilson, Jorge Aranda

Open Access

At present, a tremendous enabler of empirical software engineering research is the open-source software movement, which is rapidly generating a freely available accumulation of code along with complete archives of the communications between developers. In an open-source setting, programmers collect around software projects to produce applications that they want to see available for free. The developers are often in different places and time zones, so communication occurs via email and online forums. The code and communication records are accessible to all via websites, so interested developers can join the project at any stage to share expertise, troubleshoot and add to the source code.

These electronic repositories are a software-engineering researcher’s paradise. They constitute a historical record of the life of a project, including all of the dead ends and debates, the task assignments, the development of team structure and many other artifacts. With thoughtful and targeted searches, researchers can explore topics such as how newcomers adapt to a software project’s culture. They can test prediction engines to assess the validity of theories about project structure and code development. Bug-tracking records and the interpersonal interactions involved in solving software flaws serve as a narrative of the incremental improvement of code quality. Before the open-source community took on its present form, this kind of access to project archives was available only to investigators in corporate research units.
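As a rough illustration of what a "targeted search" over such an archive can look like, the sketch below counts how many commits each contributor makes in the first 30 days after joining a project, one possible proxy for how quickly newcomers become active. It assumes the history has been exported as `author|date` lines (for example with `git log --pretty=format:"%an|%aI"`); the sample log, the function names and the 30-day window are all invented for illustration and do not come from any study cited here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log lines in "author|ISO-8601 date" form, e.g. the output of
# `git log --pretty=format:"%an|%aI"`. The names and dates are invented.
SAMPLE_LOG = """\
alice|2010-01-05T09:00:00+00:00
bob|2010-02-01T14:30:00+00:00
alice|2010-03-03T10:15:00+00:00
bob|2010-02-10T08:00:00+00:00
bob|2010-04-20T16:45:00+00:00
"""

def parse_log(text):
    """Return a dict mapping each author to a sorted list of commit times."""
    commits = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        author, stamp = line.split("|", 1)
        commits[author].append(datetime.fromisoformat(stamp))
    return {author: sorted(times) for author, times in commits.items()}

def early_activity(commits, window_days=30):
    """Count each author's commits within `window_days` of their first commit."""
    window = timedelta(days=window_days)
    return {
        author: sum(1 for t in times if t - times[0] <= window)
        for author, times in commits.items()
    }

history = parse_log(SAMPLE_LOG)
print(early_activity(history))
```

A real study would combine a measure like this with mailing-list and bug-tracker records, since commit counts alone say little about how a newcomer is received by the rest of the team.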

As the open-source movement developed, there was a feeling that researchers should treat it as a special case in the realm of software engineering. Eric Raymond, cofounder and former president of the Open Source Initiative, highlighted the differences between open-source and industrial projects when he compared them to a bazaar and a cathedral. Industry is the cathedral, in which projects are built according to carefully detailed plans, with attendant hierarchy, role divisions, order and dogma. The bazaar is bustling, free-form, organic and shaped by the aggregate actions of the crowd. Researchers bringing results from the open-source world met skepticism about whether their findings could be generalized to the rest of the community. In fact, research has demonstrated that the distinctions between the two worlds are often illusory. There are cathedrals in the open-source sphere and bazaars in the closed-source. Similar social and technical trends can be documented in both, and researchers have come to appreciate the dividends that come from comparing the two.

The work of Guido Schryen at the University of Freiburg and Eliot Rich at the University at Albany, SUNY, is instructive about how to ask and answer questions about the two worlds. In a 2010 paper, they addressed a much-debated and critically important issue: Which model leads to better security, open- or closed-source software? Security is a formidable concern for any software that will come within reach of networks. Schryen and Rich examined the security-vulnerability announcements and the release (or nonrelease) of patches (software fixes) for 17 widely deployed software packages. Proponents of open-source software have argued that its characteristically wide developer base must lead to better review and response to security issues. An opposing argument holds that closed-source and industrial projects have more direct motivation to find and fix security flaws. Schryen and Rich sorted the packages they studied within categories such as open- and closed-source, application type (operating system, web server, web browser and so on), and structured or loose organization. They found that security vulnerabilities were equally severe for both open- and closed-source systems, and they further found that patching behavior did not align with an open-versus-closed-source divide. In fact, they were able to show that application type is a much better determinant of vulnerability and response to security issues, and that patching behavior is directed by organizational policy without any correlation to the organizational structure that produced the software. Whether open- or closed-source software was more secure turned out to be the wrong question to ask. We do not expect that the lines between the open- and closed-source worlds will be so blurred in every aspect of software engineering, but results like these show how the massive amount of information available as a byproduct of open-source development can be put to scientific use.
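The shape of such a comparison can be sketched in a few lines: take one set of vulnerability records and summarize it first by development model, then by application type. The records below are invented placeholders chosen only to show the mechanics. None of the package names, categories or severity scores come from Schryen and Rich's data, and the helper name `median_severity_by` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Invented placeholder records: (package, model, application type, severity score).
# None of these values come from Schryen and Rich's data set.
RECORDS = [
    ("pkg_a", "open", "browser", 7.5),
    ("pkg_a", "open", "browser", 5.0),
    ("pkg_b", "closed", "browser", 7.5),
    ("pkg_b", "closed", "browser", 5.0),
    ("pkg_c", "open", "web server", 4.0),
    ("pkg_d", "closed", "web server", 4.5),
]

def median_severity_by(records, key_index):
    """Group records on one column and return the median severity per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[3])
    return {key: median(scores) for key, scores in groups.items()}

by_model = median_severity_by(RECORDS, 1)  # open vs. closed
by_type = median_severity_by(RECORDS, 2)   # browser vs. web server
print(by_model)
print(by_type)
```

With these toy numbers the two development models are indistinguishable while the application types differ, which is the kind of pattern that suggests the grouping variable, not the data, was the wrong choice. A real analysis would also need significance tests and far more records than this.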

As in any applied science, the ultimate measure of success for all of this work will be change: change in the tools used to develop software, change from current practices to those that are provably better and, most importantly, change in what is and is not accepted as proof.


  • Bird, C., D. Pattison, R. D’Souza, V. Filkov and P. Devanbu. 2008. Latent social structure in open-source projects. SIGSOFT ’08/FSE-16: Proceedings of the 16th ACM SIGSOFT Symposium on Foundations of Software Engineering:24–35.
  • Chong, J., and T. Hurlbutt. 2007. The social dynamics of pair programming. Proceedings of the 29th International Conference on Software Engineering:354–363.
  • Freudenberg, S., P. Romero and B. du Boulay. 2007. Talking the talk: Is intermediate-level conversation the key to the pair programming success story? Proceedings of AGILE 2007:84–91.
  • Glass, R. L. 2002. Facts and Fallacies of Software Engineering. Boston: Addison-Wesley.
  • Halstead, M. 1977. Elements of Software Science. North Holland: Elsevier Science Ltd.
  • Hannay, J. E., E. Arisholm, H. Engvik and D. I. K. Sjøberg. 2010. Personality and pair programming. IEEE Transactions on Software Engineering 36:61–80.
  • Höfer, A. 2008. Video analysis of pair programming. Proceedings of the 2008 International Workshop on Scrutinizing Agile Practices:37–41.
  • Nagappan, N., B. Murphy and V. Basili. 2008. The influence of organizational structure on software quality: an empirical case study. Proceedings of the 30th International Conference on Software Engineering:521–530.
  • Nosek, J. T. 1998. The case for collaborative programming. Communications of the ACM 41:105–108.
  • Oram, A., and G. Wilson. 2011. Making Software: What Really Works and Why We Believe It. Sebastopol, CA: O’Reilly Media.
  • Schryen, G., and E. Rich. 2010. Increasing software security through open source or closed source development? Empirics suggest that we have asked the wrong question. Proceedings of the 43rd Hawaii International Conference on System Sciences:1–10.


