How Do Scientists Really Use Computers?
A Web-based survey offers clues
Getting the Answers
So what did these people tell us? First, respondents work an average of 48 hours a week, of which 30 percent is spent developing software and 40 percent is spent using it. They also report that these proportions are going up—45 percent of respondents say that scientists spend more or much more of their time developing scientific software than they did 5 years ago, and 70 percent say that they spend more or much more time using it. These answers are much higher than we expected, and probably signal that our (self-selected) respondents use computers more than the “average” scientist (if in fact there is such a thing).
Second, most scientists generate and archive a few gigabytes of data each year. This answer was more popular than all the others combined (“a few megabytes,” “a few terabytes” and “more than a few terabytes”). One thing we didn’t ask (but should have) was how that data is archived: Is it stored in a Web-accessible database with searchable metadata, or on a DVD stuck in the bottom drawer of someone’s desk? Personal experience tells us the latter is far more likely….
Third, most of the software that scientists work with is widely used: Only 10 percent reported that the programs they rely on are used by three or fewer people. When we asked where that software comes from, though, they reported “commercial off-the-shelf software,” “open source” and “we build it ourselves” in almost equal numbers.
It’s interesting to compare the latter answers with those given for another question. Fifty-eight percent of scientists reported that they do development on their own; 17 percent work with one other person, and 18 percent work in teams of 3 to 5 people, while only 9 percent work in larger groups. These numbers are the reverse of what would be expected for professional software developers, who usually work in teams. They also explain the relatively low uptake among scientists of collaborative tools like version control, which most professional software developers consider essential: If you expect to work alone, why invest in tools for working with others?
The prevalence of solo and small-team work is consistent with another finding. Roughly 38 percent of the programs scientists write are between 500 and 5,000 lines long; smaller programs, and programs between 5,000 and 50,000 lines long, each make up about a quarter of the total, while larger programs account for the remaining 12 to 15 percent. To look at it another way, nearly two thirds of the programs these scientists write are fewer than 5,000 lines long.
The hardware scientists use is just as interesting. Eighty-one percent primarily use desktop machines; only 13 percent use intermediate-sized machines such as departmental Linux clusters, and a mere 6 percent use supercomputers. This is consistent with their reports about how they use computers: Most said that interactive use was most common, followed by preparing and reformatting data, preparing things for batch processing, and finally systems administration.
As for what occupied most of our respondents’ time, coding and debugging took first place. Planning and quality assurance tied for second place, reading/reviewing code came third, documenting fourth, and packaging software came last. It is ironic to compare this ranking, in which documenting sits near the bottom, with answers to another question: What “pain points” hurt you most? Lack of documentation was the number-one answer for more than 40 percent of respondents, and was in the top three for 80 percent.
Where do scientists learn how to develop software and use computers in their research? Almost all said that informal self-study had been most important. Peer mentoring came second, with formal instruction at school or on the job trailing well behind.
To close, we wanted to find out how good scientists are at developing and using software. However, self-assessment is notoriously unreliable, and administering a proficiency test over the Web would have been impractical. We therefore asked our respondents to rate both how well they felt they understood various aspects of software development and how important those aspects are, so that the two ratings could be compared.
The results were consistent with answers given to other questions. In most areas—requirements, design, maintenance, product management and project management—scientists reported that they knew as much as they felt they needed to know. This isn’t surprising: Scientists are usually their own customers, and as our findings about team and program size suggest, those who develop software are creating small programs for their own use. Skills relevant to large projects done for other people are therefore unlikely to loom large in their minds.
The three areas in which respondents felt they didn’t know as much as they should were, in order of increasing gap, software construction, verification and testing. Again, this isn’t surprising, since the whole point of science is to be able to prove that your answers are valid, and that requires confidence in the methods and tools used to get them. The necessity of keeping test tubes clean and calibrating equipment is drilled into students from high school onward, but most scientists are uncomfortably aware that we know a lot less about how to ensure that software is correct. The fact that there always seems to be one more bug to fix only reinforces the feeling.