A Troubled Tradition
It’s time to rebuild trust among authors, editors and peer reviewers
Despite efforts to encourage professionalism among reviewers, troubles persist. One of the best-documented issues is inefficacy: Reviewers may miss errors, methodological flaws or evidence of misconduct. To measure reviewers’ ability to catch errors, Magne Nylenna of the Journal of the Norwegian Medical Association and his colleagues created two inauthentic manuscripts with obvious methodological flaws such as inappropriate statistical tests. They sent the papers for review and graded reviewers on the number of flaws they caught. The average score was only 1.7 out of 4, and more than one-third of the reviewers provided no comments on methodology at all. Nylenna’s results were not anomalous. In a similar study led by Fiona Godlee of the British Medical Journal, reviewers discovered, on average, only 2 out of 8 errors introduced into manuscripts. Neither study determined why reviewers missed so much; careless reading of the manuscripts is one plausible explanation.
Falsification and fabrication are problems that reviewers shouldn’t have to worry about—but in reality, they must remain alert. One of the most famous examples occurred in 2004 and 2005, when South Korean researcher Woo-Suk Hwang and colleagues published two papers in Science claiming to have developed human embryonic stem-cell lines that were genetically identical to patients’ cells. The work would have been a breakthrough in regenerative medicine, but in June 2005, a whistleblower declared that some of the data were fake. Eventually, a university investigation found that Hwang had fabricated data in both papers, which were then retracted. Whether editors and reviewers should have spotted evidence of Hwang’s misconduct is unclear—but without access to raw data, detecting fraud is notoriously difficult. Indeed, the incident prompted the editors of Science to scrutinize high-profile papers more closely, requiring authors to provide original data and examining digital images more carefully.
Even when reviewers do catch flaws in a manuscript, they may not all agree in their assessments. In one recent study, Richard Kravitz of the University of California, Davis, and colleagues examined reviewer recommendations for more than 2,000 manuscripts submitted to the Journal of General Internal Medicine. For editors to publish work with confidence, reviewers should ideally agree about whether to accept or reject a manuscript—but in fact, they concurred only slightly more often than if they had made their decisions by coin toss. Multidisciplinary research can cause particular confusion because reviewers from different disciplines may accept different methodological standards. In these cases, editors may feel the need to seek additional reviews, potentially delaying publication.
A more subtle but equally pervasive problem is reviewer bias. A reviewer’s evaluation can be influenced by an author’s institutional affiliation, nationality, gender or career status, or by the reviewer’s own financial or professional interests. For example, a referee might be more likely to give a favorable review to a friend than to a competitor, or to favor a well-known researcher from a prestigious institution over a less familiar researcher.
Although specific allegations of bias are difficult to prove, the phenomenon has been documented in systematic studies. In a 1982 study, for example, Douglas Peters and Stephen Ceci selected 12 previously published psychology papers by authors from prestigious institutions, then resubmitted the papers to the same journals using fake author names and institutions. Of nine journals that sent the papers for review, eight rejected them due to poor quality. The results suggest that the original reviewers’ favorable evaluations may have been influenced by the prestige of the authors or their institutions. Although the sample size was small and the experiment lacked controls, larger trials with alternative forms of peer review also suggest that referees are influenced by their assumptions about authors.
More malicious transgressions, such as intentionally delaying reviews, are less well documented. To fill this gap in the literature, my colleagues, Shyamal Peddada and Christine Gutierrez-Ford, and I conducted a survey in 2006 asking scientists about a range of problems in peer review. The respondents included 220 postdoctoral researchers, staff scientists, principal investigators and technicians working in 22 different biomedical disciplines at the National Institute of Environmental Health Sciences (NIEHS). They were about 54 percent male and 44 percent female; 2 percent did not specify gender. On average, they were 42 years old and had about 35 publications each.
Reviewer incompetence was the most common problem this group reported: More than 60 percent of respondents said they had encountered at least one reviewer who did not read an article carefully, was not familiar with the subject matter or made mistakes of fact or reasoning in the review (see the chart above). About half said a reviewer was biased. Other common problems were that reviewers required unnecessary references to their own publications or made personal attacks in reviews.
About 10 percent of respondents said a referee delayed the review so that he or she could publish an article on the same topic. Rarer, but still troubling, were reports that reviewers had breached confidentiality or used ideas, data or methods without permission.
An author’s age and number of publications were both positively associated with experiencing an incompetent or biased reviewer—perhaps because a researcher who has published more papers has had more opportunities to encounter reviewers whom he or she views as biased or incompetent. Scientists who are well established in a field may also be less open to criticism from reviewers and therefore be more likely to perceive reviews as inadequate.
This study did have some limitations. The questionnaire asked for respondents’ experiences, but we did not attempt to confirm whether alleged problems actually occurred. For example, some reports of reviewer bias might simply reflect the authors’ dissatisfaction with a referee’s comments. We also did not attempt to determine how frequently respondents experienced the problems they reported. Finally, because our sample consisted of biomedical researchers at a single government institution, the findings may not generalize to researchers at other institutions or in other disciplines.
The study does, however, provide rare empirical evidence that scientists regularly experience a range of ethical problems with peer review. To expand upon our results, future research should examine the prevalence and significance of such problems in peer review, as well as potential causes—such as inadequate training, or competition for status or funding. This research might take the form of focus groups, interviews and surveys with editors, reviewers and authors. The results could guide both policy development and educational initiatives.