A Troubled Tradition
It’s time to rebuild trust among authors, editors and peer reviewers
By the mid-1700s, editors at the world’s first scientific journal had a problem on their hands. Since its inaugural issue in 1665, the Philosophical Transactions of the Royal Society of London had published many outstanding scientific papers, including such classics as Isaac Newton’s experiments with prisms. But some authors had begun to submit works of fiction and rambling speculative essays. To maintain standards of quality, the editors of Philosophical Transactions launched a system of peer review to evaluate manuscripts before publication. Two centuries went by, however, before the system really caught on. In the mid-20th century, increased specialization, government support for research, and competition for journal space compelled editors to seek assistance from experts. Today, peer review is an essential part of scientific publication and is also used to evaluate grant applications and academic careers.
In publication, peer review serves two distinct functions: It ensures that work is published only if it meets appropriate standards of scholarship and methodology, and it helps authors improve their manuscripts. The process, familiar to many American Scientist readers, begins when authors submit a manuscript to a journal, often with a list of suggested reviewers and a list of scientists who should not see the work. The journal editor sends papers of interest to members of the editorial board or outside experts who review the work for free. These referees assess the manuscript for originality, importance, validity, and clarity. They also advise the editor about the manuscript’s overall merit and provide written comments—usually anonymously—for the authors. Finally, the editor decides to publish, reject or request revisions to the manuscript.
Although it is hard to imagine how science could progress without independent evaluation of research, peer review is an imperfect system, fraught with questions of bias, efficacy and ethics. At each step of the process, there are opportunities and temptations for reviewers to go astray, and these can take many forms, from simple negligence to intentional abuse for personal gain. If scientific publications are to remain a reliable record of knowledge and progress, editors and reviewers must actively cultivate high ethical standards.
The Importance of Trust
It seems that most scientists have a story or two about suspected unethical behavior among reviewers. As a beginning assistant professor, I submitted a paper on scientific methodology to a prestigious philosophy journal. The review took a long time—over a year—and when I finally got a decision, the paper was rejected with little comment. A couple of months after that rejection, a paper very similar to mine appeared in the same journal. The article did not plagiarize my work word-for-word, but it defended a similar thesis using many of the same arguments. I suspected that the author had served as a referee for my paper and had delayed his review to prevent my article from being published—or perhaps that he had pilfered my ideas. It is possible that the author of this competing paper had independently arrived at conclusions and arguments similar to mine, and that he had submitted his work to the journal before I did. But I had no way of knowing whether this was so. In the end, I was left with a bitter taste in my mouth and I lost some trust in the integrity of peer review.
It can be hard to determine when a reviewer has abused his or her position. Unscrupulous referees may plagiarize a submitted manuscript, breach confidentiality, delay the review process in order to stifle competitors, use data or methods disclosed in a manuscript without permission, make personal attacks on the authors or require unnecessary references to their own publications.
Incidents such as these violate the foundation of trust that is essential to successful evaluation of scientific manuscripts. Authors, editors and reviewers must rely on one another to fulfill their roles with honesty, transparency, confidentiality and professionalism. Absent such trust, the system simply doesn’t work: Authors and editors may ignore reviews that they think are biased or incompetent. Or, fearing that their ideas could be stolen, authors may withhold information necessary to repeat experiments—thereby compromising a key function of scientific publication. Editors who do not trust reviewers to work carefully and disclose conflicts of interest may ignore their comments or delay publication by seeking other reviewers. Disillusioned reviewers may submit careless evaluations or refuse to review manuscripts. Finally, authors who violate reviewers’ and editors’ trust by submitting fraudulent results can create lasting discipline-wide difficulties for other researchers.
To promote trust among authors, editors and reviewers, it is essential that all parties follow ethical standards. Most policies and scholarship related to scientific publication focus on the ethical duties of authors, but at least two sets of important guidelines do address reviewers. The International Committee of Medical Journal Editors recommends that peer review be unbiased and that journals publish their peer-review policies. The Committee on Publication Ethics (COPE), a nonprofit organization of journals, publishers and individuals, has developed guidelines that address confidentiality of peer review, protection of intellectual property, fairness and conflict-of-interest management.
Some standards of peer review for editors and referees, recognized by COPE and leading authorities on research integrity, are:
Confidentiality: Maintain confidentiality throughout the review process.
Respect for intellectual property: Do not use authors’ ideas, data, methods, figures or results without permission.
Fairness: Avoid biases related to gender, nationality, institutional affiliation and career status.
Professionalism: Read manuscripts carefully, give constructive criticism, avoid personal attacks and complete reviews on time. Review only manuscripts that you are qualified to review.
Conflict-of-interest management: Disclose personal, professional or financial interests that could affect a review and avoid reviewing an article if a conflict of interest could compromise judgment.
If referees followed these guidelines faithfully, I suspect there would be very few setbacks in peer review.
Despite efforts to encourage professionalism among reviewers, troubles persist. One of the best-documented issues is inefficacy: Reviewers may miss errors, methodological flaws or evidence of misconduct. To measure reviewers’ ability to catch errors, Magne Nylenna of the Journal of the Norwegian Medical Association and his colleagues created two inauthentic manuscripts with obvious methodological flaws such as inappropriate statistical tests. They sent the papers for review and graded reviewers on the number of flaws they caught. The average score was only 1.7 out of 4, and more than one-third of the reviewers provided no comments on methodology at all. Nylenna’s results were not anomalous. In a similar study led by Fiona Godlee of the British Medical Journal, reviewers discovered, on average, only 2 out of 8 errors introduced into manuscripts. These studies did not determine why reviewers missed so much, but they may have simply read the manuscripts carelessly.
Falsification and fabrication are problems that reviewers shouldn’t have to worry about—but in reality, they must remain alert. One of the most famous examples occurred in 2004 and 2005, when South Korean researcher Woo-Suk Hwang and colleagues published two papers in Science claiming to have developed human embryonic stem-cell lines that were genetically identical to patients’ cells. The work would have been a breakthrough in regenerative medicine, but in June 2005, a whistleblower declared that some of the data were fake. Eventually, a university investigation found that Hwang had fabricated data in both papers, which were then retracted. Whether editors and reviewers should have spotted evidence of Hwang’s misconduct is unclear—but without access to raw data, detecting fraud is notoriously difficult. Indeed, the incident prompted the editors of Science to scrutinize high-profile papers more closely, requiring authors to provide original data and examining digital images more carefully.
Even when reviewers do catch flaws in a manuscript, they may not all agree in their assessments. In one recent study, Richard Kravitz of the University of California, Davis, and colleagues examined reviewer recommendations for more than 2,000 manuscripts submitted to the Journal of General Internal Medicine. For editors to publish work with confidence, reviewers should ideally agree about whether to accept or reject a manuscript—but in fact, they concurred only slightly more often than if they had made their decisions by coin toss. Multidisciplinary research can cause particular confusion because reviewers from different disciplines may accept different methodological standards. In these cases, editors may feel the need to seek additional reviews, potentially delaying publication.
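One standard way to express agreement beyond chance is Cohen's kappa, which compares the agreement actually observed with the agreement expected if each reviewer's recommendations were independent of the other's. The following is a minimal Python sketch and not necessarily the statistic used in the study described above; the function name and the ten recommendations are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two reviewers' recommendations."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Fraction of manuscripts on which the two reviewers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reviewer's own label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers always gave the same single recommendation
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations for ten manuscripts (not data from any study).
reviewer_1 = ["accept", "reject", "accept", "reject", "accept",
              "reject", "accept", "reject", "accept", "reject"]
reviewer_2 = ["accept", "reject", "reject", "accept", "accept",
              "reject", "reject", "accept", "accept", "accept"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 0 means the reviewers agree about as often as chance would predict, and a kappa of 1 indicates perfect agreement. In this made-up example the two reviewers agree on half of the manuscripts, which is exactly what chance predicts given how often each recommends acceptance, so kappa is 0.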
A more subtle but equally pervasive problem is reviewer bias. A reviewer’s evaluation can be influenced by an author’s institutional affiliation, nationality, gender or career status, or by the reviewer’s own financial or professional interests. For example, a referee might be more likely to give a favorable review to a friend than to a competitor, or to favor a well-known researcher from a prestigious institution over a less familiar researcher.
Although specific allegations of bias are difficult to prove, the phenomenon has been documented in systematic studies. In a 1982 study, for example, Douglas Peters and Stephen Ceci selected 12 previously published psychology papers by authors from prestigious institutions, then resubmitted the papers to the same journals using fake author names and institutions. Of nine journals that sent the papers for review, eight rejected them due to poor quality. The results suggest that the original reviewers’ favorable evaluations may have been influenced by the prestige of the authors or their institutions. Although the sample size was small and the experiment lacked controls, larger trials with alternative forms of peer review also suggest that referees are influenced by their assumptions about authors.
More malicious transgressions, such as intentionally delaying reviews, are less well documented. To fill this gap in the literature, my colleagues, Shyamal Peddada and Christine Gutierrez-Ford, and I conducted a survey in 2006 to ask scientists about a range of problems in peer review. The respondents included 220 postdoctoral researchers, staff scientists, principal investigators and technicians working in 22 different biomedical disciplines at the National Institute of Environmental Health Sciences (NIEHS). They were about 54 percent male and 44 percent female; 2 percent did not specify gender. On average, they were 42 years old and had about 35 publications each.
Reviewer incompetence was the most common problem this group reported: More than 60 percent of respondents said they had encountered at least one reviewer who did not read an article carefully, was not familiar with the subject matter or made mistakes of fact or reasoning in the review (see the chart above). About half said a reviewer was biased. Other common problems were that reviewers required unnecessary references to their own publications or made personal attacks in reviews.
About 10 percent of respondents said a referee delayed the review so that he or she could publish an article on the same topic. Rarer, but still troubling, were reports that reviewers had breached confidentiality or used ideas, data or methods without permission.
An author’s age and number of publications were both positively associated with experiencing an incompetent or biased reviewer—perhaps because a researcher who has published more papers has had more opportunities to encounter reviewers whom he or she views as biased or incompetent. Scientists who are well established in a field may also be less open to criticism from reviewers and therefore be more likely to perceive reviews as inadequate.
This study did have some limitations. The questionnaire asked for respondents’ experiences, but we did not attempt to confirm whether alleged problems actually occurred. For example, some reports of reviewer bias might simply reflect the authors’ dissatisfaction with a referee’s comments. We also did not attempt to determine how frequently respondents experienced the problems they reported. Finally, our sample of biomedical researchers working at a government institution may or may not reflect the experiences of other researchers at other institutions or in other disciplines.
The study does, however, provide some of the only empirical evidence that scientists regularly experience a range of ethical problems with peer review. To expand upon our results, future research should examine the prevalence and significance of such problems in peer review, as well as potential causes—such as inadequate training, or competition for status or funding. This research might take the form of focus groups, interviews, and surveys with editors, reviewers and authors. The results could guide both policy development and educational initiatives.
Alternative Forms of Peer Review
Some journals and conferences have adopted or tested alternative forms of peer review. One common alternative, double-blind review, could serve to reduce reviewer bias because neither authors nor reviewers know each other’s identities or affiliations. Another alternative is unblinded (or open) review, in which both authors and reviewers do know each other’s identities—a situation that might encourage ethical behavior among reviewers who cannot hide behind a cloak of anonymity.
Studies of these two forms of review have, however, yielded mixed results. Logistics are an important hurdle: In trials of double-blind review, several medical journals found that about one-quarter to one-half of reviewers were able to correctly guess authors’ identities despite blinding. And in trials of open review, referees who were asked to reveal their names to authors often refused to participate.
There is nevertheless evidence that blinding does reduce bias. Joseph Ross of the Robert Wood Johnson Clinical Scholars Program led a five-year study, published in 2006, which showed that authors’ nationality and institutional affiliation affected acceptance of abstracts for the American Heart Association’s annual Scientific Sessions. Among thousands of abstracts submitted per year, blinded reviewers accepted about 12 percent fewer abstracts from prestigious institutions than did reviewers who were aware of authors’ affiliations. Blinded reviewers also accepted fewer abstracts from within the United States and more from outside the United States than did unblinded reviewers. These differences suggest that blinding reduced bias resulting from reviewers’ assumptions about authors’ countries and institutions.
Whether blinding also improves the quality of reviews is unclear. In 1990, Robert McNutt and colleagues found that blinded reviewers provided more accurate, well-supported and courteous reviews than did unblinded reviewers of articles submitted to the Journal of General Internal Medicine. But several years later, Amy Justice led a similar study with five different medical journals, and found no effect of blinding on review quality.
Results for open review have been similarly mixed. One study at the British Journal of Psychiatry, led by Elizabeth Walsh, found that when referees revealed their identities to authors, they provided better reviews, were more courteous, took longer to complete reviews, and were more likely to recommend publication than were anonymous reviewers. But a pair of studies led by Susan Van Rooyen of the British Medical Journal found that revealing reviewers’ identities—either to authors or to co-reviewers—did not impact review quality or reviewers’ recommendations.
The discrepancies among these studies of double-blind and open review could arise from their differing methodologies and sample populations. It is also worth mentioning that none of these studies examined the most serious ethical issues, such as respect for intellectual property. Future studies should take these factors into account, and they may eventually tip the balance in favor of one form of review or another.
Peer review is a key feature of scientific publication, but it is susceptible to bias, inefficacy and other ethical transgressions. Alternative forms of review have produced only equivocal improvements in fairness and efficacy and have not been tested with respect to other problems. What are the next steps we should take to improve peer review?
First, researchers should receive more education on how to review scientific articles—a skill that is not typically emphasized during research training. Some scientists do show students and postdocs how to review papers, and some research institutions cover peer review in seminars and workshops on research ethics. These practices must become more widespread. In particular, investigators should teach their trainees how to evaluate articles for scientific merit and to follow ethical standards of peer review. Asking young scientists to help review papers is a good way to educate them, provided journal editors give their permission and the process remains confidential.
Journals should also develop and publicize instructions for new reviewers and policies for reviewers and editors, just as they have done for authors. Rules should address confidentiality, fairness, conflict of interest, respect for intellectual property, and professionalism.
Editors should carefully manage the peer review process to prevent or address problems and concerns. They should explicitly inform reviewers about their journals’ peer-review policies, remind reviewers to disclose conflicts of interest and return their reports on time, and delete any personal attacks from reviewers’ comments. If editors have evidence that a reviewer provided a poor review or abused the process, they should not invite that person to do other reviews. These ideals may be complicated when editors have difficulty finding experts willing to review a manuscript and when referees submit their reviews late, overlook errors, or disagree about the quality of a submission. Workshops and conferences on the subject could help editors to cope with these challenges.
Finally, scholars should conduct additional research on the ethics of peer review. Our exploratory study of the experiences of NIEHS researchers suggests that some problems are common, but the results should be confirmed in other settings. Future work should determine how often ethical problems occur and how they affect scientists’ attitudes and behaviors. Studies should also address the causes of unethical behavior in journal peer review and the effectiveness of alternatives, such as double-blind or open review, at preventing various types of transgressions.
There is certainly no perfect solution to the problem of quality control in the scientific record. Despite its flaws, the system adopted by the editors of Philosophical Transactions two and a half centuries ago seems to work as well as any method that has been tried—but its age and pervasiveness must not foster complacency. While journals, editors, and scholars work to understand and regulate peer review, it’s up to every individual scientist to maintain a thoughtful awareness of his or her participation in the process. Such vigilance and professionalism can only improve the quality of reviews, and might even spark new insights into how the review system could eventually be improved.
I am grateful to Bruce Androphy, Zubin Master, Christine Flowers and Adil Shamoo for helpful comments. This essay does not represent the views of the NIH, the NIEHS or the U.S. government.
- Benos, D. J., K. L. Kirk and J. E. Hall. 2003. How to review a paper. Advances in Physiology Education 27:47–52.
- Committee on Publication Ethics. 2010. Code of Conduct. http://publicationethics.org/files/u2/New_Code.pdf Accessed September 4, 2010.
- Dalton, R. 2001. Peers under pressure. Nature 413:102–104.
- Fisher, M., S. B. Friedman and B. Strauss. 1994. The effects of blinding on acceptance of research papers by peer review. Journal of the American Medical Association 272:143–46.
- Garfunkel, J. M., et al. 1994. Effect of institutional prestige on reviewers’ recommendations and editorial decisions. Journal of the American Medical Association 272:137–38.
- Godlee, F., C. Gale and C. Martyn. 1998. Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: A randomized controlled trial. Journal of the American Medical Association 280:237–40.
- International Committee of Medical Journal Editors. 2010. Uniform Requirements for Manuscripts Submitted to Biomedical Journals. http://www.icmje.org/urm_main.html Accessed September 6, 2010.
- Justice, A. C., et al. 1998. Does masking author identity improve peer review quality? A randomised controlled trial. Journal of the American Medical Association 280:240–242.
- Kravitz, R., et al. 2010. Editorial peer reviewers’ recommendations at a general medical journal: Are they reliable and do editors care? PLoS One 5:e10072.
- McNutt, R. A., et al. 1990. The effects of blinding on the quality of peer review: A randomized trial. Journal of the American Medical Association 263:1371–76.
- Mulligan, A. 2005. Is peer review in crisis? Oral Oncology 41:135–141.
- Nature Editors. 2001. Editorial: Bad peer reviewers. Nature 413:93.
- Nature Editors. 2006. Editorial: Peer review and fraud. Nature 444:971–72.
- Nylenna, M., P. Riis and Y. Karlsson. 1994. Multiple blinded reviews of the same two manuscripts: Effects of referee characteristics and publication language. Journal of the American Medical Association 272:149–51.
- Peters, D., and S. Ceci. 1982. Peer-review practices of psychological journals: The fate of submitted articles, submitted again. Behavioral and Brain Sciences 5:187–255.
- Rennie, D. 2003. Misconduct and journal peer review. In Godlee, F., and Jefferson, T., eds. Peer Review in Health Sciences. 2nd edition. London: BMJ Books, 2003:118–29.
- Resnik, D. B., C. Gutierrez-Ford and S. Peddada. 2008. Perceptions of ethical problems with scientific journal peer review: An exploratory study. Science and Engineering Ethics 14:305–10.
- Ross, J. S., et al. 2006. Effect of blinded peer review on abstract acceptance. Journal of the American Medical Association 295:1675–80.
- Rowe, B. H., et al. 2006. Reviewer agreement trends from four years of electronic submissions of conference abstract. BMC Medical Research Methodology 6:14.
- Schroter, S., et al. 2004. Effects of training on quality of peer review: Randomised controlled trial. British Medical Journal 328:673–75.
- Schroter, S., et al. 2006. Differences in review quality and recommendations for publication between reviewers suggested by authors or editors. Journal of the American Medical Association 295:314–17.
- Shamoo, A. E., and D. B. Resnik. 2009. Responsible Conduct of Research. 2nd edition. New York: Oxford University Press.
- Sieber, J. 2006. Quality and value: How can we research peer review? Nature. doi:10.1038/nature05006. http://www.nature.com/nature/peerreview/debate/nature05006.html
- Smith, R. 2006. Peer review: A flawed process at the heart of science and journals. Journal of the Royal Society of Medicine 99:178–82.
- Suls, J., and R. Martin. 2009. The air we breathe: A critical look at practices and alternatives in the peer review process. Perspectives on Psychological Science 4:40–50.
- Van Rooyen, S., et al. 1998. Effect of blinding and unmasking on the quality of peer review. Journal of the American Medical Association 280:234–237.
- Van Rooyen, S., et al. 1999. Effect of open peer review on quality of reviews and on reviewers’ recommendations: A randomised trial. British Medical Journal 318:23–27.
- Walsh, E., et al. 2000. Open peer review: A randomized controlled trial. The British Journal of Psychiatry 176:47–51.