
Peer Review: Can We Rely On It?

By Stephen Luntz

Science depends on peer review. Is sloppy or hurried review a serious problem, and if so what can we do to fix it?

More than a million papers are published in peer-reviewed journals each year, so it is hardly surprising that some mistakes are made. While some journals are notorious for publishing anything that suits the editors’ ideological bent, errors are not limited to journals that are obscure or are published by an organisation with a particular agenda.

For instance, in 2009 the prestigious Journal of Geophysical Research published a paper claiming that recent changes in global temperature had been driven almost entirely by the El Niño Southern Oscillation. The paper concluded that there was minimal long-term trend in global temperature, at a single stroke disputing the vast volume of literature on climate change.

However, the paper contained an error so basic it should have been obvious to anyone with university mathematics training, which the journal acknowledged in a subsequent edition. Despite the paper’s retraction, its conclusion is still being promoted worldwide in newspaper articles and online blogs.

In May last year Nature Neuroscience published a paper purporting to produce evidence for the effectiveness of acupuncture and to explain how it works. Given the popularity of acupuncture, this was picked up by media outlets around the world. However, the mechanism revealed, while interesting, would not account for most of the claimed benefits of acupuncture, and does not address much of the evidence against it (see box: Needling the Truth). The paper’s lead author, Nanna Goldman of the University of Rochester Medical Center, was a teenager at the time, and her mother was the paper’s last author. This should have raised questions in the minds of editors and reviewers, yet one of the world’s most prestigious journals ran it anyway.

Noted biologist and atheist P.Z. Myers drew attention to a paper in Proteomics that says mitochondria carry a “single common fingerprint initiated by a mighty creator”. If the paper had actually produced evidence for the existence of God we would probably have heard more about it. Instead, it seems that the religious beliefs of the authors, Mohamad Warda and Jin Han of Korea's Inje University, led them to see evidence for intelligent design where none exists. The journal’s Editor-in-Chief, Michael Dunn, admitted: “Clearly human error has caused a misstep in the normally rigorous peer review process that is standard practice for Proteomics and should prevent such issues arising”.

These examples are unusual only because they involve better-known journals; some obscure journals have provided even more bizarre examples where peer review has failed. For example, in 2009 the Open Information Science Journal accepted a manuscript of computer-generated nonsense submitted by the executive director of international business and product development at the New England Journal of Medicine, Mr Kent Anderson, and even requested that the authors pay $800 in “open access fees”. Clues to the hoax were even given in the institutional affiliation of its authors: the Center for Research in Applied Phrenology (CRAP).

Under Review
In June this year Dr Ove Hoegh-Guldberg, Director of the Global Change Institute at the University of Queensland, wrote: “Peer review is the basis of modern scientific endeavour. It underpins research and validates findings, theories and data.” Yet Dr Charles Jennings, Director of the McGovern Institute at the Massachusetts Institute of Technology, commented in Nature: “Scientists understand that peer review per se provides only a minimal assurance of quality, and that the public conception of peer review as a stamp of authentication is far from the truth”.

Peer review lies at the core of modern science, but concerns that the system may be failing led the British House of Commons to publish a report into the workings of peer review in scientific publications. The report, published in July, reflects increasing concern that this role may not be fulfilled as well as it could be, or as well as it once was. It quotes the UK’s Chief Scientific Adviser, Prof Sir John Beddington: “If you posed the question, ‘Is the peer review process fundamentally flawed?’ I would say absolutely not. If you asked, ‘Are there flaws in the peer review process which can be appropriately drawn to the attention of the community?’ the answer is yes.”

The most common criticism of peer review by non-scientists, and particularly those who are in conflict with the scientific mainstream, is that it is too strict and excludes good science. “Peer review enforces state-sanctioned paradigms,” says Dr Donald Miller Jr, who is prominent in denying, against all evidence, that HIV causes AIDS, while economist and futurist Prof Robin Hanson describes peer review as “just another popularity contest”.

While the House of Commons report acknowledges this criticism, for many scientists a bigger concern is that the system has become too lax. Sir Mark Walport of the Wellcome Trust noted that when a badly performed medical study wins publication there can be “harmful consequences for patients”.

A case in point was the publication in 2003 of The Australasian Journal of Bone and Joint Medicine, which had the look and feel of a peer-reviewed journal but was in fact a “sponsored” compilation of reprinted and summarised papers favourable to the pharmaceutical company Merck, manufacturer of the anti-inflammatory drug Vioxx. Vioxx has since been withdrawn from sale due to evidence that it increased the risk of cardiac arrest and stroke.

While the Vioxx research had indeed passed peer review and been published in legitimate journals, The Australasian Journal of Bone and Joint Medicine presented a skewed view of that research, overlooking any evidence that called the safety and efficacy of Vioxx into question.

A Checklist
Inaccurate research may pass peer review for a number of reasons. For example, published conclusions may be incorrect due to an insufficient sample size, an honest mathematical mistake or faulty equipment. Fellow scientists understand this, but such an impression is seldom conveyed in the media. Even less attention is paid to the possibility that researchers may have produced a genuine result but are drawing too wide a conclusion, as appears to be the case in Goldman’s acupuncture study.

When a paper is subsequently retracted, many people assume that fraud must have occurred, greatly undermining their faith in science and scientists more generally. In fact, the consequences of fraud are so serious for a researcher’s career that it is probably one of the less common forms of error. Fraud has been detected in fewer than one in 10,000 US scientists, although a metastudy of research into serious misconduct by Dr Daniele Fanelli of the University of Edinburgh found estimates as high as 2%. Even so, it is likely that accidental sources of error are far more frequent.

These misconceptions make it very hard to discuss science in highly politicised or controversial areas, yet it is usually in these areas that public understanding matters most.

In 2009 Prof Ken Baldwin, who was then President of the Federation of Australian Scientific and Technological Societies, published a report entitled When Is Science Valid? A Short Guide on How Science Works and When to Believe It (www.fasts.org) in which he provided a quick checklist for testing whether an idea is valid. Having passed peer review is the first point on the checklist, although Baldwin emphasises that the journal needs to be in a relevant field of science and adds three additional questions:

• have other scientists cited that publication as being valid?

• have other scientists conducted additional tests confirming that the idea is valid?

• has the idea been built upon to create new understanding?

Baldwin is not aware of any research on whether blatantly erroneous papers are increasing in the scientific literature. “It would be a hard thing to test,” he says. However, he would not be altogether surprised. “Journals are struggling under the weight of papers. It’s hard to find good reviewers, and good reviewers are overused. Of course they have their own research to do so they don’t have much time.

“The typical reviewer would spend of the order of hours rather than days reviewing the paper,” Baldwin adds. “They wouldn’t usually check the calculations.”

This makes sense when one considers the British House of Commons estimate that reviewers effectively donate £1.9 billion globally each year in unpaid time conducting reviews. Few can afford to spend as much time as might be ideal.

Instead Baldwin says: “They look for flags. They’ll check the methodology. If there are problems there they will look deeper and maybe check some of the calculations at that point.” Normally reviewers will also check whether other relevant research has been cited, particularly if the paper’s data contradict previous findings.

If reviewers do not have time to check every aspect of a paper it is hardly surprising that some bad ones will get through. However, Jennings argues that in most cases this isn’t important. “Given that many papers are never cited (and one suspects seldom read), it probably does not matter much to anyone except the author whether a weak paper is published in an obscure journal.”

Moreover, many errors are quickly picked up by other experts in the field, and the only citations the paper gets are from those pointing out the problems.

Baldwin is similarly sanguine about such lapses. “The bottom line is we want to ensure that what is published in the literature has enough information that people can go away and use it or go away and test it.” If research is obviously flawed this will be picked up after publication.

Journals include erratum sections where mistakes in previous editions are pointed out. “It may be that people are using erratum sections less, and writing directly to the authors instead,” Baldwin says.

However, Baldwin does not think there is a problem with journals being unwilling to admit their mistakes. “A journal that sweeps things under the carpet loses credibility and its circulation drops. Authors may then be reluctant to submit papers. There is no shame in correcting an error. In fact, journals may gain credit for doing so.”

Public Impact
Flawed papers take on far greater significance when they contain a characteristic that makes them of interest to the non-scientific media. For example, a peer-reviewed paper claiming to discredit anthropogenic global warming is guaranteed many times the mainstream media coverage of a paper supporting the conclusions of the Intergovernmental Panel on Climate Change. The Journal of Geophysical Research paper is still sometimes cited even though its errors were so blatant that the journal would not publish the authors’ defence against its comprehensive demolition. Such coverage can have a major impact on political debate.

Likewise, a paper providing support for alternative medicine may be dismissed or ignored by scientists in the field who can spot its flaws, but could prove influential in the sort of medical treatment that the public seek.

Of course, plenty of problems arise even when peer review works perfectly. Controversial research that has not been peer reviewed will often be treated by supporters as if it has. Cautious statements in scientific journals can be distorted in media reports so that they appear to make much larger claims than a study of the source material reveals.

Nevertheless, scientists such as Hoegh-Guldberg continue to use the superiority of peer review as a way to combat falsehoods. But the more successful such a campaign becomes, the more problematic is a paper that survives peer review when it should not have.

Some editors, aware of the abuse that papers can be put to by those with an ideological bent, are pushing for such matters to be treated more seriously.

When a paper on climate proxies over the last millennium by astrophysicists Dr Willie Soon and Dr Sallie Baliunas of The George C. Marshall Institute was recommended for rejection by all four peer reviewers, the publication Climate Research chose to run it anyway. The paper was widely quoted by climate change deniers, and the US Senate devoted half of a hearing to the paper alone.

Even before the Senate hearing, the paper was demolished by many of the scientists whose original research formed its basis. The errors made by Soon and Baliunas were so numerous that they were easily picked up, and the paper only ran because an editor supported its conclusions. Chief Editor Prof Hans von Storch attempted to change Climate Research’s review process to prevent a repetition, but when the other editors refused, von Storch and half the editorial board resigned.

Open Review
In 2006 Nature ran a series of articles online considering ways to improve the peer review process. In one of these, Jennings, a former Nature editor, noted: “Given its importance in steering the global research enterprise, peer review seems understudied”.

One possibility that Nature experimented with was called “open peer review”. In this process, papers being considered for publication were posted online while traditional peer review was conducted, giving researchers in the field the opportunity to comment prior to publication. Registrants at nature.com and scientists whom the editors thought might have a particular interest were alerted to a paper’s presence.

Most papers received few comments online; almost half received none. When the editors were polled about how useful they found the online comments, most comments were judged to be of little value, and in no case did they alter a decision on whether to publish.

Although participating authors generally described the trial as an “interesting experiment”, few felt that the comments were of value and some expressed concern about rivals scooping their work as a result of its premature release.

Nature has opted not to implement open peer review on a wider scale. However some less famous journals, such as Electronic Transactions on Artificial Intelligence, continue to use it.

If open peer review fails for lack of reviewer enthusiasm, what is the alternative? Anderson advocates more transparency as to how peer review is done. “We could improve peer review immensely simply by describing it in the same factual way we describe studies – with qualifiers like double-blind, randomised, placebo-controlled – to differentiate how various practitioners accomplish it. Then, we can better assess how well it’s being done, once we know which aspects of the process are being used.”

Anderson has suggested a checklist of information he’d like to see journals provide to let readers and scientists submitting papers know how things work there. This might make things easier for scientists, and Anderson argues that it would also expose cases where publications used only one reviewer or provided reviewers with information that might cloud their judgement. Potentially this could ensure that papers published in such journals are taken less seriously and encourage journals to raise their standards.

Nevertheless, it is not clear whether this will address the problem of time-poor reviewers accidentally failing to pick up important errors.

In 2009 Nature Chemistry pondered the same problem in an editorial suggesting: “Perhaps a hybrid system could be the solution. Traditional peer review, and a decision to publish, could be followed by a fixed period in which any interested party could post questions or comments and the authors are given the opportunity to respond – all moderated by an editor – before a final version of the article (including comments and responses) is preserved for the record.”

However, the editorial notes: “This would again require a large change in the habits of the community – authors, reviewers and publishers – and previous experiments with commenting on published papers have been far from conclusive.”

If there is a dramatic solution to this substantial challenge for scientific process, it seems it has not yet been found.

Box: Needling the Truth
A paper published in Nature Neuroscience has been widely criticised for failing the standards expected of such an august journal. The paper discussed the release of the painkilling molecule adenosine when acupuncture needles were applied to mice and rotated behind the knee at what, in humans, acupuncturists call the Zusanli point.

Discovering when adenosine is and is not released is certainly significant and interesting research. However, the paper explicitly linked the findings to acupuncture without conducting any control studies, such as whether the same levels of adenosine were released when needles were applied away from acupuncture locations.

Research into acupuncture has produced contradictory results, with some studies finding no effect beyond a placebo and others showing differences between needles placed at acupuncture points and those located at random (AS, April 2010, pp.21–23). Nevertheless, one would expect such a live and relevant debate to be cited in a scientific paper. Instead the authors cited the fact that acupuncture is tax-deductible in the US as evidence for its effectiveness.

Mice are also smaller than humans, and critics note that the capacity of adenosine to spread a few millimetres in a mouse does not prove that it can travel the far greater distances required in humans to provide the benefits the paper attributed to it.

Box: Other Issues with Peer Review
Flaws in the current system of scientific publication go well beyond the acceptance of inadequate papers.

The British House of Commons report published a comment about gender bias by Prof Teresa Rees, former Pro-Vice-Chancellor at Cardiff University: “Evidence from the States suggests that if John Mackay or Jean Mackay submits an article it will be peer reviewed more favourably if it is by John Mackay. There is a whole series of papers to that effect.” Unsurprisingly, other research suggests that the work of well-established scientists is treated more favourably than that of new figures on the scene.

Double-blind reviewing, where reviewers are not told the names of the authors, has been suggested as a way to address these problems. However, in small fields it is often possible for reviewers to guess the identity of authors without being told, and research into the benefits of authorial masking has produced contradictory results.

Peer review is also not always effective at screening out plagiarism. Australasian Science has seen evidence of published papers that bear a striking resemblance to the work of others, without acknowledgement.

An interesting twist on this problem comes in the form of self-plagiarism. With promotions and funding highly dependent on publication rates, some scientists speak of cutting their work into “minimal publishable units”. In its extreme form this can lead to a team publishing almost the same work several times in different journals to fatten their CVs.

Clearly such activities are unethical, and if they lead to undeserved access to limited funds they may have a long-term deleterious impact on scientific progress. However, it is debatable whether it is the responsibility of reviewers to detect either plagiarism or self-plagiarism, and whether blame for its occurrence should really be laid at the door of peer review itself.