
Can Scientific Journals be Trusted?


Introduction

Everyone who is in any way involved with scientific or scholarly activity is familiar with the most important means of communicating results in science: peer-reviewed articles in scientific journals. Long delays in publication, owing to backlogs and lengthy review periods, have led to a widespread practice of pre-publishing articles and working papers in archives (such as arXiv in physics, mathematics, and computer science), and books still carry status and influence in the humanities. Even so, the ‘gold standard’ of scientific work remains the peer-reviewed journal publication. Authorship of a peer-reviewed article in a scientific journal is highly valued and confers scientific prestige and recognition, the currency of career advancement.

For outsiders, whether journalists or interested laypeople, the significance of peer review has also become well known. Although not every science journalist or newspaper is equally diligent, in principle science journalism will nowadays ask whether a claim made in a paper has been peer reviewed before splashing its results across the headlines. The general attitude, both inside the sciences and among outsiders looking in, is that peer review and authorship of a publication are the standards by which we judge whether what someone is saying counts as scientific or not. In the language of philosophy, peer review is treated as both a necessary and a sufficient condition for something to ‘be science’.

The theory goes like this: if other scientists are willing to sign off on the legitimacy of your methods and conclusions, then there must be something to your work that is good enough to count as science. Everyone vouches for one another with their names, those names are in turn attached to authorship of papers, and authorship establishes reputation. Reputation, in turn, (often) determines advancement, so the system keeps itself in check with the right incentives. To avoid prejudice and cliquishness, reviews are done blindly, so that authors and reviewers do not know each other’s identities, although journal editors do. All in all, it’s an ingenious system. But does it still work?

How peer review became the norm

Although peer review of scientific journal articles seems obvious today, it is a relatively recent development. Almost none of Albert Einstein’s or Max Planck’s work was peer reviewed. In those days, a journal like Annalen der Physik had editors, and the editors simply decided what was published. Acceptance rates tended to be very high: there were very few scientists in almost any field, and journals were more divided by language, so that if anything a shortage of authors was a greater problem than an excess of possible contributions. In fact, when Einstein was first subjected to an early (ad hoc) form of peer review in 1936, he was deeply offended.

Einstein’s angry letter. From Kennefick (2005)

Two things changed this in the postwar period. Firstly, access to university education expanded enormously – initially only in the Western world, but soon also elsewhere. This led to a meteoric rise in the number of researchers and scientists, which in turn led to a great increase in the amount of publishable material. With many more submissions, journals became more interested in separating the wheat from the chaff. They increasingly adopted the practice of peer review for this purpose, which until then had been primarily a peculiar convention of the journals of the scientific societies in the English-speaking world. Where those societies’ journals had relied on small numbers of ‘insiders’ who could provide a sanity check, the huge quantitative expansion of scientific publications required a more generalisable method.

The other major impetus for peer review was the huge expansion in government funding for scientific research, which began in the context of the war effort during WWII but continued in many countries long afterwards. In the US, for example, the National Institutes of Health (NIH) took its modern form in 1948, accompanied by a huge expansion of federal funding for fundamental research in medicine, biomedicine, and healthcare. The US government naturally wanted to know that it was getting value for money, and so needed a process by which to judge the returns on its investment.

Peer review went from ad hoc assessment to regular panels and then to a standardised process from about the 1960s onwards, driven particularly by grant-giving institutions like the NIH, the National Science Foundation (NSF), and similar bodies. The shift was reinforced by political pressure in the United States, as American politicians came to see the increased science funding – especially in the social sciences – as potentially wasting taxpayer money on dangerous or useless ideas. Since the US came to dominate scientific funding and productivity in the Cold War and post-Cold War period, as America went, so went the rest of the developed world.

Does peer review in scientific journals work?

There are signs, however, that all is not well with peer review of scientific journal articles as a ‘gold standard’ in science. In recent years the practice has come under sustained criticism from within the scientific community, and frustrations are rife in many disciplines. Many common complaints relate to the practicalities of the process. For example, as mentioned above, the very long time it can take to go from submission to publication – even without the vagaries of the ‘revise and resubmit’ – leads authors in fields like economics and medicine to simply pre-publish their work prior to official review. While pre-publication does not have the same status as peer-reviewed publication, it frequently invites the same kind of scientific discussion of the findings (and similar journalistic interest), and it is much faster, leading some to wonder what the point of the reviewing rigmarole is to begin with.

Another major practical cost is time. One study, which emphasised that it was likely underestimating the true total, found that scientists spent a staggering 100 million hours on peer review in 2020, with the time of US-based reviewers alone carrying an estimated salary-equivalent cost of $1.5 billion. Moreover, what these figures hide is that, from the journals’ point of view, all of this work is performed effectively for free: it is covered by the reviewers’ salaries, of course, but those salaries are fixed regardless of how much peer review they are willing to take on. Reviewing is done out of professional courtesy and because of the generalised assumption that science needs it to function.
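To make the scale of such an estimate concrete, the back-of-envelope sketch below (in Python) shows how a figure of this kind can be assembled from the number of reviews performed, the time spent per review, and reviewer wages. Every input number is an illustrative assumption of ours, not a parameter taken from the study cited above.

```python
# Back-of-envelope estimate of the aggregate cost of peer review.
# Every input number is an illustrative assumption, not a figure from the cited study.

reviews_per_year = 10_000_000   # assumed number of reviews completed per year
hours_per_review = 6            # assumed average hours a reviewer spends per review
hourly_salary_usd = 50.0        # assumed average hourly salary of a reviewer

total_hours = reviews_per_year * hours_per_review
salary_equivalent_cost = total_hours * hourly_salary_usd

print(f"Total reviewer time: {total_hours / 1e6:.0f} million hours")
print(f"Salary-equivalent cost: ${salary_equivalent_cost / 1e9:.1f} billion")
```

The study cited above builds its figures from publication statistics and country-level salary data, but the underlying arithmetic is of this same simple kind.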

Even that has been called into question at a more fundamental level, however. As Adam Mastroianni writes in a criticism of peer review, there is not actually much empirical evidence that it does what it is often taken to do: provide a guarantee (both necessary and sufficient, remember) of the scientific status of a journal contribution. In his words: “Does peer review actually do the thing it’s supposed to do? Does it catch bad research and prevent it from being published?

It doesn’t. Scientists have run studies where they deliberately add errors to papers, send them out to reviewers, and simply count how many errors the reviewers catch. Reviewers are pretty awful at this. In this study reviewers caught 30% of the major flaws, in this study they caught 25%, and in this study they caught 29%. These were critical issues, like “the paper claims to be a randomized controlled trial but it isn’t” and “when you look at the graphs, it’s pretty clear there’s no effect” and “the authors draw conclusions that are totally unsupported by the data.” Reviewers mostly didn’t notice.”

Although Mastroianni is scathing, these results do not show that peer review is foolish or that reviewers are failing at their job. Rather, they make clear that peer review is simply not set up to prevent attempts at outright fraud, and cannot catch them effectively. It is a standard that works as part of a trust-based system: a sanity check, not a flawless algorithm for scientific quality. Indeed, as is well known by now, scientific frauds happen on a regular basis without peer review ever having prevented them; but equally, the overwhelming majority of published peer-reviewed works are probably not fraudulent, even though peer review could not have stopped them if they had been.

Next to its slowness and its failure to filter out fraud, peer review has come in for criticism for its inconsistency and arbitrariness – who knows on what basis articles are really accepted or rejected? The scientific community is full of jokes, extending to the level of memes, about the dreaded ‘reviewer no. 2’ who fails to understand the basic premise of the submission, or who demands that it be published only with their own work or favourite authors cited more prominently (I have personally experienced the latter several times in my scientific career). Accusations of bias are rife – classic studies show how articles are rejected based on the institution of origin rather than their content, and how nepotism and sexism are commonplace in grant awarding. Even a randomised controlled trial of fully blinded peer review in medicine found that blinding did not actually solve the problem (in fact, the sexism study argued it made things worse).

Not everyone appreciates reviewer 2… From Peterson (2020)

Putting all this together, what does peer review actually do for science? The question becomes harder and harder to answer. While there is still a tendency to see peer review like democracy – the ‘least bad system’, as the cliché goes – some philosophers of science have suggested abolishing pre-publication peer review altogether. In a systematic philosophical review of peer review as an epistemic practice – in other words, an assessment of whether peer review overall actually helps scientific knowledge – Liam Kofi Bright and Remco Heesen come to a negative conclusion. As they put it: “Given these two facts—high (epistemic) costs and unclear benefits—we raise the question whether it might be better to abolish prepublication peer review”. Instead, they propose simply extending the scientific ‘workaround’ that – especially in the natural and life sciences – has already developed to get around the downsides of peer review: preprint publication. Except it would no longer be a ‘preprint’, but simply a publication made on the strength of one’s own authorship and assessed only post facto by one’s colleagues, just as was done in the days before peer review became the norm. Journals and editors could still curate such publications as they like, maintaining whatever standards they deem appropriate.

Can we rely on authorship in scientific journals? Examples from medicine

This matters because, despite the importance attached to peer review, in terms of a scientific career it is not peer review but authorship of articles that helps a researcher advance. Of course, in a peer review-based system, authorship is meant to convey a certain scientific merit by virtue of having passed muster. That is also why authorship of scientific journal articles is valued above other kinds of publication, and why scientists are encouraged to contribute in this way. But if peer review cannot systematically separate the wheat from the chaff, perhaps authorship itself can. After all, to scientists and academics reputation is everything, and authorship is highly valued for that reason alone – it shows one has truly contributed to science in some form, a badge of merit attached to one’s own name. However, authorship in science turns out to have problems of its own.

Next to peer review itself, citation metrics, impact factors, and other measures of the value of authorship have become a major yardstick of scientific quality and of career advancement throughout the sciences. This has in turn produced incentives to ‘game the system’: manipulating authorship and citations so that one appears to have made more of an impact than one actually has, claiming the work of others as one’s own, and so forth. As a scientific filter or measure of merit, authorship therefore suffers from the same lack of empirical validation as peer review.

Returning to the field of medicine, a recent publication has warned of widespread authorship and citation manipulation. As its author, Stuart Macdonald, points out, this is a consequence of the mere fact that impact factors and other measures of authorship have become a way of assessing scientific quality: “when a measure of performance becomes more valuable (and much easier to determine) than the performance itself, Goodhart’s Law decrees that effort switches from producing the performance to producing the measure of performance instead”.

As Macdonald explains with various examples, this manipulation can happen through the work of editors: publishing very few articles but making sure they are heavily cited gave CA: A Cancer Journal for Clinicians a higher impact rating than Nature. It can happen through institutional action, as when Chinese universities gave medical researchers substantial cash bonuses for heavily cited articles (which itself, of course, invites manipulation). It can also be the work of authors themselves, for example in the form of ‘citation cliques’, where small groups of authors band together and consistently cite each other’s work at the expense of everyone else. Finally, impact factors feed back on themselves and erode their own value as a measurement: higher-impact authors tend to ‘graduate’ to top journals, which therefore gain in impact, and so on. In this way, journals have become a steep pyramid of ‘impact’ whose relationship to actual research quality becomes less and less obvious over time.
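To see how easily such a headline metric can be moved, here is a minimal Python sketch of a two-year impact factor calculation (citations received in a given year to a journal’s articles from the previous two years, divided by the number of those articles) together with the effect of a small, hypothetical citation clique. The journal size and citation counts are invented for illustration and do not describe any journal mentioned above.

```python
# Two-year impact factor: citations received this year to items published in the
# previous two years, divided by the number of citable items from those two years.
def impact_factor(citations_to_recent_items: int, recent_items: int) -> float:
    return citations_to_recent_items / recent_items

# A hypothetical small journal: 40 citable articles over two years,
# receiving 120 citations from unrelated work.
baseline = impact_factor(citations_to_recent_items=120, recent_items=40)

# A hypothetical clique of 10 authors, each publishing 4 papers elsewhere this year
# and agreeing to cite 5 recent articles from this journal in every one of them.
clique_citations = 10 * 4 * 5
inflated = impact_factor(citations_to_recent_items=120 + clique_citations, recent_items=40)

print(f"Impact factor without the clique: {baseline:.1f}")  # 3.0
print(f"Impact factor with the clique:    {inflated:.1f}")  # 8.0
```

A few hundred coordinated citations are enough to more than double the metric, precisely because the denominator – the number of articles published – stays small.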

Eventually, authorship as a measure becomes as vulnerable to outright fraud as peer review is. Macdonald cites the striking example of Ike Antkare. Most likely you have never heard of this person, but he is nonetheless among the most highly cited authors of all time, ranked above Albert Einstein himself – thanks to the manipulation of Google Scholar, the go-to source for outsiders wanting quick information on a scientific field. That Ike Antkare does not exist and has never existed does not in any way diminish his status as “one of the great stars in the scientific firmament”.

Ike Antkare (“I can’t care”)’s citation position in 2010 according to Scholarometer. From Labbé (2010)
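For readers unfamiliar with the mechanics, the trick relied on a batch of automatically generated papers that all cite one another, which citation trackers then indexed like any other publications. The sketch below shows only the underlying h-index arithmetic (an author has index h if h of their papers each have at least h citations); the batch size used here is an arbitrary assumption, not the number of papers actually generated in the Antkare experiment.

```python
# h-index: the largest h such that the author has h papers with at least h citations each.
def h_index(citation_counts: list[int]) -> int:
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, citations in enumerate(ranked, start=1) if citations >= rank)

# A fabricated author with n generated papers that all cite one another:
# each paper receives n - 1 citations, so the h-index comes out as n - 1.
n = 100  # arbitrary batch size, chosen purely for illustration
fabricated_author = [n - 1] * n
print(h_index(fabricated_author))  # prints 99
```

This is how a modest batch of mutually citing nonsense papers can place a fictional author among the most-cited scientists in the world.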

In medicine specifically, Macdonald documents other widespread forms of fraudulent activity as well. One familiar example is the influence exercised by private industry on scientific publication, whereby articles that are effectively PR exercises for one or another pharmaceutical company’s commercial interests end up in peer-reviewed scientific journals. Another common practice is horse-trading in authorship itself: besides principal investigators claiming first authorship on the work of their students – a widespread practice – there is outright buying and selling of authorship slots, enabled in medicine and other laboratory sciences by the common inclusion of very long author lists even in legitimate cases. Co-authorship of one article in Elsevier’s International Journal of Biochemistry & Cell Biology was sold for $14,800, because nobody really verifies the authorial process or can identify the specific contributions of any given author to the results (even where those results are legitimate, as was the case for this article).

Conclusion

In short, neither peer review nor authorship itself can act as the kind of philosophical gold standard of science that they are often treated as: that is to say, neither is necessary and sufficient for establishing the scientific quality of a publication in a scientific journal. But does this mean we cannot trust anything that is written in scientific publications, or that there is no difference between what appears in a journal and the kind of random, unsubstantiated claims that circulate on the internet or in ‘common sense’?

We need not throw out the baby with the bathwater. As Bright and Heesen suggest, in practice scientists have already discovered that the problems with peer review can be compensated for – at least in terms of the opportunity cost of doing reviewing instead of working – by the ever more popular circulation of preprints. These do not prevent the publication of potentially unjustified claims, but neither does peer review, as we have seen. The same thing goes for the manipulation of citations and authorship. Sunlight is still the best disinfectant: articles such as those cited above, that highlight these practices and identify parties involved in the abuse of scientific process, are likely to mitigate the worst effects.

Overall, scientific practices such as peer review fail to prevent deliberate fraud and manipulation precisely because they assume, correctly for the most part, that working scientists are not fraudulent. Scientists are by and large hardworking, dedicated people spending great effort on fields that they love and working to advance human knowledge. While this trust leaves science vulnerable to fraud in principle, it is rather a compliment to science that fraudulent practices have had so little ability to corrupt academic work as a whole or to prevent the advancement of knowledge. The worth scientists assign to their reputation helps in this regard: once there is a whiff of fraud or manipulative activity, one’s career and the likelihood of being able to keep working in the field are seriously diminished.

Frequent collaboration is also beneficial. The more scientists work together, the more likely it is that bad actors will be identified, and the more the efforts of bona fide scientists will swamp those of the occasional schemer or fraud. Providing tools to enhance such collaboration and scientific reproducibility in general is why we built Nuvolos, the cloud platform that bundles all aspects of the scientific workflow into one digital working environment. To find out more, visit our website or give our free trial a go. Ultimately, science will be as good as the people working in it, and we have confidence in them.