Comments on “The rise and fall of peer review”

A Substack post about peer review is getting a lot of attention, and I’m here to rant about it. Basically the post is calling out the peer review process as a terrible and broken system. And it is. But the author’s rhetoric about it is kind of problematic.

1. Peer review is not an experiment.

The author claims that it is, but contradicts himself straight away:

The experimental design wasn’t great; there was no randomization and no control group. Nobody was in charge, exactly, and nobody was really taking consistent measurements. And yet it was the most massive experiment ever run, and it included every scientist on Earth.

These are not just things that make an experiment bad; they are things that preclude peer review from being an experiment altogether. Experiments are run on samples, they are run with intent, and they are performed in controlled environments. For someone who calls himself an experimental psychologist and who calls his blog “experimental history”, he really stretches the term experiment in weird ways. This use of the term is like referring to the “experiment of democracy”: basically just grand rhetoric for “we’re figuring things out and learning as we go”.

The author also seems to think of the experiment of peer review as a pass/fail test, which, again, is not what experiments are for. He sets bars for what successful scientific evaluation ought to look like, and measures his experiences of peer review against them. But this is not an experiment; it is qualitative assessment. There’s nothing wrong with that, but it’s troubling how the author wraps his proclamation that peer review is bad and should be abolished within some phony hearkening to science-core.

2. General discourse is not enough to validate truth statements.

Various parts of the post indicate that the author considers science to be the evaluation of statements of truth, which can only be verified by their fidelity to observed reality. OK, fair enough. But he points to Einstein’s large body of non-reviewed work as an argument for relying on discourse among educated fellows as an efficient way of evaluating the quality of scientific work. Einstein’s apparent genius is cemented in the popular imagination, but he also happened to be wrong about some things, and his example is not reason enough to abolish peer review. Moreover, the author does not consider the general acceptance of non-reviewed ideas that happened to be wrong, a counterpoint that clearly refutes his main point.

3. What about non-experimental methods?

The author has a huge blind spot for non-experimental methods. He suggests that if the results of a scientific analysis can be replicated, then that is good enough for acceptance into an authoritative canon of truth. Moreover, he indicates that work that cannot be replicated is “a whole lotta money for nothing”, basically a waste of time and resources. But a lot of science cannot be replicated, because science doesn’t always follow experimental protocols that allow for replication tests to be performed. Fantastic and valuable work that relies on non-experimental heuristics, including a lot of work in the social sciences and humanities, climate science, ecology, astronomy and various other fields, is left in the lurch. His take on non-replicability in these disciplines reads a lot like the unethical and ironically non-replicable Sokal hoaxes that serve as the basis for unhinged right-wing attacks on the social sciences and humanities.

This also contradicts the author’s hearkening to discourse among learned men of olde as a way of dealing with problems relating to peer review. Opening up the comments section, even if limited to a curated list of credentialed scholars, is not the same as conducting independent replication studies. I think the reasoning behind this link is that if other people have experienced similar phenomena in their own labs, then a finding is more likely to be accepted as true. But this is not the same as replication under the same conditions; it is just the same uncontrolled, consensus-based evaluation as peer review, but with an open filter.

4. Peer review in context

I agree with many of the things that the author is saying. Yes, there are many ways in which peer review is broken and could be improved. For instance, I agree with the notion that peer reviewers do not dive deep enough into the data and aren’t always critical enough. But I think that this is because most people are unprepared to do so, either because they do not have access to data or do not know how to work with statistics or read code. Moreover, certain journals like PNAS give preferential treatment to certain authors over others, and there are definitely major issues with racism and sexism in the evaluation process. Open peer review does not resolve these issues, mainly because it treats peer review in isolation.

The only way to make peer review better is by instilling good scholarly practices in the next generation of scholars. However, this is inhibited by structural issues, such as the tight job market that favours quantity of peer reviewed articles over any other factor, and the general prestige economy of academia. These are the root issues. The foul state of peer review is one aspect of this mess, alongside structural racism, sexism and transphobia, the sheer expense of obtaining an advanced degree and excelling in the years immediately post-PhD, and the pressures to conform to trends that get you funding. You cannot separate the problems with peer review from these issues. Yet somehow the author manages to completely sidestep these concerns, identifying the broken peer review system as a purely epistemic problem, rather than a problem with tangible and far-reaching social implications.

Comments on a recent “science mapping” paper

A new paper examining published research outputs to describe the makeup of archaeology as a discipline just dropped, and it’s getting a lot of positive attention.

Sinclair, A. 2022 Archaeological Research 2014 to 2021: an examination of its intellectual base, collaborative networks and conceptual language using science maps, Internet Archaeology 59.

I see some issues with the paper that I think are worth addressing. This is not a comprehensive review, more like a commentary based on my own interests and experiences. I welcome dialog with the author and anyone else who is interested in discussing this further.

I’m a bit hesitant to post this because I do not know the author, Anthony Sinclair, and I don’t want to come across as too harsh. I intentionally did not look him up prior to writing this post. This is a commentary on the paper, not the person behind it.

Simplistic description of network graphs

My first criticism is about the surface-level description of the network visualizations. Network visualizations are one of many ways of rendering a dataset, and the paper would have really benefited from more multifaceted statistical analysis of the underlying data. For example, it would have been nice to see the distribution of nodes with different degrees of centrality compared against some other variable, such as gender. The author falls back on a plain and simple citation count in his analysis of gender disparities, and misses a great opportunity to draw upon centrality measurements as a key indicator of inequitable aspects of professional development across the genders.
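
To make the suggestion concrete, here is a minimal sketch of that kind of comparison in Python. Everything in it is an assumption for illustration: the six-node co-authorship network, the node names and the two group labels are invented, and degree centrality simply stands in for whichever centrality measure would actually suit the data. None of it is drawn from the paper’s dataset or methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical co-authorship edges; node names and group labels are invented.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
group = {"A": "m", "B": "f", "C": "m", "D": "f", "E": "m", "F": "f"}


def degree_centrality(edges):
    """Return each node's degree divided by (n - 1), the maximum possible degree."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def mean_centrality_by_group(centrality, group):
    """Average the centrality scores of the nodes that share a group label."""
    by_group = defaultdict(list)
    for node, value in centrality.items():
        by_group[group[node]].append(value)
    return {label: mean(values) for label, values in by_group.items()}


centrality = degree_centrality(edges)
summary = mean_centrality_by_group(centrality, group)
print(summary)
```

With real data, the full distributions would be more informative than the group means alone.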

The author also annotated the graphs with diagrams that look kind of like a compass rose. I only found one instance in the text describing them and their function:

“In certain maps, the key dimensions that affect the layout of the maps are identified in one of the upper corners of the map.”

One of the network visualizations from the original paper. Note the compass rose in the top left corner.

What do these compass roses actually represent? Are they derived from the author’s interpretations, or are they derived from the dataset? This is unclear. In either case, I would have liked to understand the reasoning or approach for identifying the extremes at each end of the gradients, and how a node’s situation along the scale is determined.

Framing of science and non-science

This paper perpetuates an outdated dichotomy between science and the arts and humanities. It never really defines either of these things, or attempts to reconcile the terms used by the citation databases with the author’s own notion of what science and the arts and humanities mean. But these terms appear in the compass roses and in the descriptions of the graph visualizations as if their meanings are self-evident.

It is also very interesting that the paper says a lot about science but not much about the arts and humanities. In fact, it may be more apt to say that this paper describes science and non-science, rather than science and some other cohesive entity. The author describes journals, topics and methods that he identifies as scientific, but does not do this at all for entities that he relates to as belonging to the arts and humanities. The lack of distinction between these terms reveals an unwillingness to treat the things they represent as things in themselves, rather than as a lack of something, namely science.

The paper also relies on really outdated visions of the character of various disciplines and of archaeology specifically. As far as I can tell, it relies on two sources:

  • Pantin, C.F.A. 1968 The Relations Between the Sciences, Cambridge: Cambridge University Press.
  • Becher, T. and Trowler, P.R. 2001 Academic Tribes and Territories, 2nd Edition, Maidenhead: Society for Research into Higher Education/Open University Press.

Becher is extremely outdated and falls within a period when scientists (especially social scientists, including Binford) were aching to make their disciplines seem more scientific. So there is a strange value judgement at play, and accounts from that period often failed to capture the reality of how science actually works. The other source is mentioned only very briefly in passing, but follows a similar essentialist rhetoric regarding the fundamental nature of specific disciplines, which rubs me the wrong way. A lot of excellent work that examines the pragmatic reality of scientific practice, that highlights contradictions and misrepresentations, and that presents the fluidity across disciplines rather than hard distinctions, is simply ignored (e.g. Latour and Woolgar’s Laboratory Life, Latour’s Pandora’s Hope, Knorr-Cetina’s Epistemic Cultures, Bowker’s Science on the Run, to name just a few).

Critical reflection on what the networks actually represent

The value of analyzing citation networks is unclear to me, and the author doesn’t really convince me that they represent a “window on the shape of the discipline”. Citation networks depict clusters of citations, but the jump to making these clusters meaningful in relation to some broader social or epistemic phenomenon is never really articulated. Moreover, the author indicates that he applied the Girvan-Newman method for identifying clusters, but doesn’t really incorporate the means through which this algorithm operates, including its limitations, into the analysis. Clusters do not simply exist; they are highlighted through some method, which impacts what we see.
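
To illustrate that last point, here is a rough sketch of the Girvan-Newman procedure in plain Python, run on an invented network of two triangles joined by a bridge. It is a toy reconstruction under stated assumptions, not the implementation or software used in the paper: the graph, the node names and the `target` parameter (the number of clusters at which to stop) are all hypothetical. The algorithm repeatedly removes the edge with the highest betweenness, so how many clusters come out depends on when you choose to stop.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted graph."""
    score = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths to each node.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit across edges.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every path is counted from both ends, which does not affect the ranking.
    return score


def components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(edges, target):
    """Remove highest-betweenness edges until the graph has `target` parts."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parts = components(adj)
    while len(parts) < target and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)  # ties are broken arbitrarily
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
    return parts


# Hypothetical network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
parts2 = girvan_newman(edges, 2)
parts3 = girvan_newman(edges, 3)
print(sorted(sorted(p) for p in parts2))
print(sorted(len(p) for p in parts3))
```

Asking for two clusters recovers the two triangles. Asking for three forces one of the triangles apart, and because its edges are tied on betweenness, which node gets split off depends on arbitrary tie-breaking rather than on anything in the data.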

After writing an initial draft I found that the author has done other relevant scientometric work with a more targeted scope:

Sinclair, A. (2020). From Specialty to Specialist: a citation analysis of Evolutionary Anthropology, Palaeolithic Archaeology and the Work of John Gowlett 1970-2018. In J. Cole, J. McNabb, M. Grove, & R. Hosfield (Eds.), Landscapes of Evolution: Studies in Honour of John Gowlett (pp. 175-201). Oxford: Archaeopress.

Although I do not have access to this paper, it is likely to be much more effective. These kinds of analyses tend to work better when a more specific objective is outlined, because it’s easier to ground the relationships within a specific set of experiences, rather than relying too much on generalizations and abstractions.

Analysis of language and push for standardized terminology

I like the analysis of language and keywords. I think it’s the strongest part of the paper, and there’s a lot of potential there. However, the author draws this into a push towards standardization, which seems kind of forced and not relevant to the analysis of keywords across the literature. The author frames the diverse array of terminology as a problem that needs to be overcome, rather than a very interesting aspect of archaeological research practice with its own benefits and affordances. Standards implemented in harder sciences are stated as goals worth attaining, but I’m left unconvinced that this is really worth doing based on the findings presented here.