Comments on “The rise and fall of peer review”

A Substack post about peer review is getting a lot of attention, and I’m here to rant about it. The post calls out the peer review process as a terrible and broken system. And it is. But the author’s rhetoric about it is kind of problematic.

1. Peer review is not an experiment.

The author claims that it is, but contradicts himself straight away:

“The experimental design wasn’t great; there was no randomization and no control group. Nobody was in charge, exactly, and nobody was really taking consistent measurements. And yet it was the most massive experiment ever run, and it included every scientist on Earth.”

These are not just things that make an experiment bad; they are things that preclude peer review from being an experiment altogether. Experiments are run on samples, with intent, in controlled environments. For someone who calls himself an experimental psychologist and who calls his blog “Experimental History”, he extends the term experiment in strange ways. This usage is like referring to the “experiment of democracy”: grand rhetoric for “we’re figuring things out and learning as we go”.

The author also seems to treat the experiment of peer review as a pass/fail test, which, again, is not what experiments are for. He sets bars for what successful scientific evaluation ought to look like and measures his experiences of peer review against them. But this is not an experiment; it is qualitative assessment. There’s nothing wrong with that, but it’s troubling how the author wraps his proclamation that peer review is bad and should be abolished within some phony hearkening to science-core.

2. General discourse is not enough to validate truth statements.

Various parts of the post indicate that the author considers science to be the evaluation of statements of truth, which can only be verified by their fidelity to observed reality. Ok, fair enough. But he points to Einstein’s large body of non-reviewed work as an argument for relying on discourse among educated fellows as an efficient way of evaluating the quality of scientific work. Einstein’s genius is cemented in the popular imagination, but he also happened to be wrong about some things, and his example is not reason enough to abolish peer review. Moreover, the author never considers the general acceptance of non-reviewed ideas that happened to be wrong, a counterpoint that clearly refutes his main point.

3. What about non-experimental methods?

The author has a huge blind spot for non-experimental methods. He suggests that if the results of a scientific analysis can be replicated, then that is good enough for acceptance into an authoritative canon of truth. Moreover, he indicates that work that fails to replicate is “a whole lotta money for nothing”, basically a waste of time and resources. But a lot of science cannot be replicated, simply because science doesn’t always follow experimental protocols that allow replication tests to be performed. Fantastic and valuable work that relies on non-experimental heuristics, including a lot of work in the social sciences and humanities, climate science, ecology, astronomy and various other fields, is left in the lurch. His take on non-replicability in these disciplines reads a lot like the unethical and ironically non-replicable Sokal hoaxes that serve as the basis for unhinged right-wing attacks on the social sciences and humanities.

This also contradicts the author’s hearkening to discourse among learned men of olde as a way of dealing with the problems of peer review. Opening up the comments section, even if just to a curated list of credentialed scholars, is not the same as conducting independent replication studies. I think the reasoning behind this link is that if other people have experienced similar phenomena in their own labs, then a finding is more likely to be accepted as true. But this is not replication under the same conditions; it is the same uncontrolled, consensus-based evaluation as peer review, just with an open filter.

4. Peer review in context

I agree with much of what the author is saying. Yes, there are many ways in which peer review is broken and could be improved. For instance, I agree that peer reviewers do not dive deep enough into the data and aren’t always critical enough. But I think this is because most people are unprepared to do so, either because they do not have access to the data or because they do not know how to work with statistics or read code. Moreover, certain journals like PNAS give preferential treatment to certain authors over others, and there are definitely major issues with racism and sexism in the evaluation process. Open peer review does not resolve these issues, largely because it treats peer review in isolation.

The only way to make peer review better is by instilling good scholarly practices in the next generation of scholars. However, this is inhibited by structural issues, such as a tight job market that favours quantity of peer-reviewed articles over any other factor, and the general prestige economy of academia. These are the root issues. The foul state of peer review is one aspect of this mess, alongside structural racism, sexism and transphobia, the sheer expense of obtaining an advanced degree and of excelling in the years immediately post-PhD, and the pressure to conform to trends that get you funding. You cannot separate the problems with peer review from these issues. Yet somehow the author manages to completely sidestep these concerns, treating the broken peer review system as a purely epistemic problem rather than one with tangible and far-reaching social implications.

Comments on a recent “science mapping” paper

A new paper examining published research outputs to describe the makeup of archaeology as a discipline just dropped, and it’s getting a lot of positive attention.

Sinclair, A. 2022 Archaeological Research 2014 to 2021: an examination of its intellectual base, collaborative networks and conceptual language using science maps, Internet Archaeology 59. https://doi.org/10.11141/ia.59.10

I see some issues with the paper that I think are worth addressing. This is not a comprehensive review, more of a commentary based on my own interests and experiences. I welcome dialogue with the author and anyone else who is interested in discussing this further.

I’m a bit hesitant to post this because I do not know the author, Anthony Sinclair, and I don’t want to come across as too harsh. I intentionally did not look him up prior to writing this post. This is a commentary on the paper, not the person behind it.

Simplistic description of network graphs

My first criticism concerns the surface-level description of the network visualizations. Network visualizations are one of many ways of rendering a dataset, and the paper would have really benefited from a more multifaceted statistical analysis of the underlying data. For example, it would have been nice to see the distribution of nodes with different degrees of centrality compared against some other variable, such as gender. Instead, the author falls back on a plain citation count in his analysis of gender disparities, and misses a great opportunity to draw upon centrality measurements as a key indicator of inequitable professional development across the genders.
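
To illustrate the kind of comparison I have in mind, here is a minimal sketch using networkx. The graph and the gender attribute are hypothetical stand-ins, not the paper’s actual dataset:

```python
# A minimal sketch: compare the distribution of a centrality measure
# across a categorical node attribute, instead of relying on raw
# citation counts alone. The graph and the "gender" attribute are
# hypothetical stand-ins, not the paper's actual data.
import random
from statistics import mean, median

import networkx as nx

random.seed(0)
G = nx.karate_club_graph()  # placeholder network

# Attach a hypothetical gender attribute to each node.
for n in G.nodes:
    G.nodes[n]["gender"] = random.choice(["man", "woman"])

# Betweenness centrality as one possible indicator of structural position.
centrality = nx.betweenness_centrality(G)

# Group centrality scores by the node attribute.
by_group = {}
for node, score in centrality.items():
    by_group.setdefault(G.nodes[node]["gender"], []).append(score)

for group, scores in sorted(by_group.items()):
    print(f"{group}: n={len(scores)}, "
          f"mean={mean(scores):.4f}, median={median(scores):.4f}")
```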

The author also annotated the graphs with diagrams that look kind of like compass roses. I found only one instance in the text describing them and their function:

“In certain maps, the key dimensions that affect the layout of the maps are identified in one of the upper corners of the map.”

[Image: one of the network visualizations from the original paper. Note the compass rose in the top left corner.]

What do these compass roses actually represent? Are they derived from the author’s interpretations, or from the dataset? This is unclear. In either case, I would have liked to understand the reasoning or approach for identifying the extremes at each end of the gradients, and how a node’s position along the scale is determined.

Framing of science and non-science

This paper perpetuates an outdated dichotomy between science and the arts and humanities. It never really defines either of these things, nor attempts to reconcile the terms used by the citation databases with the author’s own notion of what science and the arts and humanities mean. Yet these terms appear in the compass roses and in the descriptions of the graph visualizations as if their meanings were self-evident.

Also interesting is that the paper says a lot about science but not much about the arts and humanities. In fact, it may be more apt to say that it describes science and non-science, rather than two cohesive entities. The author describes journals, topics and methods that he identifies as scientific, but does not do this at all for the entities he assigns to the arts and humanities. This lack of distinction reveals an unwillingness to treat the things these terms represent as things in themselves, rather than as the absence of something, namely science.

The paper also relies on really outdated visions of the character of various disciplines, and of archaeology specifically. As far as I can tell, these visions are drawn from two sources:

  • Pantin, C.F.A. 1968 The Relations Between the Sciences, Cambridge: Cambridge University Press.
  • Becher, T. and Trowler, P.R. 2001 Academic Tribes and Territories, 2nd Edition, Maidenhead: Society for Research into Higher Education/Open University Press.

Becher is extremely outdated, falling within a period when scientists (especially social scientists, including Binford) were aching to make their disciplines seem more scientific. So there is a strange value judgement at play, and these accounts often failed to capture the reality of how science actually works. The other source is mentioned only briefly in passing, but follows a similar essentialist rhetoric regarding the fundamental nature of specific disciplines, which rubs me the wrong way. A lot of excellent work that examines the pragmatic reality of scientific practice, that highlights its contradictions and misrepresentations, and that presents fluidity across disciplines rather than hard distinctions is simply ignored (e.g. Latour and Woolgar’s Laboratory Life, Latour’s Pandora’s Hope, Knorr-Cetina’s Epistemic Cultures, Bowker’s Science on the Run, to name just a few).

Critical reflection on what the networks actually represent

The value of analyzing citation networks is unclear to me, and the author doesn’t really convince me that they represent a “window on the shape of the discipline”. Citation networks depict clusters of citations, but the jump to making these clusters meaningful in relation to some broader social or epistemic phenomenon is never really articulated. Moreover, the author indicates that he applied the Girvan-Newman method for identifying clusters, but doesn’t really incorporate the means through which this algorithm operates, including its limitations, into the analysis. Clusters do not simply exist; they are highlighted through some method, and that method shapes what we see.
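
For what it’s worth, here is a minimal sketch of Girvan-Newman clustering with networkx, on a stand-in graph rather than the paper’s citation network. It illustrates the method-dependence I mean: the algorithm peels communities apart by repeatedly removing the edge with the highest betweenness, so the clusters you get depend on that specific criterion and on where you choose to stop iterating:

```python
# A minimal sketch of Girvan-Newman community detection, on a stand-in
# graph rather than the paper's citation network. The partitions it
# returns depend on the edge-betweenness criterion and on how many
# splits you choose to take from the iterator.
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()  # placeholder network

splits = girvan_newman(G)        # yields successively finer partitions
first_partition = next(splits)   # two communities
second_partition = next(splits)  # three communities

for i, partition in enumerate([first_partition, second_partition], start=1):
    print(f"split {i}: {[sorted(c) for c in partition]}")
```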

After writing an initial draft I found that the author has done other relevant scientometric work with a more targeted scope:

Sinclair, A. 2020 From Specialty to Specialist: a citation analysis of Evolutionary Anthropology, Palaeolithic Archaeology and the Work of John Gowlett 1970-2018, in J. Cole, J. McNabb, M. Grove and R. Hosfield (eds) Landscapes of Evolution: Studies in Honour of John Gowlett, pp. 175-201. Oxford: Archaeopress.

Although I do not have access to this paper, it is likely to be much more effective: these kinds of analyses tend to work better when a more specific objective is outlined, because it’s easier to ground the relationships within a specific set of experiences rather than relying too much on generalizations and abstractions.

Analysis of language and push for standardized terminology

I like the analysis of language and keywords. I think it’s the strongest part of the paper, and there’s a lot of potential there. However, the author draws this into a push toward standardization, which seems forced and not really relevant to the analysis of keywords across the literature. He frames the diverse array of terminology as a problem that needs to be overcome, rather than a very interesting aspect of archaeological research practice with its own benefits and affordances. Standards implemented in harder sciences are presented as goals worth attaining, but I’m left unconvinced that this is really worth doing based on the findings presented here.

Open science and its weird conception of data

In an early draft of one of my dissertation’s background chapters I wrote a ranty section about notions of data held by the open science movement that I find really annoying. I eventually excised this bit of text, and while it isn’t really worth assembling into any publication, I thought it may still be worth sharing here. So here is a lightly adapted version, original circa May 2022.


Abstract submitted for DAB23 (Bern, Switzerland)

Today I submitted an abstract to present at the DAB23 colloquium hosted by the Bern Computational and Digital Archaeology lab. The conference is about “advancing open research into the next decade” and my paper is titled Documenting the collaborative commitments that support data sharing within archaeological project collectives. Here is the abstract:

Archaeological research is inherently collaborative, in that it involves many people coming together to examine a material assemblage of mutual interest by implementing a variety of tools and methods in tandem. Independent projects establish organizational structures and information systems to help coordinate labour and pool the information thereby derived into a communal data stream, which can then be applied towards the production and publication of analytical findings. Albeit not necessarily egalitarian, and with different expectations set for people assigned different roles, archaeological projects thus constitute a form of commons, whereby participants contribute to and obtain value from a collective endeavour. Adopting open research practices, including sharing data beyond a project’s original scope, involves altering the collaborative commitments that bind work together. This paper, drawn from my doctoral dissertation, examines how archaeologists are presently navigating this juncture between established professional norms and expectations on the one hand, and the potential benefits and limitations afforded by open research on the other.

I applied an abductive qualitative data analysis approach based on recorded observations, interviews, and documents collected from three cases, including two independent archaeological projects and one regional data sharing consortium with limited scope and targeted research objectives. My analysis documents a few underappreciated aspects of archaeological projects’ sociotechnical arrangements that open data infrastructures should account for more thoroughly:

  1. boundaries, whether they restrict membership within a collective, delimit a project’s scope, or limit the time frame under which a project operates, have practical positive value, and are not just arbitrary impediments;
  2. systems designed to direct the flow of information do so via the coordination of labour, yet the strategic arrangement of human and object agency, as well as resistances against such managerial control, are rarely accounted for in data documentation; and
  3. information systems and the institutional structures that support them tend to reinforce and reify existing power structures and divisions of labour, including implicit rules that govern ownership and control over research materials and that designate who may benefit from their use.

By framing data sharing, whether it occurs between close colleagues or as mediated by open data platforms among strangers, as comprising a series of collaborative commitments, my work highlights the broader social contexts within which we develop open archaeological research infrastructures. As we move forward, we should be aware of and account for how the data governance models embedded within open research infrastructures either complement or challenge existing social dynamics.

Mastodon and the potential for community growth

It’s a weird time to be starting a blog. Twitter has imploded and many of my colleagues have started using Mastodon instead. There’s a lot of talk about the virtues of decentralization and of establishing and maintaining firm community values, as reflected in content moderation policies and practices. Of course, all of this discourse is happening in microblog format, and is restricted by the usual inability to have any kind of nuanced conversation on the web. I feel that when posting on Twitter and Mastodon alike, I’m bound to a formal position, and I find it hard to establish a tone that is my own. This makes it difficult for me to be casual, and to express my thoughts in a way that makes sense to me, especially when my ideas are half-baked or vaguely critical. So I started this blog to help me retain the more tentative voice I often use in casual conversations, and which I’m terrified of letting out in more formal or professional spaces.

The shift to Mastodon has been interesting. It definitely has a very different vibe, but there’s a chance that this is just due to the novelty of the experience. Sure, there are affordances built into the platform that enable or encourage certain behaviours, such as content warnings, image descriptions and various means of controlling post visibility, but the value of these features will depend on whether people actually use them.

I think the biggest change, whose ramifications we’re just starting to see, has to do with community governance. On Twitter, the usual and pretty much only way of responding to inadequate content moderation was to complain and put up with it. But on Mastodon there are three main ways you can deal with it:

  1. put up with it,
  2. switch to another instance, or
  3. get involved, give feedback, make change.

People are very used to the first option, and the latter two require more work. The second option involves a bit of work: finding another instance that appeals to you, creating a new introduction post and building out your profile again, re-linking all your other socials, and so on. The third option seems like the most exciting one, since it actually feels like a potential venue for dynamic community building, and for personal and collective growth. The distinction between the second and third options may also have a lot to do with a weird tension between techno-libertarian and anarcho-syndicalist visions of (web-based) community building (but more on that in another post…).

This is the sort of thing that is on my mind as archaeo.social continues to develop. Joe Roe started archaeo.social as a Mastodon instance for archaeologists, and I joined him soon after to help with content moderation and to plan some community guidelines (still in progress). I’m learning a lot through this whole experience. I’m learning to be more patient, more open to other perspectives, less controlling, and less apprehensive. It’s still early days, and Joe is encouraging me to sit tight and let the community do its thing and shape the path ahead, which scares the hell out of me. Hours before archaeo.social launched, I even posted a very critical toot about how this would be bound to fail, but look at me now, riding shotgun!

[Screenshot of two posts. First post: "Kinda concerned by all the rhetoric about mastodon vibes being inherently more positive than twitter, just because *federated*. Sure, it provides certain tools that provide greater *potential* for community members to become more involved in how their social network is governed, but I think these are community issues, not tech issues, and it risks falling back into uncritical platform-hype circa 2007." Second post: "Related: I think that academic units hosting their own units is a bad idea because I think that most unis have really bad governance strategies in general, and will be totally unequipped to deal with community management and content moderation if they just do it on a whim. Same goes for individuals wanting to set up an instance for their colleagues or subdiscipline"]
Me being wrong on the internet, hopefully.

In retrospect, the kind of attitude I posted a couple of weeks ago may be exactly what’s holding us back. We need to try things out and play around to find out what else could come from all of this. I’m very eager to have been wrong.