New paper: Exploring collaborative practices in archaeological software development

I’m happy to announce that Joe Roe and I just published a paper in Internet Archaeology that explores collaborative practices in archaeological open source software development. This paper has been in development for a while, and we’re glad to finally release our work.

To briefly summarize: we investigated the under-explored practices of research software engineering in archaeology, with an emphasis on the collaborative experiences involved in open source software development. We identified not only what kinds of software archaeologists are making, but also how archaeologists create these tools as part of a broader community of practice. We conducted exploratory data analysis and network analysis on data from open-archaeo, supplemented with additional data pulled from the GitHub API, to trace how archaeologists use various languages, forges, licenses and supporting features (e.g. issues, stars, pull requests), and to discern trends regarding projects’ longevity, degree of community participation, and the overall structure of collaborative ties.

Open archaeology, open source? Collaborative practices in an emerging community of archaeological software engineers

Surveying the first quarter-century of computer applications in archaeology, Scollar (1999) lamented that the field relied almost exclusively on “hand-me-down” tools repurposed from other disciplines. Twenty-five years later, this is no longer the case: computational archaeologists often find themselves practicing the dual roles of data analyst and research software engineer (Baxter et al. 2012; Schmidt and Marwick 2020), developing and applying new tools that are tailored specifically to archaeological problems and archaeological methods. Though this trend can be traced to the very earliest days of the field (Cowgill 1967), its most recent manifestation is distinguished by its apparent embrace of practices from free and open source software. Most prominently, since around 2015, there has been a rapid uptake of workflow tools designed for open source development communities, such as the version control system git and associated online source code management platforms (e.g. GitHub, GitLab). These tools facilitate collaboration among developers and users of open source software using patterns that can diverge quite radically from conventional scholarly norms (Tennant et al. 2020).

In this paper, we investigate modes of collaboration in this emerging community of practice using ‘open-archaeo’, a curated list of archaeological software, and data on the activity of associated GitHub repositories and users. We conduct an exploratory quantitative analysis to characterize the nature and intensity of these collaborations and map the collaborative networks that emerge from them. We document uneven adoption of open source collaborative practices beyond the basic use of git as a version control system and GitHub to host source code. Most projects do make use of collaborative features and, through shared contributions, we can trace a collaborative network that includes the majority of archaeologists active on GitHub. However, a majority of repositories have 1–3 contributors, with only a few projects distinguished by an active and diverse developer base. Direct collaboration on code or other repository content, as opposed to the more passive, social media-style interaction that GitHub supports, remains very limited. In other words, there is little evidence that archaeologists’ adoption of open source tools (git and GitHub) has been accompanied by the decentralized, participatory forms of collaboration that characterize other open source communities. On the contrary, our results indicate that research software engineering in archaeology remains largely embedded in conventional professional norms and organizational structures of academia.
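
For readers curious what tracing a collaborative network through shared contributions can look like in practice, here is a minimal Python sketch. It is not the code used in the paper: the repository and user names are invented toy data (nothing here comes from open-archaeo or the GitHub API), and the function names are hypothetical. It links two users whenever they contributed to the same repository, then reports the share of users in the largest connected component and the share of repositories with at most three contributors.

```python
from itertools import combinations

# Hypothetical toy data (an assumption for illustration only):
# repository name -> set of contributor usernames.
REPO_CONTRIBUTORS = {
    "repo-a": {"ana", "ben"},
    "repo-b": {"ben", "cho"},
    "repo-c": {"dee"},
    "repo-d": {"eli", "fay", "gus", "hal"},
    "repo-e": {"cho", "eli"},
}


def build_collaboration_graph(repo_contributors):
    """Link two users whenever they contributed to the same repository."""
    graph = {}
    for contributors in repo_contributors.values():
        # Register every user, so solo contributors appear as isolated nodes.
        for user in contributors:
            graph.setdefault(user, set())
        for a, b in combinations(sorted(contributors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def largest_component_share(graph):
    """Fraction of users that belong to the largest connected component."""
    if not graph:
        return 0.0
    seen = set()
    largest = 0
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        largest = max(largest, size)
    return largest / len(graph)


def small_team_share(repo_contributors, max_size=3):
    """Fraction of repositories with at most `max_size` contributors."""
    if not repo_contributors:
        return 0.0
    small = sum(1 for c in repo_contributors.values() if len(c) <= max_size)
    return small / len(repo_contributors)


if __name__ == "__main__":
    graph = build_collaboration_graph(REPO_CONTRIBUTORS)
    print(f"Users in largest component: {largest_component_share(graph):.1%}")
    print(f"Repositories with 1-3 contributors: {small_team_share(REPO_CONTRIBUTORS):.1%}")
```

The toy data is arranged so that both patterns can hold at once: most repositories are small, yet a couple of shared contributors are enough to connect most users into a single network.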

ArcheoFOSS XVII

This week I participated in ArcheoFOSS in Turin, Italy. I’ve always been keen to present at this conference but somehow never really felt I had much of importance to say (aside from open-archaeo stuff, but Joe Roe and I already presented about it at the 2021 CAA conference, and more detailed analysis is still in the works). But this year Joe and I took the opportunity to co-lead a panel on archaeology and the fediverse based on our experiences administering and moderating the archaeo-social Mastodon instance. Our panel was meant to highlight key challenges and opportunities for collectively-owned and community-led scholarly social media, and while it only consisted of a few papers, it definitely got the ball rolling on further critical discussion regarding the role of the fediverse and decentralized communication protocols in online archaeological discourse. Joe and I are initiating work on a position paper that assembles the main ideas presented during the panel and subsequent discussion, so stay tuned for more on that. In the meantime, you can access our introductory remarks on Zenodo and GitHub.

I also presented a paper on the challenges I experienced integrating and reusing data during my Master’s thesis, which I completed 8 years ago, essentially summarizing its failures (trying to channel Shawn Graham’s Failing Gloriously). This served as a venue for finally presenting my long-held yet unpublished uncertainties about the value of analyses that integrate legacy data, drawn from my personal experiences.

I really appreciated how low-key and relaxing the conference was. It was great to just have a casual experience with a relatively small group of like-minded researchers. I was very fortunate to be able to travel to Turin and participate in person. I also went on a nice post-conference excursion to Genoa, which is a truly lovely city. Thanks to Stefano Costa for informing me about the best places to visit and eat!

Mole Antonelliana, Turin
Po River, Turin
Po River, Turin
Sunset in Genoa

Finished my dissertation!

I finally defended my doctoral dissertation a few weeks ago, and after 7 years I’m happy to put it out into the world: https://doi.org/10.5281/zenodo.8373390

To briefly summarize: I observed and interviewed archaeologists while they worked, focusing on how they collaborate to produce information commons within relatively small, bounded communities. I relate these observations to issues experienced when sharing data globally on the web using open data platforms. This is part of an effort to reorient data sharing (and other aspects of open science) as a social, collaborative, communicative, and commensal experience.

Many thanks to my supervisor, Costis Dallas, for being such a great mentor, and to Matt Ratto and Ted Banning for their constant constructive feedback. And special thanks to the external examiners, Jeremy Huggett and Ed Swenson, for critically engaging with my work.

Archaeological data work as continuous and collaborative practice

This dissertation critically examines the sociotechnical structures that archaeologists rely on to coordinate their research and manage their data. I frame data as discursive media that communicate archaeological encounters, which enable archaeologists to form productive collaborative relationships. All archaeological activities involve data work, as archaeologists simultaneously account for the decisions and circumstances that framed the information they rely on to perform their own practices, while anticipating how their information outputs will be used by others in the future. All archaeological activities are therefore loci of practical epistemic convergence, where meanings are negotiated in relation to communally-held objectives.

Through observations of and interviews with archaeologists at work, and analysis of the documents they produce, I articulate how data sharing relates distributed work experiences as part of a continuum of practice. I highlight the assumptions and value regimes that underlie the social and technical structures that support productive archaeological work, and draw attention to the inseparable relationship between the management of labour and data. I also relate this discursive view of data sharing to the open data movement, and suggest that it is necessary to develop new collaborative commitments pertaining to data publication and reuse that are more in line with disciplinary norms, expectations, and value regimes.

open-archaeo data paper

Today, Joe Roe and I published a data paper in the Journal of Open Archaeology Data on open-archaeo, the comprehensive list of open source archaeological software and resources that we maintain. In this paper, we outline the data collection methods and conceptual model, and highlight open-archaeo’s value as a public resource and as a dataset for examining the emerging community of practice surrounding open source software development in research contexts. In fact, open-archaeo serves as the basis for an extended dataset in a study we are currently working on (investigating collaborative coding experiences) and we think there is a lot of potential for additional analysis in the future.

Open-archaeo: A Resource for Documenting Archaeological Software Development Practices
https://doi.org/10.5334/joad.111

Open-archaeo (https://open-archaeo.info) is a comprehensive list of open software and resources created by and for archaeologists. It is a living collection—itself an open project—which as of writing includes 548 entries and associated metadata. Open-archaeo documents what kinds of software and resources archaeologists have produced, enabling further investigation of research software engineering and digital peer-production practices in the discipline, both under-explored aspects of archaeological research practice.

Conceptual model documenting relationships between data recorded in open-archaeo and other relevant information in the source material and elsewhere on the web.

Recap: Digital Archaeology Bern 2023

Last week I travelled to Switzerland to participate in Digital Archaeology Bern (2023). The conference was themed “advancing open research into the next decade” and served as a way to take stock of developments since the 2012 World Archaeology Special Issue on Open Archaeology and Ben Marwick’s influential 2017 paper Computational Reproducibility in Archaeological Research, which came out roughly 10 and 5 years ago, respectively. I think that the conference was a remarkable success, and all 50-60 participants were actively engaged in critical discussions on what it means to do open archaeology. You can find my slides and presentation notes on GitHub (https://github.com/zackbatist/DAB23).

Although there were some elements of it, the conference was not just superficial open-boosting. Most, if not all, participants highlighted challenges and unanticipated implications of being open that they have recently experienced. Looking back, a few themes stood out:

  • Thinking about the value proposition that openness entails, which necessarily involves accounting for specific use cases and imagined future stakeholders.
  • Thinking about the needs and values of all stakeholders involved in doing archaeology, including local and Indigenous communities, land-owners, archivists, government agencies, and related parties, and what openness means for them.
  • Thinking about how we might reconcile our values as archaeologists with the values demanded and afforded by the infrastructures and communities with whom we must work.

I got to meet so many interesting people. I already knew many of them from social media, virtually-hosted talks, or brief in-person interactions at the CAA back in 2018, and it was really great to put a face to each person’s name. Most serious work in digital archaeology, especially productive work developing open data infrastructures, is being done in Europe, and I was very grateful to have this opportunity to connect with that crowd (especially since I’m currently entering the post-PhD academic job market). I think my paper was well-received and valued, and it opened the door to many interesting discussions during the breaks between sessions and elsewhere.

I was also able to tack on a couple days at the start to work with Joe Roe on an article we’ve been writing for the better part of 3 years, about collaborative aspects of open source software development among archaeologists. We presented a paper at the 2021 CAA conference on the composition of open-archaeo, the list of open source software and resources made by and for archaeologists that I maintain, and we’re trying to expand on it a little bit more with some network analysis type stuff. So this time together really gave us an opportunity to discuss what we really want out of the paper, to actually talk through the results, and generally helped motivate us to get this done. We still have some work cut out for us, but that probably warrants its own blog post.

Anyway, here are some cool pictures from the trip!

Abstract submitted for DAB23 (Bern, Switzerland)

Today I submitted an abstract to present at the DAB23 colloquium hosted by the Bern Computational and Digital Archaeology lab. The conference is about “advancing open research into the next decade” and my paper is titled Documenting the collaborative commitments that support data sharing within archaeological project collectives. Here is the abstract:

Archaeological research is inherently collaborative, in that it involves many people coming together to examine a material assemblage of mutual interest by implementing a variety of tools and methods in tandem. Independent projects establish organizational structures and information systems to help coordinate labour and pool information derived therefrom into a communal data stream, which can then be applied towards the production and publication of analytical findings. Albeit not necessarily egalitarian, and with different expectations set for people assigned different roles, archaeological projects thus constitute a form of commons, whereby participants contribute to and obtain value from a collective endeavour. Adopting open research practices, including sharing data beyond a project’s original scope, involves altering the collaborative commitments that bind work together. This paper, drawn from my doctoral dissertation, examines how archaeologists are presently navigating this juncture between established professional norms and expectations on the one hand, and the potential benefits and limitations afforded by open research on the other.

I applied an abductive qualitative data analysis approach based on recorded observations, interviews, and documents collected from three cases, including two independent archaeological projects and one regional data sharing consortium with limited scope and targeted research objectives. My analysis documents a few underappreciated aspects of archaeological projects’ sociotechnical arrangements that open data infrastructures should account for more thoroughly:

  1. boundaries, whether they restrict membership within a collective, delimit a project’s scope, or limit the time frame under which a project operates, have practical positive value, and are not just arbitrary impediments;
  2. systems designed to direct the flow of information do so via the coordination of labour, and the strategic arrangement of human and object agency, as well as resistances against such managerial control, are rarely accounted for in data documentation; and
  3. information systems and the institutional structures that support them tend to reinforce and reify existing power structures and divisions of labour, including implicit rules that govern ownership and control over research materials and that designate who may benefit from their use.

By framing data sharing, whether it occurs between close colleagues or as mediated by open data platforms among strangers, as comprising a series of collaborative commitments, my work highlights the broader social contexts within which we develop open archaeological research infrastructures. As we move forward, we should be aware of and account for how the data governance models embedded within open research infrastructures either complement or challenge existing social dynamics.