Finished my dissertation!

I finally defended my doctoral dissertation a few weeks ago, and after 7 years I’m happy to put it out into the world: https://doi.org/10.5281/zenodo.8373390

To briefly summarize: I observed and interviewed archaeologists while they worked, focusing on how they collaborate to produce information commons within relatively small, bounded communities. I relate these observations to issues experienced when sharing data globally on the web using open data platforms. This is part of an effort to reorient data sharing (and other aspects of open science) as a social, collaborative, communicative, and commensal experience.

Many thanks to my supervisor, Costis Dallas, for being such a great mentor, and to Matt Ratto and Ted Banning for their constant constructive feedback. And special thanks to the external examiners, Jeremy Huggett and Ed Swenson, for critically engaging with my work.

Archaeological data work as continuous and collaborative practice

This dissertation critically examines the sociotechnical structures that archaeologists rely on to coordinate their research and manage their data. I frame data as discursive media that communicate archaeological encounters, which enable archaeologists to form productive collaboration relationships. All archaeological activities involve data work, as archaeologists simultaneously account for the decisions and circumstances that framed the information they rely on to perform their own practices, while anticipating how their information outputs will be used by others in the future. All archaeological activities are therefore loci of practical epistemic convergence, where meanings are negotiated in relation to communally-held objectives.

Through observations of and interviews with archaeologists at work, and analysis of the documents they produce, I articulate how data sharing relates distributed work experiences as part of a continuum of practice. I highlight the assumptions and value regimes that underlie the social and technical structures that support productive archaeological work, and draw attention to the inseparable relationship between the management of labour and data. I also relate this discursive view of data sharing to the open data movement, and suggest that it is necessary to develop new collaborative commitments pertaining to data publication and reuse that are more in line with disciplinary norms, expectations, and value regimes.

Some thoughts on data formality

I’m using this post to draw out some thoughts that feel coherent in my mind but that I struggle to communicate in writing. The general topic is the notion of formality, and how it is expressed in data work and data records.

In a management sense, formality involves adhering to standard protocol. It involves checking all the boxes, sticking to the book, and ensuring that behaviour conforms to institutional expectations. In this way, bureaucracy is the essence of formality. By extension, formality is a means through which power is expressed, in that it binds interactions to a certain set of acceptable possibilities. In effect, formality renders individual actions subservient to a broader system of control.

But formality is also useful. It reduces the friction involved in transforming and transmitting information across contexts. Any application that implements a formal standard can access and transmit information according to that standard, which reduces cognitive overhead on the part of actors responsible for processing information. Formal standards relocate creative agency upstream, towards managers of data and of labour, who make decisions regarding how other actors (human and non-human alike) may interact with the system before those interactions ever occur. This manifests itself in workflows, which are essentially disciplined ways of working directed towards targeted outcomes (I wrote about workflows in a 2021 paper and in my dissertation, which draws from Bill Caraher’s contribution to Critical Archaeology in the Digital Age, among other work he’s written on the topic). To be clear, I do not mean to imply that adopting workflows constitutes a negative act. An independent scholar may apply a workflow to achieve their goals more effectively and efficiently, which empowers them to get the most out of the resources at their disposal. However, one of the key findings from my dissertation is that when applied in collective enterprises, workflows tend to genericize labour and data for the purpose of extraction and appropriation. This is understood to be an ordinary aspect of archaeological research, as is evident from how actors performing genericized labour internalize this as part of their work role.

Developing a workflow essentially entails adopting and enforcing protocols and formats, which are series of documented norms and expectations that ensure that information may be made interchangeable. Protocols are standards that dictate means of direct communication, and formats are standards that dictate how information should be stored. Forms are interfaces through which information is translated from real-world experiences into standardized formats.

Formal data are information whose variables and values are arranged according to a formally-defined schema. A formal dataset comprises a series of records collated in a consistent manner, motivated by a need, desire or warrant to render them comparable. The formally-defined schema makes such comparison much easier. A common means of representing formal data is through tables, which are composed of rows and columns. Each row represents a record, and each column a variable that describes a facet of each record. The values recorded for each variable constitute observations or descriptive characterizations pertaining to the object of each record. One can therefore determine what kinds of structured observations were made about a recorded object by finding the values located at the intersection of records and variables (i.e., individual cells in a table). Each record relates to a set of variables applied to the whole set and documented in the schema.
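To make this a bit more concrete, here is a minimal sketch in Python of what I mean by a formal dataset. The variable names (find_id, material, weight_g) and the helper functions are made up purely for illustration; they are not taken from any real recording system or from my dissertation. The schema declares the variables once for the whole set, each record is a row, and a cell is the value found where a record and a variable intersect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One column of the table: a named variable and the kind of value it accepts."""
    name: str
    kind: type


# Hypothetical schema: the variables applied to every record in the set.
SCHEMA = (
    Variable("find_id", str),
    Variable("material", str),
    Variable("weight_g", float),
)


def conforms(record, schema=SCHEMA):
    """Return True if the record has exactly the schema's variables,
    each holding a value of the declared kind."""
    if set(record) != {v.name for v in schema}:
        return False
    return all(isinstance(record[v.name], v.kind) for v in schema)


def cell(dataset, row, variable):
    """Look up the value at the intersection of a record (row) and a variable (column)."""
    return dataset[row][variable]


# Hypothetical dataset: each dict is one record, i.e. one row of the table.
dataset = [
    {"find_id": "F001", "material": "ceramic", "weight_g": 12.5},
    {"find_id": "F002", "material": "lithic", "weight_g": 3.2},
]

if __name__ == "__main__":
    print(all(conforms(r) for r in dataset))
    print(cell(dataset, 1, "material"))
```

Because every record answers to the same schema, comparing records is just a matter of reading down a column. A record that is missing a variable, or that adds one of its own, falls outside the formal dataset until someone decides how to fit it in.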

At its most extreme, formality entails a realm of total control, where all information is collected and processed according to an all-encompassing model of the world. It is not coincidental that models are the primary outlooks through which both managers and computer systems engage with the world. It has been the dream of bureaucrats and computer scientists alike to develop such systems (see the work of Paul Otlet, Vannevar Bush, and the weirdly techno-libertarian crowd associated with structured note-taking and personal knowledge management). Nor is it a coincidence that formality is a requisite aspect of both bureaucracies and computers. Computational environments and bureaucracies both effectively capture and maintain institutional power dynamics.

In some cases, such as with text boxes, the variable may be precisely defined but the values are left open-ended. However, users are still expected to provide certain kinds of information in these fields. I remember that at a conference in 2019, Isto Huvila (whose work on archaeological records management is also a great source of inspiration) referred to these as “white boxes”, which conveys their literal appearance and is a clever and ironic play on the notion of “black boxes” that hide the intricate details of a process behind an opaque connotative entity. In this sense, the standards are mediated by social and professional norms, but exist nonetheless. This reflects the fact that social and professional norms, standards, and expectations will never go away; they are fundamental aspects of communication and participation within communities.

The million dollar question nowadays (at least in my own mind) is how we can create information infrastructures that strike a balance between the need to transmit information succinctly between computers via the web, and the capability to share context and subtext, whose significance originates in the gaps between recorded information and which gain meaning only in relation to the shared experience of communicating agents as members of a social or professional community.

Recap: Digital Archaeology Bern 2023

Last week I travelled to Switzerland to participate in Digital Archaeology Bern (2023). The conference was themed “advancing open research into the next decade” and served as a way to take stock of developments since the 2012 World Archaeology Special Issue on Open Archaeology and Ben Marwick’s influential 2017 paper Computational Reproducibility in Archaeological Research, which came out roughly 10 and 5 years ago, respectively. I think that the conference was a remarkable success, and all 50-60 participants were actively engaged in critical discussions on what it means to do open archaeology. You can find my slides and presentation notes on GitHub (https://github.com/zackbatist/DAB23).

Although there were some elements of this, the conference was not just superficial open-boosting. Most, if not all, participants highlighted challenges and unanticipated implications of being open that they have recently experienced. Looking back, a few themes stood out:

  • Thinking about the value proposition that openness entails, which necessarily involves accounting for specific use cases and imagined future stakeholders.
  • Thinking about the needs and values of all stakeholders involved in doing archaeology, including local and Indigenous communities, land-owners, archivists, government agencies, and related parties, and what openness means for them.
  • Thinking about how we might reconcile our values as archaeologists with the values demanded and afforded by the infrastructures and communities with whom we must work.

I got to meet so many interesting people. I already knew many of them from social media, virtually-hosted talks, or brief in-person interactions at the CAA back in 2018, and it was really great to put a face to each person’s name. Most serious work in digital archaeology, especially productive work developing open data infrastructures, is being done in Europe, and I was very grateful to have this opportunity to connect with that crowd (especially since I’m currently entering the post-PhD academic job market). I think my paper was well-received and valued, and it opened the door to many interesting discussions during the breaks between sessions and elsewhere.

I was also able to tack on a couple days at the start to work with Joe Roe on an article we’ve been writing for the better part of 3 years, about collaborative aspects of open source software development among archaeologists. We presented a paper at the 2021 CAA conference on the composition of open-archaeo, the list of open source software and resources made by and for archaeology that I maintain, and we’re trying to expand on it a little bit more with some network analysis type stuff. So this time together really gave us an opportunity to discuss what we really want out of the paper, to actually talk through the results, and generally helped motivate us to get this done. We still have some work cut out for us, but that probably warrants its own blog post.

Anyway, here are some cool pictures from the trip!

Open science and its weird conception of data

In an early draft of one of my dissertation’s background chapters I wrote a ranty section about notions of data held by the open science movement that I find really annoying. I eventually excised this bit of text, and while it isn’t really worth assembling into any publication, I thought it may still be worth sharing here. So here is a lightly adapted version, original circa May 2022.
