Open science and its weird conception of data

In an early draft of one of my dissertation’s background chapters I wrote a ranty section about notions of data held by the open science movement that I find really annoying. I eventually excised this bit of text, and while it isn’t really worth assembling into any publication, I thought it might still be worth sharing here. So here is a lightly adapted version, originally written circa May 2022.

The past decade has seen a major push to develop information infrastructures, or “set[s] of organizational practices, technical infrastructure, and social norms that collectively provide for the smooth operation of scientific work at a distance” (Bowker et al. 2010: 102), that are specifically oriented towards facilitating data sharing and reuse among archaeologists. These efforts frequently identify as participating within the open access (OA) movement, which is a distributed grassroots campaign that encourages greater accessibility of scientific research outputs. The OA movement is in turn inspired by the free and open source software (FOSS) movement, whose goals are to ensure that anyone can run, study, modify and share software without restriction, but which is more popularly identified with collective and non-commercial software development processes guided by the aforementioned principles rather than by a pursuit of profit (Kelty et al. 2008: 254-255; Costa 2013: 449-450). While the OA and FOSS movements are distinct in that OA largely deals with scientific practices and the outcomes of scientific research while FOSS is concerned with software development, they share a common concern with inclusive, transparent and collective integration of knowledge and labour. The principles that inform both the OA and FOSS movements are indeed commendable; however, the ways that they tend to envisage science and scientific knowledge production warrant critical reflection. Here I identify some concerning issues that have been incorporated into the open data infrastructures that archaeologists have begun to adopt, and which contribute to a problematic and counter-intuitive conception of archaeological data and of archaeological knowledge production in general.

Data are the material records that archaeologists produce to store, transmit and communicate meaning. They are functional in nature, meaning they are tools that archaeologists rely on to extend their understanding of phenomena of interest. In other words, data work alongside analytical methods so that archaeologists can get from one point of understanding to another. Data are produced and used through pragmatic action and necessity, and exhibit characteristics that derive from the circumstances of their creation and their intended purposes. This notion that data are the products of social and material decisions is widely accepted by scholars of scientific practice, but many academics who generate and make use of data in their day-to-day work, including archaeologists, still commonly consider data to be disembodied statements about the world as it truly is, which inform more elaborate and complex ways of understanding particular phenomena (Kintigh et al. 2015: 2). This aligns with a popularly held view of science as the collective pursuit of a unified and unambiguous understanding of nature. In this general vision of science, knowledge is considered to accumulate on a grand, global scale, and is thought to inform technological change in a way that reflects arbitrarily defined scales of technological progress. For instance, early hominins’ ability to craft stone tools is thought to have been necessary for the development of iron production, which necessarily informed the invention of steel, and eventually led humans to create electronic computers, and so on into an imagined future that conspicuously resembles the future worlds depicted by a particular cohort of post-war science fiction writers (e.g. Arthur C. Clarke’s 2001: A Space Odyssey, H.G. Wells’ World Brain, Isaac Asimov’s Foundation trilogy, Gene Roddenberry’s Star Trek, etc).

[Interestingly enough, many non-archaeologist adherents to this ideology rely on outmoded archaeological frameworks that have been discredited for decades, namely grand and deterministic histories of scientific and technological progression. They use their limited understanding of archaeology to identify “technologies out of place”, or artefacts that do not fit in with the just-so technological narratives summarized above. Things like the Antikythera mechanism, the Baghdad Battery, and the archaeological site of Göbekli Tepe are sources of fervent discussion that often feeds into colonial and racist pseudoarchaeological tropes regarding “lost civilizations” that seeded the remnants of their knowledge to the ancestors of indigenous peoples.]

The accumulation and assembly of data is thought to contribute to a species-level understanding of the world, which is not held by any one individual but is stored in media such as books, scientific reports, and internet-connected archives (cf. Bush 1945). These documents are perceived as value-neutral and purely representational in nature, and are thought to be produced and maintained by scholars and librarians exhibiting the virtuous traits of scientific objectivity, openness to alternative perspectives, and the capability to critique ideas in a rational and structured manner (cf. Ettarh 2018).

This idealized vision of science is commonly illustrated through the data/information/knowledge/wisdom (DIKW) pyramid, which relates basic and synthetic ways of understanding the world. More specifically, the DIKW model states that representations of a natural truth (data) undergird more complex statements (information) which inform explanations (knowledge) and eventually contribute to intrinsic understanding (wisdom). Despite the implication of movement from one stage to the next, this model fails to account for how these transitions actually occur in practice. Moreover, the flow is assumed to be unidirectional up the pyramid towards a pinnacle of pure human thought. This resembles another popular metaphor in knowledge management, that of the oil pipeline, whereby data are presented as scarce natural resources that are harvested and gradually refined to create more stable and marketable products. Mining for data is characterized as visceral and material work that occurs close to nature, while refining and synthesizing are imagined as more mental and formulaic processes. Like the DIKW scheme, the pipeline model assumes that the starting point is a natural and free-flowing repository of truth, and that researchers must contain and channel it to give it greater value, while giving it artificial shape in the process (Huggett 2020: S8-S9; Dallas 2015: 194). Any sources of friction that impede the flow of data are treated as obstructions that must be cleared or worked around to facilitate the development of more elaborate forms of understanding (Huggett 2022: 284). At the same time, the systems engineered to channel information are meant to protect us, to ensure that we do not get swept away by the “data deluge”, unmoored and lost at sea (cf. Bevan 2015).

This drives an obsessive concern with workflows pertaining to legal and logistical issues surrounding scientific publishing that OA advocates deem problematic. Many OA advocates see publishing as the business of typesetting and copyright law, which they could render moot using automated publishing workflows and by encouraging use of open licensing agreements (cf. Foster and Deardorff 2017; Harnad 1998). Viewed as merely technical problems, these issues could be resolved through technical means. However, academic publishing and ownership involve social arrangements that serve to stabilize knowledge, grant authority to validated claims, and enable science to move forward (Kelty et al. 2008: 274-275). In other words, technocentric visions of publication workflows tend to ignore the fact that publication is a cultural phenomenon, whereby projects are made complete and knowledge claims are articulated, credited, and rendered accountable to the people who proposed them.

The technocratic system imagined by OA advocates envisions a global web of information, whereby new forms of knowledge emerge through novel integrations (cf. Tennant et al. 2020; Harnad 1998). If bits of data click together and are consistent with a pre-existing understanding of the world (which already clicks together), the new knowledge is deemed legitimate. Science is therefore considered to be self-correcting, since the act of assembling data to produce new knowledge is itself the means through which claims are verified or refuted. This resembles the means of evaluating contributions in open source projects, which emphasizes code’s functionality as a primary factor. If code does not run, there must be a bug, or an inconsistency in the program, that renders it non-functional (Kelty 2008: 220). Contributions to a collective code base are therefore said to be based on merit, or the skillful implementation of code, rather than according to the qualities of the programmer who committed the code (Coleman 2012: 121). This parallels the idealized conception of scientific knowledge production described above, in that contributions to a collective enterprise are considered to be disembodied, unambiguous and lacking positionality ascribed by the people who contributed them.

These transformations are genuine attempts to reify the adage that “information wants to be free”, which implies that information is constrained by external domineering forces (namely, the arbitrary restrictions imposed by copyright law and the use of proprietary media formats and communications protocols) and that information can and should exist in a boundless state, which is assumed to be more natural (cf. Harnad 1998). However, this bears an unsettling resemblance to libertarian ideology in that each involves a spurious assumption that its key agents of concern are naturally independent and asocial beings, and that these isolated and atomic units may vaguely combine to form products of greater value, i.e. communities or states and knowledge, respectively. Value is ascribed based on market-based solutions (“the marketplace of ideas”), which assume that all actors within the system behave rationally and in accordance with the system’s built-in assumptions (Wellen 2004: 110). In a world where digital communications platforms have come to resemble state institutions to a great extent (cf. Gorwa 2019; Nieborg and Poell 2018), OA promises to enact a populist and anti-establishment vision for the future of scholarly communications, as illustrated by Suber (2003) who remarked that the OA revolution is the start of a new era wherein “scientific communication can be in the hands of scientists, who answer to one another, rather than corporations, who answer to shareholders.”

All of this colloquial imagery frames science as largely responsive to nature, and ignores how science constructs data. OA visions of science rarely take into account the double hermeneutic that is characteristic of social science research methods, which considers researchers’ roles in ascribing meaning to the objects that they identify and collect to begin with. And yet, the myth of unidirectional data production and processing persists even in archaeology, despite the discipline’s strong tendency for critical reflection regarding its own practices (Sørensen 2017: 107). This is perhaps due to the practical and bureaucratic contexts within which archaeological research operates, namely the need to produce particular kinds of research outputs that require specific kinds of stable inputs. For instance, the functional value of archaeological data is associated with their usefulness for generating the kinds of reports that archaeologists deem valuable, which are typically one-way forms of communication that contribute to an additive process of knowledge production (e.g. journal articles, book manuscripts, conference presentations, etc). Anything else, including “unofficial” or non-authoritative perspectives and discourse, is therefore commonly deemed extraneous for the purposes of formal analysis, and tends to be dismissed as a lower or less professional form of archaeological engagement, despite wide recognition of its value in theoretical discourse (Hodder 1989: 273-274; Joyce 2002: 138-139). Thus, the information infrastructures and sociopolitical pressures that frame the value regimes of archaeological research together contribute to a particular vision of what data are and how they should be acted upon.


Bevan, Andrew. 2015. ‘The Data Deluge’. Antiquity 89 (348): 1473–84.


Bowker, Geoffrey C., Karen Baker, Florence Millerand, and David Ribes. 2010. ‘Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment’. In International Handbook of Internet Research, edited by Jeremy Hunsinger, Lisbeth Klastrup, and Matthew Allen, 97–117. Dordrecht: Springer Netherlands.


Bush, Vannevar. 1945. ‘As We May Think’.


Coleman, E. Gabriella. 2012. Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton University Press.


Costa, Cristina. 2013. ‘The Habitus of Digital Scholars’. Research in Learning Technology 21 (1): 21274.


Dallas, Costis. 2015. ‘Curating Archaeological Knowledge in the Digital Continuum: From Practice to Infrastructure’. Open Archaeology 1 (1): 176–207.


Ettarh, Fobazi. 2018. ‘Vocational Awe and Librarianship: The Lies We Tell Ourselves’. In the Library with the Lead Pipe.


Foster, Erin D., and Ariel Deardorff. 2017. ‘Open Science Framework (OSF)’. Journal of the Medical Library Association 105 (2): 203–6.


Gorwa, Robert. 2019. ‘What Is Platform Governance?’ Information, Communication & Society 22 (6): 854–71.


Harnad, Stevan. 1998. ‘Learned Inquiry and the Net: The Role of Peer Review, Peer Commentary and Copyright’. Learned Publishing 11 (4): 283–92.


Hodder, Ian. 1989. ‘Writing Archaeology: Site Reports in Context’. Antiquity 63 (239): 268–74.


Huggett, Jeremy. 2020. ‘Is Big Digital Data Different? Towards a New Archaeological Paradigm’. Journal of Field Archaeology 45 (sup1): S8–17.


Huggett, Jeremy. 2022. ‘Data Legacies, Epistemic Anxieties, and Digital Imaginaries in Archaeology’. Digital 2 (2): 267–95.


Joyce, Rosemary. 2002. The Languages of Archaeology: Dialogue, Narrative, and Writing. Wiley.


Kelty, Christopher M. 2008. Two Bits: The Cultural Significance of Free Software. Duke University Press.


Kelty, Christopher M., Michael MJ Fischer, Alex “Rex” Golub, Jason Baird Jackson, Kimberly Christen, Michael F. Brown, and Tom Boellstorff. 2008. ‘Anthropology of/in Circulation: The Future of Open Access and Scholarly Societies’. Cultural Anthropology 23 (3): 559–88.


Kintigh, Keith W., Jeffrey H. Altschul, Ann P. Kinzig, W. Fredrick Limp, William K. Michener, Jeremy A. Sabloff, Edward J. Hackett, Timothy A. Kohler, Bertram Ludäscher, and Clifford A. Lynch. 2015. ‘Cultural Dynamics, Deep Time, and Data: Planning Cyberinfrastructure Investments for Archaeology’. Advances in Archaeological Practice 3 (1): 1–15.


Nieborg, David B, and Thomas Poell. 2018. ‘The Platformization of Cultural Production: Theorizing the Contingent Cultural Commodity’. New Media & Society 20 (11): 4275–92.


Sørensen, Tim Flohr. 2017. ‘The Two Cultures and a World Apart: Archaeology and Science at a New Crossroads’. Norwegian Archaeological Review 50 (2): 101–15.


Suber, Peter. 2003. ‘Open Access to Science and Scholarship’.


Tennant, Jonathan, Ritwik Agarwal, Ksenija Baždarić, David Brassard, Tom Crick, Daniel J. Dunleavy, Thomas Rhys Evans, et al. 2020. ‘A tale of two “opens”: Intersections between Free and Open Source Software and Open Scholarship’, March.


Wellen, Richard. 2004. ‘Taking on Commercial Scholarly Journals: Reflections on the “Open Access” Movement’. Journal of Academic Ethics 2 (1): 101–18.

