Cultural Heritage & Library Collections as Data and their Role in Digital Humanities Infrastructure

Dr Toby Burrows1, Prof. Deb Verhoeven2, Dr Christopher McAvaney3

1University of Oxford, Oxford, UK,

2Deakin University, Burwood, Australia,

3Deakin University, Burwood, Australia


The importance of cultural heritage collections for research in the humanities, arts and social sciences has long been recognized. The digital and digitized forms of these collections are equally crucial for research which uses the methodologies, technologies and critical perspectives of the digital humanities. For the National Research Infrastructure Roadmap Report (2017), HASS platforms encompass both the physical collections and “online portals that facilitate the digitisation of and digital access to original artefacts, materials and knowledge.” [1] The Report emphasizes discoverability and accessibility as priorities, together with “enhanced digitisation aggregation and interpretation platform processes”.

At the same time, an initiative to understand these collections as data is gathering pace in the United States. Under the auspices of the Library of Congress and the Institute of Museum and Library Services, this “Collections as Data” program “aims to foster a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research.” [2] One of the drivers for this initiative is the perception that, as Miriam Posner argues, “Libraries and archives [and museums] are increasingly making their materials available online, but, as a general rule, these materials aren’t of much use for computational purposes.” [3]

Instead, as Thomas Padilla has summarized the project, a “collections as data” imperative can benefit research communities by shifting cultural heritage and library practices in three key frames:
1. Generativity: to increase meaning making capacity
2. Legibility: to document and convey provenance and possibility
3. Creativity: to empower experimentation [4]

While much of the work of the “collections as data” program is focused on ways of making collections data available and accountable, there is also an interest in the relationship between data and infrastructure. The Santa Barbara Statement on Collections as Data summarises: “Working toward interoperability entails alignment with emerging and/or established community standards and infrastructure.” [5]

This presentation will evaluate three different approaches to delivering and using collections data to build HASS-oriented platforms. The British Museum’s ResearchSpace is a platform for bringing together data from cultural heritage collections by mapping to the CIDOC-CRM ontology. [6] Initially limited to the British Museum’s own collections, it is now being tested by other institutions in Europe and North America, as well as by the Collecting the West project in Western Australia. This project is using ResearchSpace to bring together data relating to Western Australian objects held in collections in Australia and Europe. [7] ResearchSpace enables researchers to work with collections data by adding annotations and arguments to objects and other entities.
OXLOD (Oxford Linked Open Data) is taking a similar approach, mapping data from Oxford University’s museums and libraries to CIDOC-CRM, in order to produce an interdisciplinary platform for cultural heritage research. An estimated 200,000 digital records will be linked and mapped in the initial phase of this project. [8]
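
As a rough illustration of what mapping a collection record to CIDOC-CRM involves, the sketch below turns a flat catalogue record into subject-predicate-object statements. The record fields, the example.org namespace and the function name are assumptions made for illustration only; they are not taken from ResearchSpace or OXLOD. The class and property names follow CIDOC-CRM (E22 is labelled "Man-Made Object" in older versions of the model and "Human-Made Object" in more recent ones).

```python
# Hypothetical sketch: flat catalogue record -> CIDOC-CRM-style triples.
# Field names, URIs and the function name are illustrative assumptions.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/"  # placeholder namespace, not a real service
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def record_to_triples(record):
    """Return (subject, predicate, object) tuples describing one record."""
    obj = f"{BASE}object/{record['id']}"
    production = f"{obj}/production"
    maker = f"{BASE}person/{record['maker_id']}"
    return [
        # The physical object and its title
        (obj, RDF_TYPE, CRM + "E22_Man-Made_Object"),
        (obj, RDFS_LABEL, record["title"]),
        # The production event that links the object to its maker
        (obj, CRM + "P108i_was_produced_by", production),
        (production, RDF_TYPE, CRM + "E12_Production"),
        (production, CRM + "P14_carried_out_by", maker),
        # The maker as a person entity
        (maker, RDF_TYPE, CRM + "E21_Person"),
        (maker, RDFS_LABEL, record["maker_name"]),
    ]


# Placeholder record; the values are invented for the example.
example = {
    "id": "obj-1",
    "title": "Placeholder object",
    "maker_id": "p-1",
    "maker_name": "Placeholder maker",
}
triples = record_to_triples(example)
for triple in triples:
    print(triple)
```

The point of the event-centred pattern is that the maker is attached to a production event rather than directly to the object, which is what allows records from different institutions to be linked through shared people, places and events.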

The third service is HuNI, the Australian virtual laboratory for the humanities, which ingests collections data from library catalogues such as AIATSIS, AFI and ACMI, as well as data from various archives. It also aggregates data from Trove’s digitized newspaper collection, and from reference works, bibliographies and event-oriented databases. HuNI has recently added a pipeline for ingesting data from different collection types, including researcher-contributed collections, via the Omeka software.

HuNI re-formats collections data by extracting entities from incoming records and making them available for linking and visualizing, in the form of a network graph. Interpretations can be added to the data in the form of relationships and links, and the entities can be re-constituted into a researcher’s own collections. [9] Queries can be performed via keyword or through the graph search. Relationships between records are distinguished as either links generated by the HuNI system or links created by users.
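
The distinction between system-generated and user-created links can be pictured with a small data structure. The following is a minimal sketch under stated assumptions: the class names, fields and identifiers are hypothetical and do not reflect HuNI's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    label: str
    origin: str  # "system" (derived at ingest) or "user" (researcher-asserted)


class EntityGraph:
    """Hypothetical entity graph that records where each link came from."""

    def __init__(self):
        self.entities = {}                    # entity id -> type and name
        self.links = []                       # all links, in insertion order
        self._by_entity = defaultdict(list)   # entity id -> links touching it

    def add_entity(self, entity_id, entity_type, name):
        self.entities[entity_id] = {"type": entity_type, "name": name}

    def add_link(self, source, target, label, origin):
        if origin not in ("system", "user"):
            raise ValueError("origin must be 'system' or 'user'")
        if source not in self.entities or target not in self.entities:
            raise KeyError("both ends of a link must be known entities")
        link = Link(source, target, label, origin)
        self.links.append(link)
        self._by_entity[source].append(link)
        self._by_entity[target].append(link)
        return link

    def neighbours(self, entity_id, origin=None):
        """Ids of entities linked to entity_id, optionally filtered by origin."""
        found = set()
        for link in self._by_entity[entity_id]:
            if origin is not None and link.origin != origin:
                continue
            found.add(link.target if link.source == entity_id else link.source)
        return found


# Placeholder entities and links, invented for the example.
graph = EntityGraph()
graph.add_entity("person:1", "Person", "Person A")
graph.add_entity("work:1", "Work", "Work B")
graph.add_entity("place:1", "Place", "Place C")
graph.add_link("person:1", "work:1", "created", "system")
graph.add_link("work:1", "place:1", "exhibited at", "user")

print(sorted(graph.neighbours("work:1")))
print(sorted(graph.neighbours("work:1", origin="user")))
```

Keeping the origin on each link means a researcher's interpretive assertions can be shown, filtered or excluded separately from relationships derived from the source records.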

What these three services have in common is the idea of taking collections data and using them to create network graphs of relationships between entities – including people, places and objects. While there are other things which can be done with collections data (such as image interoperability using IIIF, and textual analysis of digital texts), network graphs are a powerful way of uncovering the meaning and significance of the knowledge embedded in cultural heritage collections. Analysis of these services will form the basis for a set of recommendations for best practice in making collections data available for computational purposes.


1. Australian Government, 2016 National Research Infrastructure Roadmap (Canberra, 2017), p. 33

2. Always Already Computational: Library Collections as Data (2017)

3. Posner, Miriam, “Actually Useful Collection Data: Some Infrastructure Suggestions”, in: Always Already Computational: Library Collections as Data: National Forum Position Statements (2017)

4. Padilla, Thomas, “On a Collections as Data Imperative”

5. The Santa Barbara Statement on Collections as Data (2017)




9. Burrows, Toby and Deb Verhoeven, “Aggregating Cultural Heritage Data for Research Use: The Humanities Networked Infrastructure (HuNI)”, in Metadata and Semantics Research, 9th Research Conference, MTSR 2015, Manchester, UK, September 9–11, 2015: Proceedings, ed. Emmanouel Garoufallou, Richard J. Hartley, Panorea Gaitanou (Communications in Computer and Information Science, 544) (Cham: Springer, 2015), pp. 417-423



Deb Verhoeven is Professor and Chair of Media and Communication at Deakin University. She is Director of the Humanities Networked Infrastructure (HuNI).
