Leveraging APIs and PIDs to automate externally-hosted research data into an Institutional Repository

Gerry Devine¹, George Cerexhe, Marijka Azzopardi, Emma McClean, Fiona Bradley

¹University of New South Wales, Sydney, NSW, Australia

Abstract

A wide range of data repositories, both institutional and externally managed, exists for the archival and public dissemination of research data. This benefits researchers, who can choose the repository best suited to their dataset’s requirements and most likely to maximise its reuse. This choice does, however, make it difficult for institutions to keep track of their research data assets. At UNSW, a pilot project is examining automated metadata harvesting from external data repositories as a means of tracking the existence, location and evolving versions of data assets published externally. Starting with Dryad, we use the APIs of both Dryad and UNSW’s Institutional Repository (UNSWorks) to show that datasets published externally can not only be tracked effectively but also have their discoverability enhanced through mechanisms built into UNSWorks. We will present the issues that arose and the lessons learned in harvesting specific metadata fields, as well as the importance of PIDs (including DOI, ORCID and ROR) in accurately harvesting a dataset’s metadata and its related entities. The outcomes of this exploratory work will inform the metadata harvesting of further external repositories.
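
To illustrate why PIDs matter in this kind of harvesting, the following is a minimal sketch, not the project's actual code, of mapping one externally harvested dataset record to a repository-ready record. The input field names (`identifier`, `versionNumber`, `authors`, `orcid`, `affiliationROR`) are assumptions about the shape of a harvested JSON record, and the function names and the ROR value used in the usage note are hypothetical.

```python
import re

# An ORCID iD is four groups of four characters; the last may be the check digit X.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")

DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value):
    """Return a bare, lowercase DOI (e.g. '10.1234/abc') from common forms."""
    v = value.strip()
    for prefix in DOI_PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):]
            break
    return v.lower()


def normalise_orcid(value):
    """Return the canonical https://orcid.org/<iD> form, or None if no iD is found."""
    if not value:
        return None
    match = ORCID_RE.search(value.strip().upper())
    return f"https://orcid.org/{match.group(1)}" if match else None


def to_repository_record(dataset, institution_ror):
    """Map a harvested dataset dict (assumed field names) to a simple record.

    Authors are flagged as local by comparing their ROR to the institution's,
    which avoids fragile matching on free-text affiliation names.
    """
    authors = []
    for a in dataset.get("authors", []):
        name = f"{a.get('firstName', '')} {a.get('lastName', '')}".strip()
        authors.append({
            "name": name,
            "orcid": normalise_orcid(a.get("orcid")),
            "local": a.get("affiliationROR") == institution_ror,
        })
    return {
        "doi": normalise_doi(dataset["identifier"]),
        "title": dataset.get("title"),
        "version": dataset.get("versionNumber"),
        "authors": authors,
        "has_local_author": any(a["local"] for a in authors),
    }
```

In this sketch the normalised DOI serves as the key for detecting a dataset that has already been harvested, and a change in `version` for the same DOI would signal that the repository record needs updating. Calling `to_repository_record(record, "https://ror.org/00example")` with a made-up ROR flags only those authors whose harvested ROR matches exactly.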

Biography

Gerry Devine is a Senior Research Data Librarian at the University of New South Wales. Gerry began his career as a researcher in the field of Atmospheric Science before moving into Research Data Management. He has held a variety of data management roles across a number of universities and currently promotes research data publishing at UNSW and helps researchers to publish their data.

https://orcid.org/my-orcid?orcid=0000-0002-0074-112X
