Michael Lynch¹, Peter Sefton², Sharyn Wise³
¹University of Technology Sydney, Michael.Lynch@uts.edu.au
²University of Technology Sydney, Peter.Sefton@uts.edu.au
³University of Technology Sydney, Sharyn.Wise@uts.edu.au
Research data management is critical for the integrity of scholarship, for the ability of researchers and institutions to re-use and share data, and for IT support staff and data librarians to be able to plan, maintain and curate data collections. It’s also time-consuming and daunting for researchers, and it risks becoming yet another bureaucratic hurdle to research work.
Provisioner is an open framework for integrating research data management into research tools and workflows. It allows researchers to select applications from a service catalogue and create workspaces linked to research data management plans, and it supports data archiving and publication.
This presentation will cover:
- The ideas underlying the Provisioner framework, a loosely-coupled distributed system designed for high resilience against inevitable organisational and technical change
- ReDBox 2.0, the platform in which this work is implemented
- Using DataCrates for integrated metadata and a file-based repository
- A case study of a research data workflow from microscopes to data modelling and simulation to an immersive visualisation
The Provisioner is based on two key ideas. The first is the “workspace”, which is used to define a limited set of basic operations (create, share, import and export) that can be executed via APIs on a wide range of research data applications and storage services. The second is storing redundant, machine- and human-readable metadata with the data, so that the system isn’t a monolith in which identifying the creator, owner and funding agency of a dataset depends on a centralised database.
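One way to picture the workspace idea is as a single small interface that every backend implements. The following is a minimal sketch under that assumption. The class and method names (`Workspace`, `create`, `share`, `import_data`, `export_data`) and the local-directory backend are hypothetical illustrations, not the Provisioner API. A real adaptor would call an application's own API (for example OMERO or GitLab) rather than copy directories.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Workspace(ABC):
    """Hypothetical common interface: every backend supports the same four operations."""

    @abstractmethod
    def create(self, name: str) -> str:
        """Create a workspace and return an identifier for it."""

    @abstractmethod
    def share(self, workspace_id: str, user: str) -> None:
        """Give another user access to the workspace."""

    @abstractmethod
    def import_data(self, workspace_id: str, source: Path) -> None:
        """Copy files from a local directory into the workspace."""

    @abstractmethod
    def export_data(self, workspace_id: str, target: Path) -> None:
        """Copy the workspace's files out to a local directory."""


class DirectoryWorkspace(Workspace):
    """Illustrative backend only: a workspace is a directory under a root path."""

    def __init__(self, root: Path):
        self.root = root
        self.members: dict[str, set[str]] = {}

    def create(self, name: str) -> str:
        (self.root / name).mkdir(parents=True, exist_ok=False)
        self.members[name] = set()
        return name

    def share(self, workspace_id: str, user: str) -> None:
        # A real backend would call the application's permissions API here.
        self.members[workspace_id].add(user)

    def import_data(self, workspace_id: str, source: Path) -> None:
        shutil.copytree(source, self.root / workspace_id, dirs_exist_ok=True)

    def export_data(self, workspace_id: str, target: Path) -> None:
        shutil.copytree(self.root / workspace_id, target, dirs_exist_ok=True)


# Usage: the same four calls would work against any backend implementing Workspace.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    source = tmp_path / "microscope-run"
    source.mkdir()
    (source / "frame001.tif").write_bytes(b"\x00\x01")
    ws = DirectoryWorkspace(tmp_path / "workspaces")
    ws_id = ws.create("bacteria-motility")
    ws.share(ws_id, "collaborator@example.edu")
    ws.import_data(ws_id, source)
    ws.export_data(ws_id, tmp_path / "export")
    print(sorted(p.name for p in (tmp_path / "export").iterdir()))  # prints ['frame001.tif']
```

Keeping the operation set this small is what makes it plausible to write an adaptor for each new application without changing the code that calls it.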
The Provisioner framework allows us to connect diverse research applications using DataCrates as a common interchange format and manage automated pipelines of data management tasks such as exporting data, crosswalking metadata and requesting DOIs from minting services.
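An automated pipeline of this kind can be pictured as an ordered list of steps, each of which takes and returns a shared context. This is a hedged sketch, not the Provisioner's implementation. The step names, the local metadata field names and the crosswalk table are all hypothetical. `request_doi` is only a stub that returns a placeholder string. A real step would call a DOI minting service, whose API is not modelled here.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]

# Hypothetical mapping from local catalogue field names to schema.org properties.
CROSSWALK = {
    "title": "name",
    "description": "description",
    "chief_investigator": "creator",
    "funding_body": "funder",
}


def export_data(ctx: Context) -> Context:
    # Stand-in for copying files out of a workspace: just record what would be exported.
    return {**ctx, "exported_files": sorted(ctx["files"])}


def crosswalk_metadata(ctx: Context) -> Context:
    # Keep only fields with a known mapping; unmapped local fields are dropped.
    mapped = {CROSSWALK[k]: v for k, v in ctx["metadata"].items() if k in CROSSWALK}
    jsonld = {"@context": "http://schema.org/", "@type": "Dataset", **mapped}
    return {**ctx, "jsonld": jsonld}


def request_doi(ctx: Context) -> Context:
    # Stub only: a real step would call a minting service and store the DOI it returns.
    slug = ctx["jsonld"]["name"].lower().replace(" ", "-")
    return {**ctx, "doi": "placeholder-doi-for-" + slug}


def run_pipeline(steps: List[Step], ctx: Context) -> Context:
    # Each step returns a new context, so earlier results are never mutated in place.
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_pipeline(
    [export_data, crosswalk_metadata, request_doi],
    {
        "files": ["tracks.csv", "movie.avi"],
        "metadata": {
            "title": "Bacterial motility",
            "chief_investigator": "A. Researcher",
            "internal_note": "not mapped",
        },
    },
)
print(result["jsonld"])
print(result["doi"])
```

Because steps are plain functions, a new task can be added to the list without changing the runner.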
The principal way in which researchers interact with the Provisioner is through the service catalogue in Stash, the University’s research data management tool and data catalogue, which is implemented in ReDBox.
As part of the Provisioner project, ReDBox has been redeveloped as a modern web application and now includes a service catalogue from which researchers can select workspaces in a range of research applications, from OMERO (an open microscopy environment) and GitLab (for maintaining and publishing software as a research output) to research fileshares.
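A service catalogue can be sketched as a registry that maps service names to provisioning functions. In this sketch, every workspace record also carries the identifier of the data management plan it belongs to. The service names, `example.edu` locations, plan identifier format and record fields are illustrative assumptions, not ReDBox's data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A provisioning function takes a workspace name and returns its location.
ProvisionFn = Callable[[str], str]


@dataclass
class WorkspaceRecord:
    service: str
    name: str
    plan_id: str  # link back to the research data management plan
    location: str


class ServiceCatalogue:
    """Hypothetical registry of workspace services researchers can choose from."""

    def __init__(self) -> None:
        self._services: Dict[str, ProvisionFn] = {}
        self.records: List[WorkspaceRecord] = []

    def register(self, service: str, provision: ProvisionFn) -> None:
        self._services[service] = provision

    def available(self) -> List[str]:
        return sorted(self._services)

    def create_workspace(self, service: str, name: str, plan_id: str) -> WorkspaceRecord:
        if service not in self._services:
            raise KeyError(f"unknown service: {service}")
        record = WorkspaceRecord(service, name, plan_id, self._services[service](name))
        self.records.append(record)
        return record

    def workspaces_for_plan(self, plan_id: str) -> List[WorkspaceRecord]:
        # Every workspace can be traced back to the plan it was created under.
        return [r for r in self.records if r.plan_id == plan_id]


# Usage with made-up locations standing in for real provisioning calls.
catalogue = ServiceCatalogue()
catalogue.register("omero", lambda n: f"https://omero.example.edu/project/{n}")
catalogue.register("gitlab", lambda n: f"https://gitlab.example.edu/research/{n}")
catalogue.register("fileshare", lambda n: f"//files.example.edu/research/{n}")
catalogue.create_workspace("omero", "motility-videos", plan_id="rdmp-001")
catalogue.create_workspace("gitlab", "motility-model", plan_id="rdmp-001")
print(catalogue.available())
print([r.location for r in catalogue.workspaces_for_plan("rdmp-001")])
```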
Provisioner uses the DataCrate standard to store metadata in human- and machine-readable formats alongside the datasets at each stage of the research life-cycle, from creation and analysis through to archiving and publication. DataCrates are directories on a filesystem with a conventional layout based on the BagIt standard, with contextual metadata linked using JSON-LD, and are suitable for both archiving and publication.
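To make the layout concrete, the following is a minimal sketch using only the Python standard library. It writes a BagIt-style directory containing a `bagit.txt` declaration, the payload under `data/`, a SHA-256 payload manifest, and a JSON-LD file describing the dataset. It then reads the creator back from the crate itself, with no central database involved. The metadata file name `CATALOG.json`, its placement under `data/`, and the `@id` values are assumptions for illustration. The DataCrate specification defines the actual required files and vocabulary. A production tool would also write `bag-info.txt` and tag manifests.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_crate(crate_dir: Path, payload: Dict[str, bytes], metadata: dict) -> None:
    """Write a minimal BagIt-style crate with JSON-LD metadata (illustrative, not a full DataCrate)."""
    data_dir = crate_dir / "data"
    data_dir.mkdir(parents=True)
    files = dict(payload)
    # Assumed metadata file name and location; check the DataCrate spec for the real requirements.
    files["CATALOG.json"] = json.dumps(metadata, indent=2).encode("utf-8")
    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # BagIt payload manifest line: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{rel_name}")
    (crate_dir / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n", encoding="utf-8")
    (crate_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )


def verify_crate(crate_dir: Path) -> bool:
    """Recompute every payload checksum listed in the manifest."""
    for line in (crate_dir / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((crate_dir / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True


def crate_creator(crate_dir: Path) -> str:
    """Find the dataset's creator from the crate's own metadata."""
    meta = json.loads((crate_dir / "data" / "CATALOG.json").read_text(encoding="utf-8"))
    dataset = next(n for n in meta["@graph"] if n["@type"] == "Dataset")
    creator_id = dataset["creator"]["@id"]
    return next(n for n in meta["@graph"] if n["@id"] == creator_id)["name"]


# Hypothetical example metadata using schema.org terms.
metadata = {
    "@context": "http://schema.org/",
    "@graph": [
        {"@id": "#dataset", "@type": "Dataset", "name": "Bacterial motility videos",
         "creator": {"@id": "#researcher"}, "funder": {"@id": "#funder"}},
        {"@id": "#researcher", "@type": "Person", "name": "A. Researcher"},
        {"@id": "#funder", "@type": "Organization", "name": "Example Funding Agency"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    crate = Path(tmp) / "crate"
    write_crate(crate, {"videos/run1.avi": b"fake video bytes"}, metadata)
    print(verify_crate(crate), crate_creator(crate))  # prints True A. Researcher
```

Because the metadata file is itself part of the payload, its checksum is recorded in the manifest alongside the data it describes.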
We present a case study of a research data workflow that starts with video microscopy of bacteria, continues through a code repository and mathematical modelling of bacterial movement, and ends with 3D visualisation of simulated bacterial movement in the UTS Data Arena.
Mike Lynch is an eResearch Analyst in the eResearch Support Group at UTS. His work involves solution design, information architecture and software development supporting research data management. His other interests include data visualisation and functional programming languages.