On-demand Research Notebooks with all the Trimmings

Alex Ip1, Peter Sefton2, Moises Sacal Bonequi2, River Tae Smith3, Steele Cooke1

1AARNet Pty Ltd, Sydney, NSW, Australia
2University of Queensland, Brisbane, Queensland, Australia
3Monash University, Melbourne, Victoria, Australia

Abstract

This presentation covers the Australian Text Analytics Platform (ATAP) BinderHub deployment within the Language Data Commons of Australia (LDaCA). BinderHub allows research code in Jupyter Notebooks to be launched in custom containers that include their dependencies, thereby lowering the barrier to entry for analytics and improving the reproducibility of computational research.
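As a concrete illustration of how BinderHub is addressed, the sketch below builds a standard BinderHub launch URL (the `/v2/gh/<owner>/<repo>/<ref>` pattern) for a GitHub-hosted notebook repository. The hub base URL, repository names, and notebook path are placeholders, not the actual ATAP endpoints.

```python
from urllib.parse import quote

def binderhub_launch_url(hub_base: str, owner: str, repo: str,
                         ref: str = "HEAD", notebook: str = "") -> str:
    """Build a BinderHub launch URL for a GitHub-hosted repository.

    BinderHub resolves /v2/gh/<owner>/<repo>/<ref> by building a
    container image from the repository's dependency files
    (requirements.txt, environment.yml, etc.) and launching it.
    """
    url = f"{hub_base.rstrip('/')}/v2/gh/{owner}/{repo}/{ref}"
    if notebook:
        # ?labpath= opens a specific notebook once the container starts
        url += f"?labpath={quote(notebook)}"
    return url
```

For example, `binderhub_launch_url("https://mybinder.org", "org", "demo", "main", "analysis.ipynb")` yields a link that builds the container and opens the named notebook directly.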

Some innovative aspects of the ATAP service are that it is linked to a data repository archive service (Oni) via an API, that resource access control (via REMS) respects the CARE and FAIR principles, that code and data are both described with rich RO-Crate metadata, and that the code metadata specifies compute requirements.

The data repositories are served using Oni: a server-side API presenting objects stored in the standard Oxford Common File Layout (OCFL). Every object provides both RO-Crate metadata (on files, collections, people, places, etc.) and licence-controlled direct access to files.
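Every RO-Crate metadata document served this way follows the same JSON-LD shape: a metadata descriptor entity points at the root dataset via `"about"`. The sketch below locates the root entity in such a document; the sample crate content is illustrative, not taken from an actual LDaCA object.

```python
import json

def find_root_entity(crate: dict) -> dict:
    """Locate the root dataset entity in an RO-Crate metadata document.

    Per the RO-Crate specification, the metadata descriptor entity
    (ro-crate-metadata.json) references the root dataset via "about".
    """
    graph = {e["@id"]: e for e in crate["@graph"]}
    descriptor = graph["ro-crate-metadata.json"]
    return graph[descriptor["about"]["@id"]]

# Minimal illustrative crate (not a real LDaCA object)
crate = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json",
     "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset",
     "name": "Example corpus",
     "license": {"@id": "#example-licence"}}
  ]
}
""")
root = find_root_entity(crate)
```

The same lookup works whether the crate describes a data collection or a code resource, since both share the RO-Crate structure.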

Authentication is performed using CILogon, and authorisation is managed using the Resource Entitlement Management System (REMS), based on content licences specified by Data Stewards. Using REMS, licences may be granted either implicitly (click-through) or explicitly using predefined approval workflows.
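The distinction between implicit and explicit licence grants can be modelled as a simple decision rule. The sketch below is an illustrative model of that logic only, not the actual REMS API or its workflow engine.

```python
from dataclasses import dataclass

@dataclass
class Licence:
    id: str
    click_through: bool  # True: accepting the terms grants access immediately

def access_decision(licence: Licence, terms_accepted: bool,
                    approved_in_workflow: bool) -> bool:
    """Illustrative access-control decision (not the real REMS API).

    Click-through licences grant access as soon as the user accepts
    the terms; explicit licences additionally require approval through
    a predefined workflow configured by the Data Steward.
    """
    if not terms_accepted:
        return False
    return licence.click_through or approved_in_workflow
```

Under this model, a click-through licence needs only accepted terms, while an explicit licence blocks access until a workflow approval arrives.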

Both code and data are described using the RO-Crate metadata standard, which further improves research integrity and facilitates archiving.

We employ an RO-Crate profile for code which includes information about the compute requirements (e.g. CPU, GPU and memory), so that we can dynamically determine appropriate compute environments for code resources.
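One way such a profile can drive environment selection is sketched below: compute requirements read from a code entity are mapped to a Kubernetes-style container resource request. The property names (`cpuRequirements`, `memoryRequirements`, `gpuRequired`) are hypothetical placeholders, not the actual terms of the ATAP profile.

```python
def resource_spec(code_entity: dict) -> dict:
    """Map compute requirements from a code RO-Crate entity to a
    Kubernetes-style container resource request.

    Property names here are illustrative placeholders, not the
    actual ATAP profile vocabulary.
    """
    spec = {
        "cpu": code_entity.get("cpuRequirements", "1"),
        "memory": code_entity.get("memoryRequirements", "2Gi"),
    }
    if code_entity.get("gpuRequired"):
        # Request one GPU using the common NVIDIA device-plugin key
        spec["nvidia.com/gpu"] = "1"
    return spec

# Illustrative code entity from a crate's @graph
entity = {"@id": "analysis.ipynb", "@type": "File",
          "cpuRequirements": "2", "memoryRequirements": "8Gi",
          "gpuRequired": True}
```

A launcher can then pass the resulting spec to its container orchestrator when building the notebook environment, so GPU-hungry code lands on suitable nodes while lightweight notebooks stay on default resources.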

While ATAP is being developed in the context of text analytics, the infrastructure is generic and applicable to all disciplines.

Biography

Alex Ip has worked across diverse sectors including manufacturing, software development, data engineering and eResearch infrastructure development over several decades. He has worked closely with researchers in domains including livestock genetics, Earth observation, geophysics, and, most recently, bioinformatics, digital asset preservation and text analytics.

Alex developed the operational prototype system which became Digital Earth Australia, and also developed the back-end information architecture for Geoscience Australia’s current Geophysical Archive Data Delivery System v2 (GADDS2).

Within AARNet, Alex’s team provides innovative eResearch infrastructure solutions to projects including the Australian BioCommons, the Australian Text Analytics Platform (ATAP), and the Play-It-Again digital preservation project.
