Open Science in Data-intensive Research Requires Multiple Entry Points: A Case Study from AuScope in Solid Earth Science Infrastructures

Dr Lesley Wyborn1,2, Dr Rebecca Farrington2, Alex Hunt3, Dr Jens Klump3, Anusuriya Devaraju3, Dr Tim Rawling2, Dr Bryant Ware4, Dr Angus Nixon5

1NCI, Australian National University, Canberra, Australia, 2AuScope Ltd, Melbourne, Australia, 3Mineral Resources, CSIRO, Kensington, Australia, 4Curtin University, Bentley, Australia, 5The University of Adelaide, Adelaide, Australia

Biography:

Lesley Wyborn is an Honorary Professor at ANU, at the National Computational Infrastructure (NCI) and at the Research School of Earth Sciences. She also works part time for ARDC. She spent 42 years at Geoscience Australia (GA), working in both research and data science/data management. Since leaving GA in 2014, she has continued research in aspects of Data Science, including data quality, versioning, reproducibility, operationalising the FAIR, CARE and TRUST principles, and Open Science, and is active in many global informatics initiatives. Her current focus is the development of transparent, high-performance, aggregated national-scale datasets that are compatible with international data networks. ORCID: 0000-0001-5976-4943

Abstract:

Solid Earth Science datasets provide evidence-based insights into surface and subsurface environments, including the quantification of longitudinal changes over decades. However, their increasing diversity and scale present significant challenges. Primary Observational Datasets (PODs) vary widely in size, from small collections in the megabyte range, suitable for on-premise or cloud storage, to petabyte-scale collections that require co-located High Performance Compute-Data (HPC-D) platforms for timely, effective analysis.

Many funders now request compliance with the FAIR, CARE and TRUST principles, whilst growing demands for Open Science set a high bar for reproducibility, transparency and sharing, requiring open publication of all collected data, tools and processes (UNESCO, 2021; https://doi.org/10.5281/zenodo.5741832).

It is no longer possible for a single repository to meet these requirements and serve all users, who range from expert power-users to novices. Instead, a ‘Repository Ecosystem’ is needed, one that balances resources along the full path of research data use, including:

1. Curation and sustainable preservation of raw full-resolution PODs captured directly off instruments;

2. Calibration and conversion of raw PODs into full-resolution reference datasets that use community-agreed, machine-readable data formats, standards and vocabularies, and are annotated with rich, machine-actionable metadata;

3. Systematic reprocessing of PODs into reusable downstream analysis-ready products that meet specific researcher needs.

This paper outlines AuScope’s approach to developing a Solid Earth Science Data Ecosystem that enables seamless access to PODs hosted on HPC-D platforms and in cloud environments, and that provides clear pathways connecting these datasets to processed, analysis-ready data products delivered through distributed data platforms and portals.

 

 
