Preparing geophysical data and software for Exascale: A case study from the 2030 Geophysics Collections Project

Nigel Rees1, Rui Yang1, Yue Sun1, Edison Guo1, Lesley Wyborn2, Ben Evans1

1NCI Australia, Canberra, ACT, Australia
2Australian National University, Canberra, ACT, Australia

Abstract

The 2030 Geophysics Collections Project, a collaboration between NCI, AuScope, ARDC and TERN, seeks to make accessible a selection of rawer, higher-resolution versions of existing datasets and to ensure they are suitable for programmatic access on next-generation computational infrastructures. These infrastructures will operate at exascale and will require data that are fully compliant with the FAIR principles, self-describing and machine readable.

Terabytes of existing magnetotelluric (MT) and passive seismic time-series data can now be accessed on the Gadi supercomputer at NCI. For MT, a dedicated effort went into automating the conversion of existing instrument time-series data into the new international MTH5/mt_metadata standard.
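To illustrate the kind of structure the MTH5 standard defines, the sketch below builds a minimal MTH5-like hierarchy (Survey → Station → Run → Channel) using plain h5py. This is an assumption-laden illustration only: real MTH5 files are produced with the mth5 and mt_metadata packages, and the station name, channel name and attribute keys here are hypothetical placeholders.

```python
# Minimal sketch of an MTH5-style hierarchical layout, written with plain
# h5py rather than the mth5 package. Group names, channel names and
# attribute keys below are illustrative assumptions, not the exact MTH5 schema.
import h5py
import numpy as np

with h5py.File("example_mt.h5", "w") as f:
    # Survey / Stations / <station> / <run> hierarchy (illustrative)
    run = f.create_group("Survey/Stations/AUS001/run_001")

    # One electric-field channel of synthetic time-series data
    ex = run.create_dataset("ex", data=np.zeros(4096, dtype="float32"))
    ex.attrs["sample_rate"] = 256.0               # Hz (assumed attribute name)
    ex.attrs["units"] = "millivolts per kilometer"
```

Because the metadata travel inside the HDF5 file alongside the time series, a downstream workflow on Gadi can open the file and discover stations, runs and channel attributes programmatically, which is what makes the format self-describing and machine readable.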

In parallel, a well-managed and FAIR software environment was established at NCI that integrates a wide range of complex open-source codes and libraries across multiple programming languages. It includes geophysics and AI/ML software modules that consolidate Python, Julia and R environments together with thousands of pre-built libraries. The modules are community driven, regularly updated, and reduce the time researchers must spend on software engineering tasks. Coupled with high-performance, standardised geophysical data formats, these software environments enable researchers to rapidly analyse very large volumes of data and evaluate the quality of their algorithms and workflows within realistic time frames.

The emphasis of the NCI Geophysics Specialised Environment is on enabling researchers to efficiently develop their own HPC multi-physics workflows tailored to their specific use cases, thereby fostering greater research innovation.

Biography

Nigel Rees completed a BSc (Hons) in petroleum engineering, geology and geophysics at The University of Adelaide in 2010 and worked in the petroleum industry for three years. Since completing his PhD in magnetotelluric geophysics (2016) at The University of Adelaide, he has worked at the National Computational Infrastructure (NCI), initially as a Research Data Management Specialist and now as an HPC and Data Software Specialist focusing on enabling HPC techniques on large-volume geophysical data sets.
