Nigel Rees1, Ben Evans2, Graham Heinson3, Jingbo Wang4, Lesley Wyborn5, Kelsey Druken6, Dennis Conway7
1The Australian National University (NCI), Canberra, Australia, firstname.lastname@example.org
2The Australian National University (NCI), Canberra, Australia, email@example.com
3The University of Adelaide, Adelaide, Australia, firstname.lastname@example.org
4The Australian National University (NCI), Canberra, Australia, email@example.com
5The Australian National University (NCI), Canberra, Australia, firstname.lastname@example.org
6The Australian National University (NCI), Canberra, Australia, email@example.com
7The University of Adelaide, Adelaide, Australia, firstname.lastname@example.org
Magnetotelluric (MT) data in the research community is traditionally stored on departmental infrastructure and, when published, is released as processed downloadable files in esoteric formats with limited metadata. Obtaining the raw source MT time-series data is a lengthy process: one typically has to email the data owner, and the data is then transferred either via FTP download for local processing or, where the file sizes are too large, on hard disk via Australia Post.
It has become increasingly apparent to the MT community that in order to increase online collaboration, reduce time for analysis, and enable reproducibility and integrity of scientific discoveries both inside and beyond the MT community, datasets need to adopt the Findable, Accessible, Interoperable and Reusable (FAIR) data principles. The National Computational Infrastructure (NCI) has been working with The University of Adelaide to address these challenges as part of the 2017-2018 AuScope-ANDS-NeCTAR-RDS funded Geoscience Data Enhanced Virtual Laboratory (DeVL) project. The project aims to make the entire University of Adelaide MT data collection (1993 to 2018) FAIR. NCI has also installed a range of MT processing and modelling software on both its Virtual Desktop Infrastructure and the Raijin supercomputer, which has helped to reduce data processing and subsequent modelling times.
The University of Adelaide MT data collection needs to be both discoverable and accessible online, and to conform to agreed international community standards. This ensures interoperability with other international MT collections (e.g., AusLAMP, EarthScope USArray, SinoProbe), as well as reusability for purposes other than those for which the data was collected. For the process to become more transparent, the MT community will need to address fundamental issues including publishing FAIR datasets, publishing model outputs and processing regimes, re-evaluating vocabularies, semantics and data structures, and updating software to take advantage of these improvements. For example, it is no longer sufficient to expose only the processed data; the raw instrument data needs to be preserved persistently so that, as algorithms improve, the original source data can be reprocessed and enhanced. Consistent with the FAIR and reproducibility principles, the MT processing and modelling tools should also be easily discoverable and accessible and, where required, usable in online virtual environments, with software versions citable. The journey from the raw data to the final published models should be transparent and well documented in provenance files, so that published scientific discoveries can be easily reproduced by an independent party.
One component of this project has been to explore the value of converting raw MT time-series into open, self-describing scientific data formats (e.g., Network Common Data Form (netCDF)), with a view to showing the potential for accessibility through data services. Such formats make it possible to analyse the data using a much wider range of scientific software from other domains. As an example, Jupyter Notebooks have been created to show how the MT data can be accessed and processed via OPeNDAP data services. These changes alone improve the usability of the data, which can be accessed without having to download it before commencing any analysis.
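As a rough illustration of why OPeNDAP access removes the need to download whole files, the sketch below builds a DAP2 request URL that asks the server for only a decimated slice of one time-series variable. It is a minimal sketch and not the project's notebook code: the dataset URL, the variable name `ex` and both helper functions are hypothetical, and it assumes the server exposes the standard DAP2 `.ascii` response and `[start:stride:stop]` hyperslab syntax (with an inclusive stop index).

```python
def opendap_subset_url(dataset_url, variable, start, stop, stride=1, response="ascii"):
    """Build a DAP2 URL requesting variable[start:stride:stop].

    In DAP2 hyperslab syntax the stop index is inclusive.
    """
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    if response not in {"ascii", "dods", "dds", "das"}:
        raise ValueError("unsupported DAP2 response type: " + response)
    constraint = f"{variable}[{start}:{stride}:{stop}]"
    return f"{dataset_url}.{response}?{constraint}"


def subset_length(start, stop, stride=1):
    """Number of samples the server returns for [start:stride:stop]."""
    return (stop - start) // stride + 1


# Hypothetical endpoint and variable name, for illustration only.
DATASET_URL = "https://example.org/thredds/dodsC/mt/site01.nc"

# Request every 10th sample of the first 36000 samples of channel "ex".
url = opendap_subset_url(DATASET_URL, "ex", 0, 35999, stride=10)
print(url)
print(subset_length(0, 35999, 10), "samples requested")
```

Because the constraint is evaluated on the server, only the requested samples cross the network. Client libraries that understand OPeNDAP typically generate equivalent requests behind ordinary array slicing.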
The Geoscience DeVL project has focused on making the University of Adelaide MT data available online, and on assembling software and workflows in a supercomputer environment that significantly improve the processing of data. This project has also made a valuable addition to the AuScope Virtual Research Environment, which is progressively making more major Earth science data collections, software tools and processing environments accessible to the Australian research community. The results of our work are also being presented at international MT forums such as the 24th EM Induction Workshop held in Helsingør, Denmark, to ensure that the data capture, publishing, curation and processing being undertaken at NCI is in line with international best practice.
This work was supported by the National Computational Infrastructure, AuScope Limited, ANDS-NeCTAR-RDS and The University of Adelaide.
- The Australian Lithospheric Architecture Magnetotelluric Project (AusLAMP). Available from: http://www.ga.gov.au/about/projects/resources/auslamp, accessed 21 June 2018.
- The EarthScope USArray magnetotelluric program. Available from: http://www.usarray.org/researchers/obs/magnetotelluric, accessed 22 June 2018.
- SinoProbe – Deep Exploration in China. Available from: http://sinoprobe.cags.ac.cn/About-Sinoprobe/, accessed 22 June 2018.
- The 24th EM Induction Workshop (EMIW2018). Available from: https://emiw2018.emiw.org/, accessed 21 June 2018.
Nigel Rees is a Research Data Management Specialist at the National Computational Infrastructure (NCI) with a background in magnetotelluric geophysics. In his role at NCI, he supports research data needs and assists with the management, publishing and discovery of data.