Adrian Burton1, Carsten Friedrich2, Sebastien Mancini3, Bruce Simons4, Lesley Wyborn5
1Australian Research Data Commons, Canberra, Australia, firstname.lastname@example.org
2CSIRO Data61, Canberra, Australia, Carsten.Friedrich@data61.csiro.au
3IMOS, Hobart, Australia, email@example.com
4CeRDI Federation University, Ballarat, Australia, firstname.lastname@example.org
5ANU, Canberra, Australia, email@example.com
Data services have become an integral part of the research, government and industry sectors. They provide automated functions for the creation, access, processing and analysis of data. The development of data-focused services is steadily increasing in Australia, for example in the NCRIS capabilities (e.g., AuScope, IMOS, TERN, AURIN, ALA, NCI), CSIRO and government agencies (e.g., GA, Department of Environment, and ABS); all are moving to more formal publishing of data through services.
Properly deployed, standards-conformant web services should enable cross-domain discovery and in-situ programmatic access to process data from multiple distributed sources. However, three fundamental issues currently impede more efficient use of data services in Australia:
- Findability and accessibility – a lack of consistency in service descriptions that makes it hard to discover data services and action them;
- Interoperability and reusability – a lack of, or variable implementation of, standard protocols and information models, which makes it hard to aggregate identical data types from multiple sources; and
- Agreement – a lack of agreement on which data service standard to implement for a particular dataset.
This results in at least four approaches:
- Data from distributed resources are centralised (cached) in a single location, harmonised and then made accessible via services from that central location;
- Data providers and/or facilities are asked to support an unsustainable number of protocols and standards;
- Data providers are asked to make a custom modification to an individual service so that the dataset can be accessed by a specific community; and
- Ideally, data services are provided by multiple sources, conform to widely used, internationally agreed standards, and can be sustainably accessed for many and varying use cases.
To make data services FAIRer and improve interoperability across multiple domains, for multiple use cases, ARDC has been organising two parallel activities:
- We formed a focus group with members from the NCRIS capabilities (ALA, AuScope, IMOS, TERN, NCI and a nascent Agriculture capability) and government agencies (e.g., CSIRO, GA, BoM) who are working specifically on the ARDC-funded Data Enhanced Virtual Laboratories (DeVL) and Research Data Cloud (RDC) Projects in GeoScience, Marine Science, EcoScience, Climate and Agriculture. The group discusses standardisation of data service descriptions and APIs across these projects, with a primary focus on data services that are compliant with a collection of OGC standards, OPeNDAP protocols, THREDDS data servers and GeoNetwork catalogues.
Based on community-agreed service descriptions and an API, the ARDC Services team is developing a national service registration and discovery layer for both service providers and service consumers (Figure 1). The discovery layer will address the findability issue and should provide a one-stop shop where data consumers can search for and access data services offered across the NCRIS facilities, universities, science agencies and government data providers that are participating in these particular DeVL and RDC projects. The interoperability and accessibility issues will be addressed by the community of data providers and consumers converging on common practice.
- We started a wider Australian Data Services Interest Group to facilitate discussion and the exchange of information and experience on data services development across a broader range of Australian communities, including those involved in developing international standards for data service description and access. This interest group meets every three months and intends to take the lessons learnt from the more specific Focus Group and extend them to a wider community.
The two groups work in partnership with the Earth Science Information Partners (ESIP) of the US, in particular the ESIP Information Interoperability and Technology Committee and the ESIP Data Stewardship Committee. ESIP is supported by NASA, NOAA, USGS and 110+ member organizations.
Figure 1. Discovering Data Services through a Services Registry
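The registration and discovery pattern summarised in Figure 1 can be illustrated with a minimal in-memory sketch. It is not the ARDC implementation: the class names, the fields of the service description, the protocol labels and the example.org endpoints are all hypothetical, chosen only to show how consistent service descriptions make services searchable by keyword or protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical minimal description of a data service."""
    title: str
    provider: str
    protocol: str        # illustrative labels, e.g. "OGC:WMS", "OPeNDAP"
    endpoint: str        # placeholder URL, not a real service
    keywords: tuple = ()


class ServiceRegistry:
    """Toy registry: providers register services, consumers search them."""

    def __init__(self):
        self._services = []

    def register(self, service):
        self._services.append(service)

    def search(self, keyword=None, protocol=None):
        """Return services matching an optional keyword and/or protocol."""
        results = []
        for s in self._services:
            if protocol is not None and s.protocol.lower() != protocol.lower():
                continue
            if keyword is not None:
                haystack = " ".join((s.title, *s.keywords)).lower()
                if keyword.lower() not in haystack:
                    continue
            results.append(s)
        return results


# Provider side: register services using the same description fields.
registry = ServiceRegistry()
registry.register(ServiceDescription(
    title="Sea surface temperature grids", provider="Provider A",
    protocol="OPeNDAP", endpoint="https://example.org/opendap/sst",
    keywords=("marine", "temperature")))
registry.register(ServiceDescription(
    title="Borehole locations", provider="Provider B",
    protocol="OGC:WFS", endpoint="https://example.org/wfs",
    keywords=("geoscience", "borehole")))
registry.register(ServiceDescription(
    title="Soil moisture map", provider="Provider C",
    protocol="OGC:WMS", endpoint="https://example.org/wms",
    keywords=("agriculture", "soil")))

# Consumer side: discover services by domain keyword or by protocol.
marine = registry.search(keyword="marine")
wfs = registry.search(protocol="ogc:wfs")
```

Because every provider fills in the same fields, a consumer can filter across domains without knowing in advance which organisation hosts a given service; a production registry would add persistent storage, controlled vocabularies and an agreed API on top of this idea.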
We propose a 60-minute BoF session. The session provides a venue for a face-to-face meeting of the interest group and also enables us to involve people from the wider community. The BoF will include an introduction to data services, the associated standards and the efforts we have made so far to make data services FAIRer. We also invite people from Agriculture, Geoscience and Marine Science to introduce their implementations of data services.
Adrian Burton is Director of Services at the Australian National Data Service. Adrian has provided strategic input into several national infrastructure initiatives and is active in building national policy frameworks to unlock the value in the research data outputs of publicly funded research.
Carsten Friedrich is a Research Team Leader at CSIRO Data61. At CSIRO he worked in a variety of areas including Cloud Computing, Cyber Security, Virtual Laboratories, and Scientific Software Registries.
Bruce Simons has 24 years of experience in geophysical surveying and interpretation, and 17 years in UML data modelling and XML/GML schema development, implementing interoperable network services that use XML markup and OGC web services to enable schematic and semantic interoperability.
Lesley Wyborn currently has a joint adjunct fellowship with NCI. She is Chair of the Australian Academy of Science ‘Data for Science Committee’ and serves on the AGU Data Management Advisory Board and the Steering Committee of the AGU-led FAIR Data Publishing Project.