Dr Jens Klump1, Dr Lesley Wyborn2, Ms Irina Bastrakova3, Dr Anusuriya Devaraju1, Prof Dr Brent McInnes4, Dr Simon Cox5, Mr Ryan Fraser1
2NCI/ANU, Canberra, Australia, firstname.lastname@example.org
3Geoscience Australia, Canberra, Australia, email@example.com
4Curtin University, Bentley, Australia, firstname.lastname@example.org
5CSIRO Land & Water, Clayton, Australia, simon.cox.csiro.au
The collection of physical samples is the foundation of many research endeavours and is undertaken by many different entities (e.g., individual researchers, laboratories, government agencies, mining companies, citizens, museums) for multiple purposes. Over time, however, a particular site can be resampled many times, either because the collector did not know that the area had previously been sampled, or because the samples collected had not been properly curated and were no longer available. In other cases, resampling may not be an option at all, due to cost, accessibility constraints, or timeliness. Researchers are increasingly realising that repositories of well-curated samples are a treasure chest, valuable not only for the samples themselves but also because new results can be combined with prior observations already made on the same sample by a variety of instruments.
To be reusable, physical samples must be systematically curated over the long term. Systematic curation, cataloguing, and persistent, globally unique identification ensure both that the existence of the samples is known and that data derived from them through laboratory and field tests can be linked back to them. This has already been demonstrated in databases for geochemistry and for hyperspectral remote sensing. In hyperspectral remote sensing, for example, links can be established between remote sensing data products and the samples that were used as ground truth for their calibration.
IGSN Implementation
In an Australian collaboration, we used the IGSN (International Geo Sample Number, http://igsn.github.io) to identify samples in a globally unique and persistent manner. IGSN is interoperable with other persistent identifier systems such as DataCite, and the basic IGSN descriptive metadata schema is designed to align with existing schemas, such as OGC Observations and Measurements (O&M) and DataCite, which makes crosswalks to other metadata schemas easy [2,3]. IGSN metadata are disseminated through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), allowing them to be aggregated in other applications such as portals (e.g., the Australian IGSN catalogue, http://igsn.org.au) (Figure 1). Through this protocol, sample metadata can be made available in more than one schema.
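As an illustration of how an aggregator might harvest sample metadata over OAI-PMH, the following Python sketch builds a ListRecords request and extracts record identifiers and the resumption token from a response. It is a minimal sketch, not the portal's actual implementation: the endpoint https://example.org/oai, the record identifiers, and the function names are hypothetical, and the response is a hand-written sample rather than output from a real IGSN service. Only the verb, the metadataPrefix and resumptionToken arguments, the oai_dc prefix, and the XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (list of record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return identifiers, token


# Hand-written sample response; identifiers are invented for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:SAMPLE0001</identifier></header></record>
    <record><header><identifier>oai:example.org:SAMPLE0002</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    ids, token = parse_list_records(SAMPLE_RESPONSE)
    print(ids, token)
    # The next page would be requested with the returned token.
    print(list_records_url("https://example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each returned resumption token until none is returned. Requesting a different metadataPrefix is how the same sample record would be retrieved in another schema, provided the repository supports it.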
The software for IGSN web services is based on components developed for DataCite and adapted to the specific requirements of IGSN. This cooperation in open source software development ensures sustainable implementation and faster turnaround times for updates.
IGSN, in particular in its Australian implementation, is characterised by a federated approach to system architecture and organisational governance giving it the necessary flexibility to adapt to particular local practices within multiple domains, whilst maintaining an overarching international standard.
IGSN in Australia
There are currently three IGSN allocating agents in Australia: Geoscience Australia, CSIRO, and Curtin University, representing three different sectors: government agencies, government research agencies, and academia, respectively. In Australia, IGSN has also benefited from funding and support from the National Collaborative Research Infrastructure Strategy (NCRIS). For example, the Australian Research Data Services (RDS) Program has provided funding to help develop registration services and a common web portal that allows discovery of physical samples and sample collections at a national level (http://igsn.org.au) (Figure 1). The Australian National Data Service (ANDS) also plays an important role in this collaboration as a promoter of IGSN, as a facilitator of outreach to other domains that require sample identification (e.g., soils, mineral spectra, digital core specimens, insects), and as a host for the IGSN-related vocabulary service.
Figure 1: Screen shot of the Australian IGSN Portal Demonstrator
As a result, the IGSN network enables common access to catalogues of unambiguously identified samples from different agents, which ultimately promotes collaboration across all Earth Science disciplines. It also increases the cost effectiveness of research by reducing the need to re-collect samples in the field. At the same time, it can help to increase the rigour of interdisciplinary science because, provided there is still material left, the same sample can be analysed by multiple techniques and research groups, often over decades. Further, by extending the RDS Data Life Cycle Framework (https://www.dlc.edu.au/about) to include IGSN sample identifiers in Australia, funding agencies could even use the portal when reviewing grant proposals for expensive collection programs, to ascertain how many samples from a particular area are already available in curated repositories, what data have been derived from them and published, and what data collection programs have already been funded in similar areas.
IGSN International Governance
IGSN is governed by an international organisation, the IGSN Implementation Organization e.V. (http://www.igsn.org). Membership in this organisation links the Australian IGSN community to the wider international community and at the same time allows it to act locally to ensure that the services offered are relevant to the needs of Australian researchers. This flexibility aids the integration of new disciplines into a global information network for physical samples.
1. McNutt, M., K. A. Lehnert, B. Hanson, B. A. Nosek, A. M. Ellison, and J. L. King (2016), Liberating field science samples and data, Science, 351(6277), 1024–1026, doi:10.1126/science.aad7048.
2. Horsburgh, J. S. et al. (2016), Observations Data Model 2: A community information model for spatially discrete Earth observations, Environmental Modelling & Software, 79, 55–74, doi:10.1016/j.envsoft.2016.01.010.
3. Devaraju, A., J. F. Klump, S. J. D. Cox, and P. Golodoniuc (2016), Representing and Publishing Physical Sample Descriptions, Comp. Geosci., 96, 1–10, doi:10.1016/j.cageo.2016.07.018.
4. Wyborn, L. A. et al. (2017), Building an Internet of Samples: The Australian Contribution, in Geophysical Research Abstracts, vol. 19, pp. EGU2017-11497, Copernicus Society, Vienna, Austria.
Jens Klump is the CSIRO Science Leader for Earth Science Informatics. As a member of CSIRO Mineral Resources, he is based in Perth, Western Australia. Jens’ field of research is the application of information technology to earth science questions. His research topics include data driven science and machine learning, virtual research environments, remotely operated instruments, programmatic access to data, high performance and cloud computing, and the development of system solutions for large geoscience projects.
Jens has degrees in geology and in oceanography from the University of Cape Town (UCT) and received his PhD in marine geology from the University of Bremen, Germany. He was part of the team that developed the foundations for what later became DataCite, and he subsequently applied the principles developed there to building the International Geo Sample Number (IGSN), a persistent identifier system for physical specimens. Jens has more than sixteen years of experience in designing and building research data infrastructures and has served on several committees working on related topics. Jens is the vice president of the IGSN Implementation Organization and vice president of the EGU Earth and Space Science Informatics Division.