Natasha Simons1, Julia Martin2, Mingfang Wu3, Adrian Burton4, Jens Klump5, Keith Russell6, Gerry Ryder7, Lesley Wyborn8, Tim Rawling9
1Australian Research Data Commons, Brisbane, Australia, firstname.lastname@example.org
2Australian Research Data Commons, Canberra, Australia, Julia.Martin@ardc.edu.au
3Australian Research Data Commons, Melbourne, Australia, Mingfang.Wu@ardc.edu.au
4Australian Research Data Commons, Brisbane, Australia, Adrian.Burton@ardc.edu.au
5CSIRO Mineral Resources, Perth, Australia, email@example.com
6Australian Research Data Commons, Melbourne, Australia, Keith.Russell@ardc.edu.au
7Australian Research Data Commons, Adelaide, Australia, firstname.lastname@example.org
8National Computational Infrastructure, ANU, Canberra, Australia, Lesley.Wyborn@anu.edu.au
9AuScope, Melbourne, Tim.Rawling@unimelb.edu.au
In modern research, much of geoscience, and equivalent investigation in the environmental sciences, is based on observations and measurements of real-world phenomena, which can range from simple visual observations on small hand-sized physical samples to voluminous ex-situ measurements made using satellite or laboratory/sensor instruments. Information on samples, digital data and computational methods is rarely captured in traditional publications. Fifty years ago, most data that underpinned a scholarly publication could be represented in typeset tables, but with the advent of the digital age and the computerisation of instruments, the volumes of data collected became too large to present as tables within a paper. At best, data were then included as a supplement to the paper, accessible by contacting the journal; otherwise they could be obtained only ‘by contacting the author’. Such approaches limit the ability to test the veracity and reproducibility of a publication. They do not guarantee that input research artefacts will remain accessible and persistent into the future, nor do they ensure that those artefacts can be reused for purposes beyond the original use case. The Geoscience Paper of the Future (Gil et al., 2016) was recently proposed to enable researchers to fully document, share, and cite all their research products, including physical samples, data, software, and computational provenance. At about the same time, the Findable, Accessible, Interoperable and Reusable (FAIR) Principles (Wilkinson et al., 2016) emerged. Today, publishers do not have a consistent way of citing the data underpinning a publication, whilst details on how to reference or access physical specimens or software are rarely provided. Interpretations of the FAIR principles can also be quite inconsistent.
To address this complex issue, a grant from the American Laura and John Arnold Foundation was awarded in 2017 to the American Geophysical Union (AGU) and other partners (including AuScope, National Computational Infrastructure, and the Australian Research Data Commons) to significantly improve the interconnection of data, samples, software and literature in the Earth and space sciences, based around the FAIR principles. The key objectives of the project are that:
- Publishers will follow consistent policies for sharing and citing data, samples and software used in the scholarly literature and will move from having these as supplements to the publication to using trusted repositories for publishing supporting research artefacts;
- Open repositories for Earth and environmental sciences will enable those policies and other data applications by providing persistent identifiers, rich metadata, and related services for the data, software and samples they hold;
- Geoscience researchers will know how to consistently share, document, and reference data, samples and software and use globally persistent identifiers to uniquely identify their research outputs.
These objectives finally provide a response to the inevitable change in scholarly communication required by the emergence of computers and the dawning of the age of digital data collection and curation fifty years ago, and by the subsequent need for more complex software to process ever-increasing data volumes. However, effective implementation will require a significant cultural change in today’s research practices, many of which come from the pre-digital era. A critical component of the AGU-led project is promoting the value of citation with identifiers to researchers, so that they know how to use them effectively in publications and ensure credit is given where credit is due.
PROMOTING THE VALUE OF IDENTIFIERS TO RESEARCHERS
Although identifiers have been commonplace for scholarly publications for some time and most Australian researchers have an ORCID, few realise the power of using equivalent identifier systems for all their research artefacts, including physical samples, software and data.
1. Advantages of Using Sample Identifiers
The International Geo Sample Number (IGSN), used on five continents to uniquely identify physical samples, allows researchers firstly to gain credit for sample collection and preparation, and secondly to trace where other analytical work has been published on samples that they collected and curated. As the usage of IGSN grows, it will also be possible to locate other samples from the same geographical feature (e.g. a borehole or a remote island) to obtain a more complete overview of how new data generated by a researcher relate to existing data in the literature. Likewise, funders can trace where a sampling project they funded has resulted in high impact publications.
2. Advantages of Using Software Identifiers
Proper use of identifiers and citation for software means that a researcher can trace where their software has been used by others in publications and be acknowledged for this work. Further, by being able to search registers of appropriately described and cited software, researchers can also reduce the ‘Time to Science’, as they do not waste time rewriting complex code that already exists.
3. Advantages of Using Identifiers for Datasets that Underpin Publications
Increasingly, unique identifiers for data and proper citation of that data are being used for career advancement. For example, through linking of identifiers, a researcher is able to track where other researchers have used any of their datasets in a high impact paper, and gain credit for it. In addition, a persistent identifier such as a DOI ensures long-term access to the dataset, enabling reproducibility of the current research and reuse for new research directions.
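As a minimal illustration of how a persistent identifier supports citation, the Python sketch below normalises a DOI written in several common forms into a single resolvable URL and assembles a simple data citation string. It is a sketch under stated assumptions, not a description of any ARDC service: the function and class names are hypothetical, the citation layout (creators, year, title, publisher, identifier) is one plausible arrangement rather than a mandated format, and the dataset in the usage example is a made-up placeholder. Only the public resolver address https://doi.org/ and the DOI of Gil et al. (2016) come from existing sources.

```python
from dataclasses import dataclass

# Public DOI resolver; prepending it to a DOI gives a persistent link.
DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in manuscripts (matched case-insensitively).
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable HTTPS URL for a DOI given in any common form."""
    value = doi.strip()
    lowered = value.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not value.startswith("10.") or "/" not in value:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + value


@dataclass
class Dataset:
    """Hypothetical minimal description of a citable dataset."""
    creators: str
    year: int
    title: str
    publisher: str
    doi: str


def cite(dataset: Dataset) -> str:
    """Build a citation string that ends with the resolvable identifier."""
    return (
        f"{dataset.creators} ({dataset.year}). {dataset.title}. "
        f"{dataset.publisher}. {doi_to_url(dataset.doi)}"
    )


if __name__ == "__main__":
    # DOI of Gil et al. (2016), written with a "doi:" prefix.
    print(doi_to_url("doi:10.1002/2015EA000136"))
    # Placeholder dataset; every value here is invented for illustration.
    example = Dataset(
        creators="Example, A.",
        year=2018,
        title="Placeholder borehole geochemistry dataset",
        publisher="Example Repository",
        doi="10.1234/placeholder",
    )
    print(cite(example))
```

Because every form of the identifier collapses to the same URL, a citation in one paper and a citation in another can be matched automatically, which is what makes tracking reuse and assigning credit possible.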
CURRENT ARDC INFRASTRUCTURES TO PERSISTENTLY IDENTIFY RESEARCH ARTEFACTS
Once researchers embrace the need for identifiers as part of their research ecosystem, they must have access to infrastructures that enable the persistent and unique identification of, and access to, their research artefacts throughout their career and beyond. Over the last 10 years, the Australian Research Data Commons (ARDC) and its predecessors have been building an infrastructure for data citation that helps researchers to publish FAIR data and ensures proper recognition and citation of their data, both in their own publications and in any subsequent publications that use it. Details are available at https://www.ands.org.au/working-with-data/citation-and-identifiers/data-citation.
In the recent ARDC/AuScope/NCI funded Geosciences Data-enhanced Virtual Laboratory project, the ARDC has been working with the Geoscience community to develop equivalent persistent identifier systems for samples and software. Australian geoscience researchers can obtain access to IGSNs for their physical samples (specimens) here: http://www.auscope.org.au/igsn-info/ and information about citation for physical samples is here: http://www.ands.org.au/working-with-data/citation-and-identifiers/igsn. An ARDC guide for software citation is available here: https://www.ands.org.au/working-with-data/citation-and-identifiers/software-citation.
Combined, these efforts will ensure that Australian Geoscience researchers can meet the new demands now emerging from the Earth and space science publishers and can move towards the Geoscience Paper of the Future. The ARDC identifier systems recently developed for physical samples and software are easily portable to other physical sciences, such as the environmental, marine and bio domains, and will help ensure that research artefacts are Findable, Accessible for current and future generations of researchers, and Reusable for purposes beyond those for which they were collected. It is accepted that Interoperability will still take some time, but plans are already being developed.
- Gil, Y., David, C.H., Demir, I., Essawy, B.T., Fulweiler, R.W., Goodall, J.L., Karlstrom, L., Lee, H., Mills, H.J., Oh, J.H., Pierce, S.A., Pope, A., Tzeng, M.W., Villamizar, S.R., and Yu, X., 2016. Toward the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance. Earth and Space Science, 3, 388-415. https://doi.org/10.1002/2015EA000136 Accessed 18 August 2018.
- Wilkinson, M.D., Dumontier, M., Aalbersberg, IJ.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J-W., Silva Santos, L.B. da, Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., Hoen, P.A.C. ‘t, Hooft, R., Kuhn, T., Kok, R., Kok, J.N., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., Schaik, R. van, Sansone, S-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., Lei, J. van der, Mulligen, E. van, Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K.J., Zhao, J., and Mons, B., 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3. https://doi.org/10.1038/sdata.2016.18 Accessed 18 August 2018.
Natasha Simons is a Research Data Management Specialist with the Australian National Data Service. Located at Griffith University in Brisbane, Natasha serves on the Council of Australian University Librarians Research Advisory Committee and is an ORCID Ambassador. She is an author and reviewer of papers related to library and information management and co-authored a 2013 book on digital repositories. Natasha was the Senior Project Manager for the Griffith Research Hub, which won awards from Stanford University and VALA. She is an advocate for open data, open repositories and ORCID.