Dr Jens Klump1, Dr Mingfang Wu3, Dr Lesley Wyborn2, Farnoosh Sadeghian4
1CSIRO, Perth, Australia
2ANU/NCI, Canberra, Australia
3ARDC, Melbourne, Australia
4Monash University, Australia
Research data in digital form, once published and made accessible online, can be easily copied, stored in multiple places, and re-published through more than one repository or service. Mirroring resources is a common practice, but offering exactly the same version of the data in multiple places raises the question of why this is done. Further, the owner or custodian of the original dataset is often unaware that it has happened and receives no recognition for making the dataset available in the first place.
So what are the pros and cons of data re-publication? Reproducibility can become a major issue. How can humans and machines know whether they are accessing an authoritative copy of the data? How can the authoritative source (data centre) be attributed or acknowledged on mirror sites? Which site receives credit when the data are used and cited? What needs to be replicated in order to preserve the quality standards of the original data? In this session, we will explore these and other questions and discuss possible next steps towards best-practice guidelines on data re-publication.
The BoF will be in panel format and will start with five-minute lightning talks from panel members on four key issues of re-publication (20 minutes):
1) pro re-publication (why it is a good idea);
2) counter re-publication (why it is a bad idea);
3) authority and identity; and
4) credit and ethics.
This will be followed by a panel and plenary discussion (30 minutes) and then closing remarks and next steps (10 minutes).
Jens Klump is a geochemist by training and leads the Geoscience Analytics Team in CSIRO Mineral Resources, based in Perth, Western Australia. In his work on data infrastructures, Jens covers the entire chain of digital value creation, from data acquisition to data analysis, with a focus on data in minerals exploration. This includes automated data and metadata capture, sensor data integration both in the field and in the laboratory, data processing workflows, and data provenance, as well as data analysis by statistical methods, machine learning, and numerical modelling.