Dr. Lesley Wyborn¹, Dr. Marthe Klöcking², Dr. Kerstin Lehnert³
¹Australian National University, Canberra, Australia; ²University of Münster, Münster, Germany; ³Lamont-Doherty Earth Observatory (Columbia University), Palisades, United States
Biography:
Lesley Wyborn is an Honorary Professor at the NCI and at the Research School of Earth Sciences of the Australian National University. She also works part time for ARDC. She had 42 years’ experience at Geoscience Australia (GA) in geochemistry and mineral systems research, as well as in data management. Since leaving GA in 2014, she has continued her research into many aspects of data science as applied to geochemistry, geophysics, samples, data quality and versioning of datasets, as well as the development of transparent high-performance national-scale datasets for use in HPC environments. ORCID: https://orcid.org/0000-0001-5976-4943
Marthe Klöcking is a research associate in chemical geodynamics at the University of Münster, working across the disciplines of igneous geochemistry, mantle geodynamics and data science. From 2021 to 2023 she was the coordinator and manager of the GEOROC database of igneous geochemical rock and mineral compositions. In this role she became actively involved in the global FAIR data community, with a focus on interoperability, and co-founded the OneGeochemistry initiative. She previously held postdoctoral research positions at Macquarie University and the Australian National University. ORCID: https://orcid.org/0000-0002-6592-9270
Kerstin Lehnert is Doherty Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and directs Lamont’s Geoinformatics Research Group. Her work centers on the development and operation of community-driven data infrastructure and, in particular, on using cyberinfrastructure to improve sharing of material samples and the data derived from them. She developed and oversees the operation of EarthChem and SESAR (both managed as part of IEDA2) and the Astromaterials Data System. She initiated and helped establish international community organizations such as COPDESS, IGSN e.V., and OneGeochemistry. ORCID: https://orcid.org/0000-0001-7036-1977
Abstract:
Geochemical data are typical of long tail communities: globally, they are characterised by small, highly variable datasets, mainly collected by individuals or small research teams. Geochemistry emerged as a discipline in 1838 and has evolved from low-throughput, manual analytical techniques to the highly computerised laboratories of today, which rapidly produce highly diverse geochemical and isotopic datasets on samples down to the atomic scale. Exponential increases in data volumes are challenging long-established practices and capabilities for organising, analysing, preserving and sharing data. The increasing application of machine learning techniques to large geochemical data compilations highlights the enormous value of the curation and harmonisation efforts undertaken by domain-curated data systems, which provide easy access to large volumes of high-quality, well-organised and standardised data.
Unfortunately, geochemistry as a discipline has been slow to change its methods of storing, publishing and sharing data, and only transitioned to electronic publication methods around 2000. Most researchers have managed their data locally on C-drives or on departmental servers. Modern data management is now a necessity if the discipline is to thrive in the age of digital data and artificial intelligence, particularly as journals and funders now require the formal publication of datasets in repositories.
It has been a long tale to move the long tail geochemistry community towards modern ways of storing and curating data and making them compliant with the FAIR, CARE and TRUST principles. This paper will describe how this transition is taking place. Although focused on geochemistry, it is relevant to many other long tail communities.