Dr. Kathryn Hall¹, Matt Andrews², Mr. Simon Checksfield², Ms. Keeva Connolly⁵, Mr. Peter Brenton², Mr. Christopher Mangion², Ms. Winnie Mok⁶, Ms. Caitlin Ramsay¹, Ms. Sarah Richmond⁴, Mr. Goran Sterjov³, Dr. Nigel Ward⁵
¹Atlas of Living Australia, Brisbane, Australia, ²Atlas of Living Australia, Canberra, Australia, ³Atlas of Living Australia, Melbourne, Australia, ⁴Bioplatforms Australia, Macquarie University, Australia, ⁵Australian BioCommons, Brisbane, Australia, ⁶Australian BioCommons, Melbourne, Australia
Biography:
ORCID iD: https://orcid.org/0000-0002-8785-4513
Kathryn is the Product Champion and Project Manager for the Australian Reference Genome Atlas at the Atlas of Living Australia (ALA). She holds a PhD in the taxonomy of marine flatworm parasites of fishes. Before joining the ALA, she worked for many years in invertebrate taxonomy and systematics at the Queensland Museum, where she grew to know and love the unique qualities of marine sponges.
It was through her work with marine sponges that she first became interested in the eResearch domain. With colleagues at the museum, she brought specimens, collection data and morphological observations together in a single database to support the MarBOL (Marine Barcode of Life) Project, and along the way learnt a great deal about relational databases and the importance of identifiers. This database grew into the SpongeMaps project, and she has continued to work in eResearch ever since, keen to make data as easy as possible to access and understand for taxonomists and other biological researchers.
Abstract:
The Australian Reference Genome Atlas (ARGA) empowers researchers to search for genomic data derived from Australia’s biodiversity through several lenses, including taxonomy, ecology, and biogeography. Searching for data using taxonomy can be uniquely challenging. Genomic data are stored in repositories under the taxon name assigned when the data were generated, presenting snapshot views of specimens and their identities. However, specimen identifications can be fluid, particularly in light of genomic data, as can the taxonomy used to classify organisms.
The changeability of taxonomy and specimen identifications makes name-driven searching fragile; this can be overcome programmatically using expansive alternative name frameworks, but, to be useful, these classifications must be maintained. Taxonomies are subject to expert opinion and are time-intensive to curate. Moreover, interpolating nomenclatural changes post-hoc onto genomic datasets introduces confusion, which reduces overall data transparency for end-users. If data indexed by ARGA are discovered via an alternative taxon name, is that because the organism identification was revised, or because the nomenclature changed?
ARGA has developed timelined specimen and taxonomic histories, which enable users to see, in one platform, the full history of specimen identifications, and to track changes in the nomenclature for those identifications. All identification and taxon metadata are displayed with attributions, so ARGA users can consult the original sources as needed. The ARGA metadata chains are critical for helping users understand when, why and how names have changed, and for keeping data current, so that data may be reused with the identifiers that best describe accepted taxon concepts.
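As an illustration only, the distinction between a revised identification and a changed name can be sketched with two kinds of dated, attributed records. This is a minimal sketch and not ARGA's implementation: the record shapes, field names, function names and the assumption that the earliest identification is the name under which the data were deposited are all hypothetical, and "Aus bus", "Aus cus" and "Xus bus" are placeholder names rather than real taxa.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Identification:
    """One dated determination of a specimen (hypothetical record shape)."""
    specimen_id: str
    taxon_name: str
    identified_on: date
    source: str  # attribution for this determination


@dataclass(frozen=True)
class NameChange:
    """One dated nomenclatural change replacing old_name with new_name."""
    old_name: str
    new_name: str
    published_on: date
    source: str  # attribution for this change


def _idents_for(specimen_id: str, identifications: List[Identification]):
    """Return a specimen's identifications, oldest first."""
    mine = [i for i in identifications if i.specimen_id == specimen_id]
    return sorted(mine, key=lambda i: i.identified_on)


def current_name(name: str, changes: List[NameChange]) -> str:
    """Follow name changes forward from `name` until none applies."""
    seen = {name}
    while True:
        nxt = next((c.new_name for c in changes if c.old_name == name), None)
        if nxt is None or nxt in seen:  # stop at the end of the chain or on a cycle
            return name
        seen.add(nxt)
        name = nxt


def timeline(specimen_id, identifications, changes) -> List[Tuple]:
    """Merge identifications and the name changes affecting them, by date."""
    idents = _idents_for(specimen_id, identifications)
    events = [(i.identified_on, "identified", i.taxon_name, i.source) for i in idents]
    names = {i.taxon_name for i in idents}
    relevant: List[NameChange] = []
    added = True
    while added:  # repeat so that chained renames are also picked up
        added = False
        for c in changes:
            if c.old_name in names and c not in relevant:
                relevant.append(c)
                names.add(c.new_name)
                added = True
    events += [(c.published_on, "renamed", f"{c.old_name} -> {c.new_name}", c.source)
               for c in relevant]
    return sorted(events)


def explain_match(specimen_id, query_name, identifications, changes) -> str:
    """Say why a specimen's data were found under `query_name`."""
    idents = _idents_for(specimen_id, identifications)
    if not idents:
        return "no match"
    deposited, latest = idents[0].taxon_name, idents[-1].taxon_name
    if deposited == query_name:
        return "deposited name"
    if current_name(deposited, changes) == query_name:
        return "nomenclature changed"
    if query_name in (latest, current_name(latest, changes)):
        return "identification revised"
    return "no match"


# Placeholder data: S1 keeps its identification but the name changes;
# S2 is re-identified as a different taxon.
IDENTS = [
    Identification("S1", "Aus bus", date(2010, 1, 1), "source A"),
    Identification("S2", "Aus bus", date(2010, 1, 1), "source A"),
    Identification("S2", "Aus cus", date(2018, 6, 1), "source B"),
]
CHANGES = [NameChange("Aus bus", "Xus bus", date(2015, 3, 1), "source C")]

for event in timeline("S2", IDENTS, CHANGES):
    print(event)
print(explain_match("S1", "Xus bus", IDENTS, CHANGES))
print(explain_match("S2", "Aus cus", IDENTS, CHANGES))
```

Keeping identifications and name changes as separate record types, each with its own date and source, is what lets the two questions be answered independently: the same search hit can be traced either to a new determination of the specimen or to a change in the name attached to an unchanged determination.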