Curating species lists: Aggregating data to enhance context
Keeva Connolly¹, Kathryn Hall²
¹Australian BioCommons, University of Queensland, St Lucia, QLD, Australia
²Atlas of Living Australia, Dutton Park, QLD, Australia
Abstract
The Australian Reference Genome Atlas (ARGA) is an indexing platform for genomic data from Australian and Australian-relevant species. It enables users to search for data across multiple repositories from a single, centralised portal.
One of ARGA’s aims is to enrich genomic data by exposing metadata and providing data from additional sources. This includes species-level data, which can provide users with more information about an organism’s ecological, commercial, and legal context. One of the ways this is implemented in ARGA is through curated species lists for a selection of ecological and thematic groupings, such as marine biodiversity and notifiable pest species.
Species lists were compiled from a range of resources, including legislative instruments, international conventions, industry reports, and national taxonomic databases such as the Australian Faunal Directory and the Australian Plant Census. We developed ontologies to define and extract common species attributes, so that multiple listings and data fields could be applied to a single species. For example, threatened species are described according to the national, state and/or territory lists they are included on, as well as the conservation status designated on each of those lists.
By building lists around species’ attributes, ARGA provides users with defined contextual categorisations for species and enables them to run custom searches by filtering on specific attributes and to browse data according to curated groupings. Exploring data availability across species lists can reveal which taxa are data-rich and which are data-poor, helping users plan future research or policy.
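As a rough illustration of the idea that one species can carry several listings, each with its own status, and that searches can filter on those attributes, the following minimal Python sketch may help. It is not ARGA's implementation: the class names, field names, `filter_species` function, placeholder species and status values are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    """One entry for a species on a curated list (hypothetical fields)."""
    list_name: str                 # thematic or ecological grouping
    jurisdiction: str              # e.g. "National" or a state/territory code
    status: Optional[str] = None   # status designated on that list, if any


@dataclass
class Species:
    """A species with any number of listings attached (hypothetical model)."""
    scientific_name: str
    listings: List[Listing] = field(default_factory=list)


def filter_species(species, *, list_name=None, jurisdiction=None, status=None):
    """Return species with at least one listing matching every given criterion.

    Criteria left as None are ignored. All supplied criteria must be
    satisfied by the same listing, not by different listings combined.
    """
    def matches(listing):
        return (
            (list_name is None or listing.list_name == list_name)
            and (jurisdiction is None or listing.jurisdiction == jurisdiction)
            and (status is None or listing.status == status)
        )

    return [s for s in species if any(matches(item) for item in s.listings)]


# Placeholder records for illustration only; these are not real ARGA data.
example = [
    Species("Species A", [
        Listing("Threatened species", "National", "Endangered"),
        Listing("Threatened species", "QLD", "Vulnerable"),
        Listing("Marine biodiversity", "National"),
    ]),
    Species("Species B", [Listing("Notifiable pests", "National")]),
    Species("Species C", []),
]

endangered = filter_species(
    example, list_name="Threatened species", status="Endangered"
)
print([s.scientific_name for s in endangered])
```

Storing the status on the listing rather than on the species means the same species can hold different statuses on the national list and on a state or territory list, which is the situation described above for threatened species.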
Biography
Keeva is a scientific business analyst at the Australian BioCommons, a digital infrastructure platform providing tools, capabilities, and training to extend national research capacity in the life sciences. She works on the Australian Reference Genome Atlas (ARGA) project, building a new indexing platform to improve the discoverability of genomic data for Australian and Australian-relevant species.