Goran Sterjov1, Matt Andrews1, Peter Brenton1, Simon Checksfield1, Keeva Connolly2, Christopher Mangion2, Winnie Mok2, Caitlin Ramsay1, Sarah Richmond3, Nigel Ward2, Kathryn Hall1
1Atlas of Living Australia, Australia, 2Australian BioCommons, Australia, 3Bioplatforms Australia, Australia
Biography:
Goran spends his days working on the Australian Reference Genome Atlas project and fighting for ownership of the keyboard with his cat. He has strong opinions about Linux and the Rust programming language. https://orcid.org/0009-0001-3625-4315
Abstract:
The Australian Reference Genome Atlas (ARGA) took its first steps as a publicly accessible service that allows researchers to easily find genomic data for all Australian-relevant species. We now face new challenges in maintaining the ARGA index and keeping it as up to date as possible.
One such challenge is to greatly increase the transparency of how the data was generated, collected, and folded into the index. A related challenge is to add transparency and provenance to existing datasets as a function of our data processing pipeline.
ARGA has implemented a novel approach to tracking and storing changes to publicly available datasets such as the National Species List (NSL), OZCAM, and Plazi TreatmentBank. By combining recent work on conflict-free replicated data types (CRDTs), operation logs stored in a PostgreSQL database, and an entity system, ARGA can show granular changes to every column of a record along with detailed attribution for each change. Furthermore, the approach has the potential to add high availability and eventual consistency to any subset of ARGA's index, allowing for a richer collaborative experience amongst aggregators and data authors.
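The combination of an operation log with CRDT-style merging can be illustrated with a minimal sketch. It is not ARGA's implementation: the `Operation` shape, the function names, the in-memory lists standing in for PostgreSQL tables, and the sample records are all assumptions made for illustration. Each change is an immutable operation carrying its source, logs are merged by set union, and the current value of a column is the one written by the latest operation (a last-writer-wins rule, one of the simplest CRDT strategies).

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Operation:
    """One immutable entry in the operation log (hypothetical shape)."""
    timestamp: int   # logical clock value; field order defines the sort order
    op_id: str       # unique id, breaks ties between equal timestamps
    entity_id: str   # the record the change applies to
    column: str      # the record column that changed
    value: str       # the new value
    source: str      # attribution: which dataset made the change


def merge(*logs):
    """Union the logs into one sorted list; argument order does not matter."""
    return sorted(set().union(*logs))


def current_state(log, entity_id):
    """Replay the log in order; the latest operation per column wins."""
    state = {}
    for op in sorted(log):
        if op.entity_id == entity_id:
            state[op.column] = op
    return state


def history(log, entity_id, column):
    """Every recorded change to one column of one record, oldest first."""
    return [op for op in sorted(log)
            if op.entity_id == entity_id and op.column == column]


# Made-up sample data: two sources describe the same hypothetical record.
nsl_log = [
    Operation(1, "a1", "taxon-1", "scientific_name", "Examplea prima", "NSL"),
    Operation(3, "a2", "taxon-1", "status", "accepted", "NSL"),
]
ozcam_log = [
    Operation(2, "b1", "taxon-1", "scientific_name", "Examplea secunda", "OZCAM"),
]

merged = merge(nsl_log, ozcam_log)
state = current_state(merged, "taxon-1")
for column, op in state.items():
    print(f"{column} = {op.value!r} (from {op.source}, t={op.timestamp})")
```

Because merging is a set union followed by a deterministic sort, replicas that receive the same operations in any order arrive at the same state, and the retained log gives per-column history with attribution without any extra bookkeeping.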
In this presentation, we demonstrate how leveraging this Highly Available Logically Deterministic Entity System enables us to increase the transparency of our data aggregation pipelines and extend that ability to external datasets that do not provide that level of detail. As a result of this gestalt switch, ARGA makes provenance, a keystone of all sciences, substantially more searchable and actionable.