Ms Keeva Connolly1,2, Matt Andrews3, Peter Brenton3, Jack Brinkman3, Christopher Mangion3, Emily Marshall1, Winnie Mok1, Lisa Phippard1, Sarah Richmond4, Goran Sterjov3, Nigel Ward1, Tom Harrop1, Kathryn Hall3
1Australian BioCommons, Australia, 2QCIF, Australia, 3Atlas of Living Australia, Australia, 4Bioplatforms Australia, Australia
Biography:
Keeva is a scientific business analyst for the Australian BioCommons, where she works on two projects related to biodiversity genomics: the Australian Reference Genome Atlas (ARGA) and the Australian Tree of Life (AToL). She works within QCIF and is based in Brisbane.
Abstract:
The volume of genetic data being generated and deposited online is increasing exponentially, giving rise to growing opportunities for data application and reuse. This is particularly the case for high-throughput analyses, which commonly use standardised pipelines to process or transform data in line with research goals. These analyses typically rely on the primary data being both discoverable and interoperable. For genetic sequence data, this usually requires a minimum set of metadata to be described according to a common standard, which is rarely the case for data deposited in different repositories.
Here, I will talk about how we used metadata harmonisation to address this challenge in the context of two projects: the Australian Reference Genome Atlas, a genomic data indexing platform for data discoverability; and the Australian Tree of Life, an infrastructure project for semi-automated genome assembly, annotation, and publication. We identified equivalent fields between metadata schemata to generate metadata mappings between data repositories, including the Bioplatforms Australia data portal and the International Nucleotide Sequence Database Collaboration (INSDC) databases, such as the Sequence Read Archive (SRA), BioSample, and GenBank. For the Australian Reference Genome Atlas, mapping multiple data sources to a common schema facilitates data searching, filtering, and aggregation, and enables access to metadata in a common format for bulk analysis. In the Australian Tree of Life project, metadata mapping enables data exchange between repositories and the reuse of pipelines previously developed by the Darwin Tree of Life initiative to process data that meet their metadata standards.
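
The idea of mapping equivalent fields from several repositories onto one common schema could be sketched roughly as follows. This is only an illustration under stated assumptions, not the projects' actual implementation: the source labels, the source field names, the common-schema field names, and the `harmonise` function are all hypothetical and chosen for readability.

```python
from typing import Any, Dict, Tuple

# Hypothetical mappings: source field name -> common-schema field name.
# None of these names are taken from the real ARGA or AToL schemata.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "biosample": {
        "organism": "scientific_name",
        "collection_date": "collection_date",
        "geo_loc_name": "locality",
    },
    "bpa_portal": {
        "species": "scientific_name",
        "date_collected": "collection_date",
        "location": "locality",
    },
}


def harmonise(
    record: Dict[str, Any], source: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Rename a record's fields to the common schema.

    Returns the harmonised record and a dict of fields that had no
    mapping, so unmapped metadata is reported rather than silently lost.
    """
    mapping = FIELD_MAPPINGS[source]
    harmonised: Dict[str, Any] = {"source": source}
    unmapped: Dict[str, Any] = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value
        else:
            harmonised[target] = value
    return harmonised, unmapped


# Two records describing the same kind of sample in different schemata.
insdc_record = {"organism": "Phascolarctos cinereus", "collection_date": "2021-03-04"}
portal_record = {"species": "Phascolarctos cinereus", "date_collected": "2021-03-04"}

insdc_common, _ = harmonise(insdc_record, "biosample")
portal_common, _ = harmonise(portal_record, "bpa_portal")
```

Once both records share field names, they can be searched, filtered, and aggregated together, and the same mapping tables can be read in reverse to exchange metadata between repositories.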