Dr Jeff Christiansen1, Shilo Banihit2, Dr Xin-Yi Chua3, Thom Cuddihy3, Dr Dominique Gorse3, Simon Gladman4, Dr Andrew Isaac4, Dr Neil Killeen5, Wei (Wilson) Liu5, Dr Steven Manos5, Sara Ogston5, Nick Rhodes3, A/Prof Torsten Seemann4, Dr Anna Syme4, Dr Mike Thang3, Koula Tsiaplias5, Nigel Ward1, Dr Mabel Lum6, A/Prof Andrew Lonie4
2 QCIF and Queensland University of Technology, Brisbane, Australia, firstname.lastname@example.org
4 Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
5 VicNode and Research Platform Services, University of Melbourne, Melbourne, Australia, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
6 Bioplatforms Australia, Sydney, Australia, email@example.com
To understand all functions that occur within a biological system (e.g. a cell or organism) under different environmental or experimental conditions, a global profiling and analysis of all the biomolecular players in that system under the different conditions is required.
Over the past few decades, rapid technological advances in molecular profiling techniques have made it possible to comprehensively characterise the biomolecular repertoire of biological samples. This includes the genome (DNA, which is found in all cells and instructs the cell how to behave); transcriptome (different mRNAs that are copied from the DNA, whose presence and amounts are specific to the system and condition being examined); proteome (proteins that are formed according to the instructions in these mRNAs); and metabolome (small molecules produced by the organism or obtained from external sources and associated with processes such as metabolism).
Despite the technical ability to undertake such global profiling, integrating and making sense of these different ‘-omics’ data types remains very challenging for researchers. The challenge is both conceptual (integrating the information) and logistical (managing and analysing the data): there is a lack of integrated and accessible storage, compute, software methods, tools and workflows that enable the integrative analysis of such data [1].
In late 2015, VicNode, Intersect, QCIF/QFAB and Melbourne Bioinformatics embarked on a collaborative project funded through the NCRIS Research Data Services (RDS) Food and Health Flagship program [2] to bring together a team with a broad skill set (across data management, biological metadata standards, interoperability, bioinformatics tool development, training and research systems hosting) to develop omics.data.edu.au – a cloud-based system to address these challenges.
In the first phase of its funding and development, the system has been built to accommodate data from bacterial pathogens for a specific research consortium: the Bioplatforms Australia (BPA) coordinated Antibiotic Resistant Pathogens Initiative (ABPRI) [3], whose members range from microbiologists to clinical researchers and are based at many research-intensive universities in Australia, including the University of Queensland, the University of Sydney, the University of Melbourne, Monash University, UNSW Australia, University of Technology Sydney, and the University of Adelaide.
The project team developed an integrated cloud-based framework for the ABPRI researchers to find data and undertake a wide range of bioinformatics analyses across genomic, transcriptomic, proteomic and metabolomic data. The omics.data.edu.au system includes:
- An underlying data management platform (DMP) that:
  - allows researchers to find specific data for their own analyses based on many criteria (e.g. raw versus analysed data; experimental condition; bacterial host and associated disease; omics data type; profiling technology used).
  - has an underlying data model that is conceptually applicable to any biological system [i.e. Project > Subject (specimen) > (experimental) Method > Study (omics-type specific) > Dataset(s)], whose specific elements adhere to internationally agreed community standards for bacterial pathogen data required by global data repositories for each omics type (i.e. European Nucleotide Archive [4], ArrayExpress [5], ProteomeXchange [6] and Metabolights [7]). These information standards have been adopted to facilitate any future exchange of data from the DMP to such repositories, an approach aligned to the FAIR Data Principles [8].
  - is built on DaRIS [9]/Mediaflux [10].
- An associated data analysis platform (DAP) that:
  - includes hundreds of tools for bacterial (and general) genomic, transcriptomic, proteomic and metabolomic data analysis.
  - covers two broad types of tool: (a) tools that take raw instrument-derived data and convert it into meaningful analysed data; and (b) exploratory (e.g. visualisation) and analysis tools to understand comparative differences between different sets of analysed data (e.g. condition A versus condition B).
  - caters to different bioinformatics skill levels, from novice to expert.
  - provides a variety of access methods, from GUI-based to command-line.
  - is built on the microbial flavour of the Genomics Virtual Lab (GVL) [11] (which includes Galaxy [12]) and other key services such as Pathway Tools [13].
- Tools and methods to move data between the DMP and DAP, which:
  - are facilitated by a GenomeSpace [14] connector.
  - support data transfer by drag-and-drop for GUI users or by SCP/SFTP/FTP.
  - also allow transfer of data to other computational environments (e.g. institutional resources, private GVL instances in the Nectar cloud etc.).
- Training materials for the above.
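To make the DMP data model hierarchy described in the list above (Project > Subject > Method > Study > Dataset) more concrete, the following is a minimal Python sketch of how such a hierarchy could be represented and searched by omics type and by raw versus analysed data. It is an illustration only and does not reflect the actual DaRIS/Mediaflux schema: all class names, field names, the `find_datasets` helper and the example file names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Dataset:
    name: str
    is_raw: bool  # True for raw instrument output, False for analysed data


@dataclass
class Study:
    omics_type: str  # e.g. "genomics", "transcriptomics", "proteomics", "metabolomics"
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Method:
    condition: str  # experimental method or condition (hypothetical field)
    studies: List[Study] = field(default_factory=list)


@dataclass
class Subject:
    organism: str  # the specimen, e.g. a bacterial strain (hypothetical field)
    methods: List[Method] = field(default_factory=list)


@dataclass
class Project:
    title: str
    subjects: List[Subject] = field(default_factory=list)


def find_datasets(
    project: Project,
    omics_type: Optional[str] = None,
    is_raw: Optional[bool] = None,
) -> Iterator[Dataset]:
    """Walk Project > Subject > Method > Study > Dataset and yield matches.

    A criterion left as None is not applied.
    """
    for subject in project.subjects:
        for method in subject.methods:
            for study in method.studies:
                if omics_type is not None and study.omics_type != omics_type:
                    continue
                for dataset in study.datasets:
                    if is_raw is not None and dataset.is_raw != is_raw:
                        continue
                    yield dataset


def build_example_project() -> Project:
    """Construct a small, entirely made-up project for illustration."""
    genomics = Study(
        "genomics",
        [Dataset("reads.fastq", True), Dataset("assembly.fasta", False)],
    )
    proteomics = Study("proteomics", [Dataset("spectra.raw", True)])
    method = Method("condition A", [genomics, proteomics])
    subject = Subject("example bacterial strain", [method])
    return Project("example project", [subject])


if __name__ == "__main__":
    project = build_example_project()
    # List only the analysed (non-raw) genomics datasets.
    for dataset in find_datasets(project, omics_type="genomics", is_raw=False):
        print(dataset.name)
```

In this sketch the omics type is stored at the Study level and the raw/analysed flag at the Dataset level, mirroring the "Study (omics-type specific) > Dataset(s)" part of the model; other search criteria mentioned above, such as bacterial host or profiling technology, could be added as further optional fields in the same way.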
The project has maintained extensive and ongoing engagement with a wide range of stakeholders with varying interests and/or challenges in biological data production, distribution, management and use: the ABPRI consortium coordinators (BPA); data production facilities (Ramaciotti Centre for Genomics, Australian Genome Research Facility (AGRF), Australian Proteomic Analysis Facility (APAF), Monash Biomedical Proteomics Facility (MBPF), Metabolomics Australia (MA)); bioinformaticians across the consortium (at Melbourne Bioinformatics, AGRF, APAF, MBPF and MA); and the end user researchers.
For the first time, the project has connected multiple separate components that were NCRIS-funded through previous Nectar, ANDS, RDSI and RDS eResearch investments.
The DMP and DAP have been designed to allow for future flexibility in that they: can be utilised independently of each other if required; can be adapted and extended for future research communities (e.g. mammalian, plant, population (meta-omics)); and can accommodate a very wide variety of data types arising from multiple data generation techniques and/or facilities.
In building omics.data.edu.au, we have developed, and presented to a research community, a national first: a cloud-based system for both integrated biological data management and associated informatics analysis for four broad ‘-omics’ data types (DNA, RNA, proteins and metabolites), which enables the sharing of data and collaborative analysis amongst members of a research consortium. The platform has been designed so that it leverages existing national eResearch infrastructure and can be adapted and extended for future research communities.
1. Gomez-Cabrero, D, et al. Data integration in the era of omics: current and future challenges. BMC Systems Biology, 2014. 8(S2): I1
2. Research Data Services (RDS) Food and Health Flagship program – https://www.rds.edu.au/omics
3. BPA Antibiotic Resistant Pathogens Initiative (ABPRI) – http://www.bioplatforms.com/antibiotic-resistant-pathogens/
4. European Nucleotide Archive – http://www.ebi.ac.uk/ena
5. ArrayExpress – http://www.ebi.ac.uk/arrayexpress/
6. ProteomeXchange – http://www.proteomexchange.org
7. Metabolights – http://www.ebi.ac.uk/metabolights/
8. FAIR Data Principles – https://www.force11.org/group/fairgroup/fairprinciples
9. DaRIS – https://wiki.cloud.unimelb.edu.au/resplat/doku.php?id=data_management:daris
10. Mediaflux – http://www.arcitecta.com/Products
11. Genomics Virtual Lab (GVL) – https://www.gvl.org.au
12. Galaxy – https://usegalaxy.org
13. Pathway Tools – http://brg.ai.sri.com/ptools/
14. GenomeSpace – http://genomespace.org
Jeff has a PhD in Biochemistry from the University of Queensland, and started his career as a researcher in the fields of cancer, molecular genetics and embryo development in both Australia and the UK, prior to moving into the management of large biological data assets through the establishment of a UK-based international database of mouse gene expression and anatomy.
Prior to joining QCIF/RCC, Jeff was based at Intersect Australia in Sydney where he was the National Manager of the RDS-funded med.data.edu.au project and also responsible for a number of biology-focused data and IT-related projects across NSW.
Prior to this, he was based in Melbourne at the Australian National Data Service (ANDS), where he was involved in commissioning and monitoring a number of biology/medicine-focused national data management projects.