Mr Simon Gladman1, Mr Derek Benson4, Dr Jeff Christiansen2, Dr Gareth Price3, A/Prof. Andrew Lonie1
1Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, firstname.lastname@example.org
2Queensland Cyber Infrastructure Foundation, Brisbane, Australia, email@example.com
3Queensland Facility for Advanced Bioinformatics, Brisbane, Australia, firstname.lastname@example.org
4Research Computing Centre, University of Queensland, Brisbane, Australia, d.benson.imb.uq.edu.au
5Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, email@example.com
- Half-day workshop
- Includes a hands-on component
- Up to 20 attendees
Galaxy (https://galaxyproject.org) is a widely used, highly capable bioinformatics analysis platform. It provides users with a large library of analysis and visualization tools, reference datasets, interfaces to global databases, and evolving workflow capabilities that provide provenance and reproducibility. Users build complex analysis jobs in a highly accessible interface; these jobs are then dispatched via a scheduler to the underlying computational resources. Galaxy has a relatively sophisticated approach to mapping user jobs to compute resources and can, for instance, be configured to schedule jobs to disparate HPC and/or cloud resources depending on the job characteristics.
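As a rough illustration of scheduling by job characteristics, the sketch below routes a job to one of several compute destinations based on its resource requests. It is a minimal sketch under stated assumptions, not Galaxy's own API: the `Job` fields, the thresholds, and the destination names (`hpc_highmem`, `hpc_cluster`, `cloud_pool`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of a user job (not a Galaxy class)."""
    tool_id: str
    cores: int
    memory_gb: int
    input_size_gb: float


def choose_destination(job: Job) -> str:
    """Return the name of a hypothetical compute destination for a job.

    The thresholds are illustrative assumptions only.
    """
    # Very large memory requests go to a dedicated high-memory HPC queue.
    if job.memory_gb > 256:
        return "hpc_highmem"
    # Wide or data-heavy jobs go to the general HPC cluster.
    if job.cores > 16 or job.input_size_gb > 100:
        return "hpc_cluster"
    # Everything else runs on elastic cloud workers.
    return "cloud_pool"


if __name__ == "__main__":
    examples = (
        Job("assembler", cores=32, memory_gb=512, input_size_gb=50.0),
        Job("aligner", cores=32, memory_gb=64, input_size_gb=20.0),
        Job("filter", cores=1, memory_gb=4, input_size_gb=0.5),
    )
    for job in examples:
        print(job.tool_id, "->", choose_destination(job))
```

In a real deployment the routing rules would live in the platform's job configuration rather than in a standalone script; the point here is only that the destination is a function of the job's characteristics.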
In this workshop we will explore the architecture of Galaxy Australia (http://usegalaxy.org.au) and how it is designed to deploy jobs from a common front end to compute resources in Queensland and Victoria. Jobs have access to a common multi-hundred-terabyte reference dataset collection that is intelligently mirrored in real time from the US-based Galaxy Main (http://usegalaxy.org) using the CernVM file system (https://cernvm.cern.ch/portal/filesystem). We will explore the technologies, share our experience of how they work in practice, and discuss the ambitions of a global Galaxy infrastructure network that can leverage the efforts of a global community to maintain and support critical data and software resources.
OUTLINE OF WORKSHOP CONTENT:
- Overview of Galaxy. Technical overview of the components of Galaxy as a software platform and as a workflow generation and deployment system. (30 minutes)
- Galaxy Australia architecture. Overview of the Galaxy Australia architectural and deployment model. (30 minutes)
- Underlying technologies. Detailed exploration of the job distribution and data sharing technologies being used for Galaxy Australia. (90 minutes)
- Galaxy ‘World’ – roadmap discussion. How can multiple instances of Galaxy share complex, high-maintenance resources, including a dependency-free tool library and growing global reference datasets, while still presenting a seamless experience to non-expert users? (30 minutes)
WHO SHOULD ATTEND
Research infrastructure staff interested in complex, distributed software systems and cutting-edge technologies for job and data distribution.
WHAT TO BRING
A laptop; no special software is required. We hope to demonstrate some of the technologies being used in Galaxy.
Andrew Lonie is Director of Melbourne Bioinformatics, Director of the EMBL Australia Bioinformatics Resource (EMBL-ABR: http://embl-abr.org.au), and an associate professor in the Faculty of Medicine, Dentistry and Health Sciences at the University of Melbourne, where he coordinates the MSc (Bioinformatics). Andrew directs a group of bioinformaticians, computational biologists and HPC specialists within Melbourne Bioinformatics and EMBL-ABR who collaborate with and support life sciences researchers in a variety of research projects across Australia.