Galaxy architecture and deployment experiences: a case study in how to build complex analysis systems for data-focussed science

Mr Simon Gladman1, Mr Derek Benson4, Dr Jeff Christiansen2, Dr Gareth Price3, A/Prof. Andrew Lonie5

1Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, simon.gladman@unimelb.edu.au
2Queensland Cyber Infrastructure Foundation, Brisbane, Australia, j.christiansen@uq.edu.au
3Queensland Facility for Advanced Bioinformatics, Brisbane, Australia, g.price@qfab.org
4Research Computing Centre, University of Queensland, Brisbane, Australia, d.benson@imb.uq.edu.au
5Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, alonie@unimelb.edu.au

GENERAL INFORMATION

  • Half-day workshop
  • Includes a hands-on component
  • Up to 20 attendees

DESCRIPTION

Galaxy (https://galaxyproject.org) is a widely used, highly capable bioinformatics analysis platform. It provides users with a large library of analysis and visualization tools, reference datasets, interfaces to global databases, and evolving workflow capabilities that provide provenance and reproducibility. Users build complex analysis jobs in a highly accessible interface, and Galaxy then deploys those jobs via a scheduler to underlying computational resources. Galaxy takes a relatively sophisticated approach to mapping user jobs to compute resources and can, for instance, be configured to schedule jobs to disparate HPC and/or cloud resources depending on the job characteristics.
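As a rough illustration of routing jobs by their characteristics, the sketch below picks a destination from a job's resource requests. It is not Galaxy's own configuration or API: the `Job` fields, the thresholds, and the destination names (`hpc_cluster`, `local_cloud`, `remote_cloud`) are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical summary of a user job; not a Galaxy data structure."""
    tool_id: str
    cores: int
    memory_gb: int
    input_gb: float


def choose_destination(job: Job) -> str:
    """Return an illustrative destination name for a job.

    The thresholds are arbitrary placeholders chosen for the example.
    """
    # Large-memory or highly parallel jobs go to the (assumed) HPC cluster.
    if job.memory_gb > 64 or job.cores > 16:
        return "hpc_cluster"
    # Jobs with small inputs stay on the cloud resource next to the front end.
    if job.input_gb < 1:
        return "local_cloud"
    # Everything else is sent to a second, remote cloud resource.
    return "remote_cloud"
```

For example, `choose_destination(Job("assembler", 32, 128, 50.0))` selects the HPC destination because the job requests more than 16 cores.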

In this workshop we will explore the architecture of Galaxy Australia (http://usegalaxy.org.au) and how it is designed to deploy jobs from a common front end to compute resources in Queensland and Victoria. Jobs have access to a common multi-hundred-terabyte reference dataset collection that is intelligently mirrored in real time from the US-based Galaxy Main (http://usegalaxy.org) using the CernVM File System (https://cernvm.cern.ch/portal/filesystem). We will explore the technologies, share our experiences of how they work in practice, and discuss the ambition of a global Galaxy infrastructure network that can leverage the efforts of a global community to maintain and support critical data and software resources.
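One way to picture the shared reference collection: CernVM-FS presents a repository as a read-only directory tree under `/cvmfs`, so a job can use the same file path whichever site it runs at. The sketch below only builds such paths. The repository name, the `genomes/<build>/<file>` layout, and the site labels are assumptions for illustration, not a description of the actual Galaxy Australia setup.

```python
from pathlib import PurePosixPath

# CernVM-FS repositories are mounted under /cvmfs on client machines.
CVMFS_MOUNT = PurePosixPath("/cvmfs")
# Assumed repository name, used here only for illustration.
REFERENCE_REPO = "data.galaxyproject.org"
# Illustrative labels for the two compute locations named in the abstract.
SITES = ("queensland", "victoria")


def reference_path(genome_build: str, filename: str) -> PurePosixPath:
    """Build the path to a reference file (the directory layout is hypothetical)."""
    return CVMFS_MOUNT / REFERENCE_REPO / "genomes" / genome_build / filename


def paths_by_site(genome_build: str, filename: str) -> dict:
    """Show that every site resolves a reference file to the same path."""
    return {site: reference_path(genome_build, filename) for site in SITES}
```

Because the path does not depend on the site, a tool configuration that points at the mounted repository can be reused unchanged at each compute location.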

OUTLINE OF WORKSHOP CONTENT:

  1. Overview of Galaxy (30 minutes). Technical overview of the components of Galaxy as a software platform and as a workflow generation and deployment system.
  2. Galaxy Australia architecture (30 minutes). Overview of the Galaxy Australia architectural and deployment model.
  3. Underlying technologies (90 minutes). Detailed exploration of the job distribution and data sharing technologies used for Galaxy Australia.
  4. Galaxy ‘World’ – roadmap discussion (30 minutes). How can multiple instances of Galaxy make use of complex, high-maintenance resources, including a dependency-free tool library and growing global reference datasets, whilst presenting a seamless experience to non-expert users?

WHO SHOULD ATTEND

Research infrastructure staff interested in complex, distributed software systems and cutting-edge technologies for job and data distribution.

WHAT TO BRING

A laptop; no special software is required. We hope to demonstrate some of the technologies being used in Galaxy.


BIOGRAPHY

Andrew Lonie is Director of Melbourne Bioinformatics, Director of the EMBL Australia Bioinformatics Resource (EMBL-ABR: http://embl-abr.org.au), and an associate professor in the Faculty of Medicine, Dentistry and Health Sciences at the University of Melbourne, where he coordinates the MSc (Bioinformatics). Andrew directs a group of bioinformaticians, computational biologists and HPC specialists within Melbourne Bioinformatics and EMBL-ABR who collaborate with and support life sciences researchers in a variety of research projects across Australia.

 

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information-centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
