Enabling Australian Genomics research through enhancements to the Genomics Virtual Lab

Derek Benson1*, Simon Gladman2*, Igor Makunin1, Anna Syme2, Helen van de Pol2, Christina Hall2, Nuwan Goonasekera2, Andrew Isaac2, Andrew Lonie2, Nigel Ward3, Jeff Christiansen4, Gareth Price5

  1. RCC-University of Queensland, Brisbane, Australia, {benson, i.makunin}@imb.uq.edu.au
  2. Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, {aisaac, alonie, syme, cr.hall, helen.vanderpol, n.goonasekera, simon.gladman}@unimelb.edu.au
  3. QCIF, Brisbane, Australia, ward@qcif.edu.au
  4. QCIF and RCC-University of Queensland, Brisbane, Australia, christiansen@qcif.edu.au
  5. QFAB@QCIF, Brisbane, Australia, price@qfab.org

* These authors contributed equally and are listed alphabetically.

BACKGROUND

The rise of “next generation” sequencing has transformed biological research into a data-intensive endeavour. To reduce entry-level complexity for bioinformatics analyses, global efforts have focused on building graphical user interface front-ends to computational back-ends, of which the Galaxy Project is the preeminent example. Within Australia, Galaxy, R and other analysis environments have been made available inside the Genomics Virtual Lab (GVL), both as a self-installable environment and as a managed service [1, 2]. The BioDeVL project aims to provide a “data-enhanced” and “user-enhanced” managed GVL service for all researchers in Australia and to up-skill the community in the use of this platform [3]. This will offer all Australian bioscience researchers the opportunity to more easily access and apply bioinformatics approaches to their research, without needing to worry about the resourcing, deployment, configuration and other system administration tasks that currently preclude use by many researchers.

The project will:

  • Provide a world-leading data advantage by:
    • Enabling all Australian bioscience researchers to access a professionally managed and appropriately resourced online computational service platform to underpin their biomolecular data analyses;
    • Ensuring all reference datasets are added to the service with appropriate descriptions and provenance information, so that they are unambiguously identifiable and Australian researchers are better able to undertake reproducible analyses on the service.
  • Accelerate innovation by:
    • Enabling sophisticated biomolecular data analysis capability on top of cloud compute resources;
    • Providing reliable, quality-controlled and trusted analysis tools within the service to encourage research innovation and collaboration;
    • Training and upskilling biology researchers to understand and utilise molecular biology-related reference data and analytical tools.
  • Create collaborative technology and partnerships for borderless research by:
    • Creating a single national managed computational service platform, allowing collaborative and borderless research for all Australian researchers and, by invitation, their international collaborators;
    • Continuing partnerships with existing international development and technology partners to ensure best-practice tools and methodologies are applied to the Australian service to underpin borderless research.
  • Enhance the translation of research by:
    • Helping to translate basic bioinformatics research into tools for diagnostic purposes in fields such as public health by acting as a user-friendly vehicle to make tools and workflows accessible to non-bioinformatics experts.

KEY OUTCOMES

The key outcomes for the GVL and Galaxy Australia are user numbers and the number of tool executions; the latest figures will be shown in this presentation. Since the beginning of the project, specific effort has been focused on the Galaxy Australia component of the GVL, to provide an easy-to-use analysis platform for genomics research [4]. Galaxy Australia has been aligning its “look and feel” to international Galaxy sites, whilst maintaining the tools and reference datasets necessary to support Australian research activity. Reconciling computational resources to allow efficient management of the GVL, ensuring reference datasets are up to date and cited in alignment with FAIR principles, and maintaining reliable, quality-controlled and trusted analysis tools are all aimed at increasing use of the GVL. Further, the deployment of increased resources for the training and upskilling of biology researchers will support the community to maximize their experience in the GVL.

METHODS

Achieving an optimized GVL and Galaxy Australia service first involved reconfiguring existing resources hosted in Brisbane and Melbourne, which historically each operated a Galaxy service, to provide a single controlling (Head) node for Galaxy Australia. Through tool resource allocation and job submission to dedicated resources (based on long run time or high memory usage), users will experience the fastest run time possible for their analyses. This has been achieved using a new Head Node configuration, new minimally packaged worker nodes, HTCondor job management and integrated Docker-spawned environments. Rationalisation, publication and alignment of training material maintained at Australian and global repositories, plus EMBL-ABR-led Train-the-Trainer activities, will drive the “user-enhanced” experience for Galaxy and GVL users. Finally, having a single user-facing help desk will allow for reduced query response time [5].
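
As an illustration only, the routing of jobs to dedicated resources based on long run time or high memory usage might be sketched as follows. This is a minimal Python sketch under stated assumptions, not the Galaxy Australia configuration: the thresholds, the pool names (`high_memory_pool`, `long_running_pool`, `general_pool`), the `ToolProfile` type and the example tool names are all hypothetical and do not come from the project.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LONG_RUNTIME_HOURS = 12.0
HIGH_MEMORY_GB = 64.0


@dataclass(frozen=True)
class ToolProfile:
    """Expected resource needs of a tool (illustrative values only)."""
    name: str
    expected_runtime_hours: float
    expected_memory_gb: float


def choose_destination(tool: ToolProfile) -> str:
    """Return the (hypothetical) worker pool a job for this tool is sent to.

    Memory is checked first, so a job that is both long-running and
    memory-hungry lands on the high-memory pool.
    """
    if tool.expected_memory_gb >= HIGH_MEMORY_GB:
        return "high_memory_pool"
    if tool.expected_runtime_hours >= LONG_RUNTIME_HOURS:
        return "long_running_pool"
    return "general_pool"


if __name__ == "__main__":
    # Hypothetical tools used only to exercise the rule.
    examples = [
        ToolProfile("quick_text_filter", 0.1, 1.0),
        ToolProfile("large_genome_assembly", 30.0, 256.0),
        ToolProfile("long_alignment", 20.0, 16.0),
    ]
    for tool in examples:
        print(f"{tool.name} -> {choose_destination(tool)}")
```

Keeping the rule in one small function means the thresholds can be tuned in a single place as usage figures change; in a real deployment the decision would be made by the job management layer rather than by standalone code like this.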

CONCLUSION

The GVL has undergone transformation and Galaxy Australia has been launched with the latest version of Galaxy, the most up-to-date reference genomes, datasets and tools, plus tools requested specifically by Australian researchers to enable their studies (Figure 1).

Figure 1: Galaxy Australia Landing Page

 

REFERENCES

  1. GVL server image: https://www.gvl.org.au/get/
  2. GVL managed hosted data analysis environments (Galaxy and RStudio): https://www.gvl.org.au/use/
  3. ANDS / RDS / Nectar DeVL projects: https://www.ands-nectar-rds.org.au/researchdomainprogram
  4. Galaxy Australia website: https://usegalaxy.org.au
  5. Galaxy Australia User Support: help@genome.edu.au

BIOGRAPHY:

Gareth heads up the computational biology team at QFAB, joining in early 2017 after nearly 15 years as a genomics scientist. He has been involved in experimental design, data QC, analysis and interpretation from early printed microarrays through to multiple NGS platforms. This work has involved a variety of model organisms, from microorganisms, fruit flies and mice to humans. At QFAB Gareth helps translate researchers’ biological queries into the systemic informatics language required for analysis. Gareth’s view is that biological research, clinical research and healthcare are at their best when coupled with the most accurate, highest-throughput and innovative technology and analysis. He uses this view to motivate the use of innovation to reduce the time between data generation and data summarisation, and has taken on the role of Program Manager for the Genomics Virtual Lab project (gvl.org.au) to help promote this important Australian resource to all life science researchers.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.

© 2017 - 2018 Conference Design Pty Ltd