Dynamic allocation of computational resources using Airflow and Terraform

Ms Aniek Roelofs1,2, Dr James Diprose2, Dr Richard Hosking1,2, Dr Rebecca Handcock1,2, Professor Cameron Neylon2, Associate Professor Lucy Montgomery2, Dr Alkim Ozaygen2, Dr Katie Wilson2, Dr Chun-Kai (Karl) Huang2

1Curtin Institute for Computation (CIC), Perth, Australia
2Curtin Open Knowledge Initiative (COKI), Perth, Australia

Situation

The Observatory Platform, developed by the Curtin Open Knowledge Initiative (COKI), is an environment built on FAIR principles for fetching, processing and analysing data to understand how well universities operate as Open Knowledge Institutions. Around 15 different data sources are currently used and this number is growing. Managing these data sources requires many different computational resources, which need to be maintained in a clear and repeatable way. This is especially important because external researchers should be able to easily deploy their own Observatory Platform and collect data as desired.

Task

Both a workflow management system and an infrastructure management system are needed to address this. The datasets vary immensely in size and complexity, so the required storage space and computational power also vary. To avoid wasting resources, these two systems should interact with each other and adjust accordingly.

Action

Terraform is used to deploy our system on the Google Cloud Platform. The Virtual Machine (VM) created by Terraform contains Docker containers that host Airflow, the workflow management system. When a more complex workflow is scheduled, Airflow will run an additional workflow that calls the Terraform API to create another, larger, VM that runs the complex workflows and shuts down automatically when they are finished.
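The decision behind this scale-up and scale-down behaviour can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Observatory Platform's implementation: the `Workflow` class, the `needs_large_vm` flag and the `plan_action` function are hypothetical names, and the actual call to Terraform is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Workflow:
    """A scheduled workflow (hypothetical model, for illustration only)."""
    name: str
    needs_large_vm: bool      # assumed flag marking a "more complex" workflow
    running_or_queued: bool   # True while the workflow still has work to do


def desired_vm_state(workflows: Iterable[Workflow]) -> bool:
    """Return True if the larger VM should exist, False if it is not needed."""
    return any(w.needs_large_vm and w.running_or_queued for w in workflows)


def plan_action(vm_exists: bool, workflows: Iterable[Workflow]) -> str:
    """Compare desired and actual state and name the infrastructure action.

    In a real deployment the returned action would be turned into a
    Terraform run; that step is an assumption and is not shown here.
    """
    wanted = desired_vm_state(workflows)
    if wanted and not vm_exists:
        return "create"
    if not wanted and vm_exists:
        return "destroy"
    return "no-op"


if __name__ == "__main__":
    queue = [
        Workflow("small_dataset", needs_large_vm=False, running_or_queued=True),
        Workflow("large_dataset", needs_large_vm=True, running_or_queued=True),
    ]
    print(plan_action(vm_exists=False, workflows=queue))
```

Keeping the decision separate from the infrastructure call makes it easy to test, and it mirrors the declarative style of Terraform: the workflow only states whether the larger VM should exist, and the infrastructure tool reconciles the difference.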

Result

We created an environment that can easily be set up remotely while being managed within a team. Both Terraform and Airflow are open-source tools, ensuring that the Observatory Platform stays FAIR. Cost savings are achieved by making use of computational resources as efficiently as possible.


Biography:

Aniek Roelofs is a developer at Curtin University where she is a part of the COKI team. She obtained a Master of Science in Bioinformatics from the University of Amsterdam and has experience setting up workflows and processing big data.

Previously working with sequencing data, she’s now focused on bibliometric data to assist the Curtin Open Knowledge Initiative with their research on how well universities operate as Open Knowledge Institutions.

Research software development workflows in Julia, applied to JuMP.

Dr Frederik Geth1, Dr Rahmat Heidarihaei1, Dr James Foster1

1CSIRO, Newcastle, Australia

In this talk we dive into Julia package development workflows, using Visual Studio Code with the Julia plugin as a development environment.

We focus on workflows for small packages in the context of scientific research.

To illustrate this, we develop a new module with JuMP, a mathematical optimization toolbox in Julia, as a dependency.

We showcase the initialization of a new module and how to use version control, unit testing, documentation generation and continuous integration through GitHub Actions.

Finally, we show how to use the package manager, and set up environments, to streamline the development process.


Biography:

Frederik Geth is a research scientist working with the CSIRO in Newcastle in the energy systems program. He is a power system engineer and obtained a PhD from the University of Leuven in Belgium in 2014. His research focus is applications of optimization models in distribution network operations, including unbalanced state estimation and optimal control of battery storage systems.

Enabling Genomic Analysis to Improve Risk Characterisation in Australia’s Red Meat Industry

Mr Derek Benson1, Dr Tim Ho2, Dr P. Scott Chandry3, Dr Glenn Mellor4

1CSIRO, Pullenvale, Australia
2CSIRO, Clayton, Australia
3CSIRO, Werribee, Australia
4CSIRO, Coopers Plains, Australia

Galaxy is a workflow platform that enables scientists to connect powerful computational analysis tools into pipelines which can be offloaded to high performance computing (HPC) systems. This work demonstrated how we applied Galaxy to a scientific problem important to Australia’s red meat industry through a genomic analysis pipeline.

As part of an eResearch Collaboration project, we allocated 20% of an FTE over a 6-month period to work with a research project team to perform genomics analyses for a partner in Australia’s red meat industry. During the project, we integrated multiple tools required by the pipeline into the Galaxy service to create a reproducible genomic analysis workflow.

The workflow was created to deploy a bacterial characterisation pipeline for CSIRO’s support of the meat industry. Isolated sequence data was processed on CSIRO’s Galaxy platform using genomic analysis tools, in a process that differs from the traditional genus / species / serotype approach and facilitates improved bacterial hazard characterisation. The pipeline includes quality control, assembly of bacterial genomes, and searching and reporting on genes and virulence factors to build a risk profile for predicting foodborne disease potential. It makes extensive use of the HPC facility at CSIRO to improve processing speed, with HPC resources dynamically matched to the size of the input data and the tools being used.
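One way to picture matching HPC resources to input size and tool is a small rule function, sketched below in Python. Everything in it is an assumption made for illustration: the tool names, the per-tool baselines, the memory limits and the `select_resources` function are hypothetical and do not describe the configuration of CSIRO's Galaxy service.

```python
import math

# Hypothetical per-tool baselines; real values would come from benchmarking.
TOOL_BASELINES = {
    "quality_control": {"cores": 2, "mem_gb_per_gb_input": 2.0},
    "assembly": {"cores": 8, "mem_gb_per_gb_input": 8.0},
    "gene_search": {"cores": 4, "mem_gb_per_gb_input": 4.0},
}
DEFAULT_BASELINE = {"cores": 1, "mem_gb_per_gb_input": 2.0}

MIN_MEM_GB = 4     # assumed floor for any job
MAX_MEM_GB = 256   # assumed ceiling imposed by the cluster


def select_resources(tool: str, input_bytes: int) -> dict:
    """Pick cores and memory for a job from its tool and input size."""
    baseline = TOOL_BASELINES.get(tool, DEFAULT_BASELINE)
    input_gb = input_bytes / 1024**3
    # Scale memory with input size, then clamp it to the allowed range.
    mem_gb = math.ceil(input_gb * baseline["mem_gb_per_gb_input"])
    mem_gb = max(MIN_MEM_GB, min(MAX_MEM_GB, mem_gb))
    return {"cores": baseline["cores"], "mem_gb": mem_gb}


if __name__ == "__main__":
    print(select_resources("assembly", 2 * 1024**3))
```

A rule of this kind lets small inputs run on modest allocations while large assemblies request more memory, so researchers do not have to size jobs by hand.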

This work used a mixture of high performance computing and storage resources to support a genomic analysis pipeline. The CSIRO Galaxy platform hides the underlying infrastructure complexities, allowing researchers to focus on creating reproducible science.


Biography:

Bio to come


    About the conference

    eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
