Workflow Orchestration on HPC with Airflow, Cookie-cutter Templates and Jupyter Notebooks

Dr Blake Seers1, Ms Claire Trenham2, Dr Ron Hoeke1, Dr Paul Branson3

1CSIRO, Aspendale, Australia
2CSIRO, Black Mountain, Australia
3UWA, Crawley, Australia

Workflow orchestration is essential for running large numerical models on high-performance computers (HPCs). Airflow is a Python workflow orchestration tool, originally developed at Airbnb, that uses directed acyclic graphs (DAGs) to build up a workflow from the dependencies between individual tasks. We use Jupyter notebooks and our own open-source Python library to build the DAG interactively. Once the DAG is developed, we trigger it from the Airflow web server's user interface. Cookiecutter templating is used to populate the model directory with the input files and parameters needed to run the model. The directory is then moved onto the HPC, where the model is triggered using an SSH command. I will cover all of these steps and more in this presentation, demonstrating how we have developed our own DAGs to schedule and run large numerical models on HPCs.
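
As a concrete sketch of the templating step, the snippet below uses the Cookiecutter Python API to populate a model run directory. The template URL, context keys, and output directory are hypothetical placeholders rather than the project's actual template; the point is that no_input=True combined with extra_context lets a workflow task render the directory without interactive prompts.

    # Hedged sketch: render a model run directory from a cookiecutter
    # template. The template URL, context keys and paths are hypothetical.
    from cookiecutter.main import cookiecutter

    run_dir = cookiecutter(
        "https://github.com/example/model-run-template",  # hypothetical template
        no_input=True,                # take values from extra_context, no prompts
        extra_context={               # hypothetical model parameters
            "run_name": "wave_hindcast_2021",
            "start_time": "2021-10-14T00:00:00",
            "timestep_seconds": 600,
        },
        output_dir="/scratch/model_runs",  # hypothetical staging directory
    )
    # cookiecutter() returns the path of the newly rendered directory
    print(f"Rendered model inputs into {run_dir}")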
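
The DAG tying the steps together could then look something like the sketch below, assuming Airflow 2.x with the apache-airflow-providers-ssh package installed. The task names, connection ID, paths, and PBS submission command are illustrative, not the authors' actual DAG, and their open-source library for building DAGs interactively from Jupyter is not shown here.

    # Hedged sketch of the workflow shape described in the abstract:
    # render inputs, stage the directory onto the HPC, trigger over SSH.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.ssh.operators.ssh import SSHOperator

    with DAG(
        dag_id="run_model_on_hpc",          # hypothetical DAG id
        start_date=datetime(2021, 10, 14),
        schedule_interval=None,             # no schedule: triggered from the Airflow UI
        catchup=False,
    ) as dag:
        # Populate the model directory from the cookiecutter template
        # (see the previous sketch, wrapped in a hypothetical script).
        render_inputs = BashOperator(
            task_id="render_inputs",
            bash_command="python render_inputs.py",
        )

        # Move the populated run directory onto the HPC.
        stage_to_hpc = BashOperator(
            task_id="stage_to_hpc",
            bash_command="rsync -av /scratch/model_runs/ hpc:/scratch/model_runs/",
        )

        # Trigger the model on the HPC with an SSH command (here a
        # hypothetical PBS job submission). "hpc_ssh" is an Airflow
        # connection that would be configured separately.
        run_model = SSHOperator(
            task_id="run_model",
            ssh_conn_id="hpc_ssh",
            command="cd /scratch/model_runs/wave_hindcast_2021 && qsub run_model.pbs",
        )

        # The >> operator encodes the dependencies that make this a DAG.
        render_inputs >> stage_to_hpc >> run_model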


Biography:

Blake joined CSIRO Oceans & Atmosphere in 2019, working within the Sea Level, Waves and Coastal Extremes team. Before joining CSIRO, Blake completed a PhD in Marine Science and Statistics and worked as a statistical consultant in the Department of Statistics at the University of Auckland. Blake works across various projects within the team, contributing to the team's growing codebase and supporting its scientific computing needs.

Date

Oct 14 2021

Time

3:50 pm - 4:10 pm (AEDT)

Local Time

  • Timezone: America/New_York
  • Date: Oct 14 2021
  • Time: 12:50 am - 1:10 am