Mr Ryan Bunney1,2, Professor Andreas Wicenec1,2, Mr Rodrigo Tobar1,2, Mr Nicholas Pritchard1,2, Mr James Strauss1,2, Mr Moritz Wicenec1,2
1International Centre for Radio Astronomy Research, Perth, Australia, 2University of Western Australia, Perth, Australia
Biography:
Ryan is a Research Software Engineer at the International Centre for Radio Astronomy Research (ICRAR). He has 10 years of experience developing software in both research and industry, and is passionate about improving the quality of software and its accessibility to academics. His research interests are in workflow scheduling, observatory operations, and high-performance computing. He is a 2018 Westpac Future Leader Scholar.
https://orcid.org/0000-0002-0246-1922
Abstract:
Scientific workflows are a tool for coordinating interdependent computing tasks at scale. Workflow management systems that support scientific workflows take control of scheduling and managing large-scale distributed computing, increasing the scale of scientific research and facilitating more reproducible experiments. A challenge with these systems is that the time and resources required to learn them limit their adoption.
We present the Data Activated 流 (Liu) Graph Engine (DALiuGE), a suite of software tools that aims to reduce this time overhead by enabling scientists and software engineers to reuse existing code in their workflows. In addition to providing standard management support for Bash applications and Docker/Singularity containers, DALiuGE lets scientists integrate existing Python code directly into a workflow, often without rewrites.
Additionally, the Editor for the Astronomical Graph Language Environment (EAGLE) lets users focus on the logic of the workflow (looping over data, adding conditional logic, etc.) without concern for its runtime execution. DALiuGE also has built-in reproducibility tracking that 'grades' workflow runs according to various levels of reproducibility.
Initially developed for the Square Kilometre Array Science Data Processor, DALiuGE has demonstrated its scalability using the entire Summit supercomputer and is actively being used to develop workflows for the CHILES, DINGO, and WALLABY surveys. We will present our experience in developing astronomy pipelines using these tools, discuss lessons learned in designing applications for scientists, and demonstrate their efficacy for workflow applications outside of astronomy.