Leveraging Continuous Integration for Enhanced eResearch on High Performance Computing Clusters

Dr. Ignatius Menzies¹

¹Garvan Institute of Medical Research, Sydney, Australia

Biography:

Since joining the Garvan Institute of Medical Research in October 2023, I've been working on developing, supporting and deploying software to help researchers at Garvan, and contributing to open source software.

I am passionate about reproducible research. Before Garvan I worked as Reproducible Research Lead at Dragonfly Data Science in New Zealand.

I'm also familiar with public sector research and reporting. I worked as a Software Developer in Public Sector Digital at Datacom and was Data Science Lead at the Aotearoa / New Zealand Ministry for the Environment.

My background is in research, and I have a PhD in Ecology.

Abstract:

Continuous Integration (CI) is essential in modern software development, enabling frequent code integration, automated testing, and rapid feedback. In eResearch, CI enhances reproducibility and efficiency, but High Performance Computing (HPC) environments pose unique challenges, such as hardware diversity, job scheduling, resource management requirements, and strict administrative policies.

We developed a system that integrates CI with HPC to validate changes to bioinformatic workflows. Our system uses GitHub Actions, runners hosted on Google Cloud Platform, and the National Computational Infrastructure’s Gadi supercomputer. Any committed changes to workflows trigger a job on Gadi, running the amended workflow with a sample dataset and a suite of tests.
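As a rough illustration of the trigger step described above, the following Python sketch shows how a CI runner might submit a validation job to a PBS scheduler over SSH and read back its exit status. It is a minimal sketch under stated assumptions, not the system's actual implementation: the user name, script name, COMMIT_SHA variable, and all function names are hypothetical, and it assumes the runner already holds SSH access to a Gadi login node.

```python
import shlex
import subprocess


def build_submit_command(user, host, script, commit_sha):
    """Return the argv that submits a PBS job over SSH.

    The commit hash is passed to the job through qsub's -v option so the
    job script can check out the amended workflow. All names here are
    hypothetical placeholders.
    """
    remote = "qsub -v COMMIT_SHA={} {}".format(
        shlex.quote(commit_sha), shlex.quote(script)
    )
    return ["ssh", f"{user}@{host}", remote]


def submit(user, host, script, commit_sha):
    """Submit the job and return the job ID that qsub prints on stdout."""
    result = subprocess.run(
        build_submit_command(user, host, script, commit_sha),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def exit_status_from_qstat(qstat_output):
    """Extract Exit_status from `qstat -fx <jobid>` output.

    Returns None when the field is absent, i.e. the job has not finished.
    """
    for line in qstat_output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key == "Exit_status":
            return int(value)
    return None


if __name__ == "__main__":
    # Hypothetical invocation from a CI step; prints the command only.
    print(build_submit_command("ci_user", "gadi.nci.org.au", "validate.pbs", "abc123"))
```

A CI step could poll `qstat -fx` with the returned job ID and fail the build when the extracted exit status is non-zero, which surfaces test failures on the HPC side back in the pull request.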

This approach regularly integrates code changes, reducing the likelihood of errors in production workflow runs. The tests run on the same hardware as the production workflows, so a passing test reflects the environment in which the workflow will actually run. Although frequently running small validation jobs consumes resources, it significantly reduces errors during much larger production jobs. The system leverages existing HPC access mechanisms and therefore requires less internal integration with HPC administration than other frameworks and tools, such as Jacamar CI and Jenkins plugins.

By leveraging CI, we can significantly enhance the quality, productivity, and resource efficiency of eResearch on HPC infrastructures.

 

 
