Discussing the implementation of SciDir – A scientific software distribution repository for bringing reproducible software containers securely to HPCs in Australia.

Greg D’Arcy1, Steffen Bollmann2, Aswin Narayanan6, Peter Marendy3, Nigel Ward5, Sarah Beecroft4

1AARNet, Tower A, 799 Pacific Hwy, Chatswood, NSW, Australia
2University of Queensland, Brisbane, Queensland, Australia
3Queensland Cyber Infrastructure Foundation Ltd (QCIF), Brisbane, Queensland, Australia
4Pawsey Supercomputing Research Centre, Perth, Western Australia, Australia
5The Australian BioCommons, Melbourne, Victoria, Australia
6National Imaging Facility, Brisbane, Queensland, Australia

Abstract

The analysis of scientific data requires specialised scientific software and processing pipelines. However, accessing these software packages on high-performance computing (HPC) systems is laborious. Researchers often spend inordinate amounts of time compiling the requisite software, and research results are often difficult to reproduce because of differences in system dependencies, even when the original data and analysis code are available.

In this Birds of a Feather (BoF) session, we would like to discuss the development of an open-source, community-oriented project that addresses the accessibility and reproducibility of scientific software. We would like to receive feedback from the community and plan how we can work together toward implementing a CernVM File System (CVMFS) service in Australia for distributing scientific software.

We propose to build on previous work on distributing containers via CVMFS in the Neurodesk project and the Australian BioCommons to develop a secure scientific software distribution system. The proposed platform consists of a software container build system to which the scientific community proposes software applications and reference datasets. These artefacts are built, packaged in software containers, and scanned for vulnerabilities before being uploaded to a container registry. The container metadata is stored in a database for fast and transparent tool discovery. A Stratum 0 server, which holds the authoritative copy of the repository, and a network of Stratum 1 replica servers will make this software available on a range of compute endpoints.
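To make two steps of the proposed workflow more concrete, the sketch below models gating container images on a vulnerability scan and recording the metadata of accepted images for tool discovery. It is a minimal illustration under stated assumptions, not the project's design: the blocking severity levels, the table layout and the tool names are hypothetical, an in-memory SQLite database stands in for the metadata database, and building images, pushing them to the container registry and publishing them via the Stratum 0 server are not shown.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical policy: any finding at these severities blocks publication.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


@dataclass(frozen=True)
class ContainerBuild:
    """A built container image plus the severities reported by its vulnerability scan."""
    name: str
    version: str
    description: str
    findings: tuple[str, ...]


def passes_scan(build: ContainerBuild) -> bool:
    """Accept the image only if no finding has a blocking severity."""
    return not any(sev.upper() in BLOCKING_SEVERITIES for sev in build.findings)


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory metadata catalog (stand-in for the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE containers ("
        " name TEXT, version TEXT, description TEXT,"
        " PRIMARY KEY (name, version))"
    )
    return conn


def publish(conn: sqlite3.Connection, build: ContainerBuild) -> bool:
    """Record metadata for images that pass the scan and reject the rest.

    Uploading to the registry and publishing to CVMFS are out of scope here.
    """
    if not passes_scan(build):
        return False
    conn.execute(
        "INSERT OR REPLACE INTO containers VALUES (?, ?, ?)",
        (build.name, build.version, build.description),
    )
    conn.commit()
    return True


def search(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, str]]:
    """Return (name, version) of published tools whose name or description mentions keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT name, version FROM containers"
        " WHERE name LIKE ? OR description LIKE ?"
        " ORDER BY name, version",
        (pattern, pattern),
    )
    return rows.fetchall()


if __name__ == "__main__":
    catalog = open_catalog()
    # Hypothetical tools: the first passes the scan, the second is rejected.
    publish(catalog, ContainerBuild("mri-tool", "1.0.0", "MRI image analysis", ("LOW",)))
    publish(catalog, ContainerBuild("legacy-tool", "0.9", "MRI format converter", ("CRITICAL",)))
    print(search(catalog, "mri"))
```

Running the demo prints only the image that passed the scan, `[('mri-tool', '1.0.0')]`, because SQLite's `LIKE` is case-insensitive for ASCII text and the rejected image was never recorded. Keeping the scan gate before the metadata write means the discovery catalog can only ever list images that were accepted.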

Our approach could accelerate progress in any scientific discipline that processes data. It would allow scientific data to be processed flexibly across different computing platforms and analyses to be ported between them.

Biography

Greg D’Arcy works as a Research Engagement Strategist at AARNet. Greg is a Research Analyst and Program Manager with twenty years’ experience in large-scale digital transformation and infrastructure initiatives in the research and education sectors.
