Visible & reusable workflows
Ove Johan Ragnar Gustafsson (1), Georgina Samaha (2), Ziad Al Bkhetan (1), Paula Andrea Martinez (3), Finn Bacall (4), Carole Goble (4), Steven Manos (1), Nigel Ward (5), Jeff Christiansen (5)
(1) Australian BioCommons, University of Melbourne, Victoria, Australia
(2) Sydney Informatics Hub, University of Sydney, New South Wales, Australia
(3) Australian Research Data Commons, Australia
(4) University of Manchester, Manchester, UK
(5) Australian BioCommons, Queensland Cyber Infrastructure Foundation (QCIF), University of Queensland, Queensland, Australia
Abstract
Computational workflows are critical to contemporary science, particularly in bioinformatics, which requires complex multistep analyses to draw meaning from data. Workflows exist on a spectrum. At one end are heterogeneous composites of code, in bash scripts and interactive notebooks, that are often modified on the fly for bespoke or interactive analyses. At the other end are production-level pipelines managed by a workflow management system that enables portability, reproducibility and robustness across multiple computational infrastructures. Regardless of where a workflow sits on this spectrum, its development requires an investment of expertise and time in building, testing and documentation, as well as in deployment and optimisation for the target computational infrastructure. This investment increases as a workflow moves towards a mature, production-level artefact.
The importance, complexity and cost of creating workflows underpin the need to make them FAIR (findable, accessible, interoperable and reusable), and also to ensure that they are visible, first-class digital outputs of research. In this presentation, we will describe the ecosystem of platforms, services and practices that workflow developers can use to make their work visible, FAIR and citable. We will also highlight the role that e-infrastructures play in supporting this ecosystem, by integrating best-practice community standards for workflow description, annotation and execution with the reality of deploying workflows to a heterogeneous network of institutional and national computational endpoints. Examples include adopting Galaxy and Nextflow community best practices, collaborating directly with registries like WorkflowHub, and supporting enabling services for Australia, namely Galaxy and Nextflow Tower.
Biography
Dr Johan Gustafsson is part of the community engagement team at the Australian BioCommons. His work supports collaborations between researchers, bioinformaticians, facilities and research infrastructures that aim to democratise access to best-practice, FAIR bioinformatics for life scientists. ORCID: https://orcid.org/0000-0002-2977-5032