Siddeswara Guru1,2, Minh Dinh2, David Abramson2, Igor Makunin2, Hoang Nguyen2, Damien Watkins3, Ben Evans4, Nathan Quadros5
1Terrestrial Ecosystem Research Network, Brisbane, Australia
3Data61 CSIRO, Melbourne, Australia
4NCI, Canberra, Australia
5CRC SI, Melbourne, Australia
A scientific workflow is a series of well-defined, coordinated and structured activities that define a particular investigation or experimental process in a scientific context [1]. Workflows are useful in science because they enable scientists to:
- describe, manage, share and execute scientific analyses;
- provide a high-level abstract view of scientific computation while hiding the underlying details;
- interface with distributed computing environments;
- capture a complete workflow as an artefact and make it a reusable entity [2];
- capture provenance information for further analysis and knowledge re-use.
In a BoF session at eResearch 2017, we presented an overview of several scientific workflow management systems (SWMSs), such as Kepler, Galaxy and Workspace, that are used in different science disciplines. Notably, an interactive Q&A panel discussed the motivations for and use cases of scientific workflows and how to choose the right tool for a particular application, and the session helped build a community around workflow management systems.
While some SWMSs have proven successful in improving the rate of scientific discovery, the overall uptake of scientific workflows in eResearch is still limited. In this year's BoF, we will address the challenges in the uptake of SWMSs from the perspectives of domain scientists, eResearch analysts, workflow engine developers and decision makers. In particular, we will discuss technical issues in the following areas:
- developing workflows and associated tools;
- debugging individual workflow components and the workflow as a whole;
- leveraging cloud resources and capabilities;
- scheduling workflow jobs in the cloud;
- provenance tracking and propagation;
- platforms to use and run workflows;
- reproducibility challenges;
- deploying and sharing workflows.
The session will include short presentations from domain scientists and eResearch analysts on their experience in developing and using workflow management systems, including Kepler, Galaxy, KNIME, Cylc and Workspace, followed by an open discussion on the challenges of operationalising complex processes using workflows and the lessons learned from different tools. The BoF will conclude with a concrete plan to improve scientific workflow practice for knowledge sharing and capacity building.
The BoF session will run for 80 minutes. The first 30 minutes are allocated to an introduction to the BoF and short presentations, the next 40 minutes to a panel discussion on the challenges of uptake, and the final 10 minutes to future coordination and planning.
[1] Talia, D. Workflow Systems for Science: Concepts and Tools. ISRN Software Engineering, 2013.
[2] Guru, S.M., Hanigan, I.C., Nguyen, H.A., Burns, E., Stein, J., Blanchard, W., Lindenmayer, D.B., and Clancy, T. Development of a cloud-based platform for reproducible science: the case study of IUCN Red List of Ecosystems Assessment. Ecological Informatics, 2016.
Siddeswara Guru is a program lead for the TERN data services capability. He has experience in the development of domain-specific research e-infrastructure capabilities.