Research Director, The USC Information Sciences Institute (ISI)
Modern science often requires the processing and analysis of vast amounts of data in search of postulated phenomena, and the validation of core principles through the simulation of complex system behaviors and interactions. This is the case in fields such as astronomy, bioinformatics, physics, and climate and ocean modelling. In order to support the computational and data needs of today’s science, new knowledge must be gained on how to deliver the growing high-performance and distributed computing resources to the scientist’s desktop in an accessible, reliable, and scalable way.
In over a decade of working with domain scientists, the Pegasus project has developed tools and techniques that automate the computational processes used in data- and compute-intensive research. Among them is the Pegasus scientific workflow management system, which researchers are using to discover gravitational waves, model seismic wave propagation, discover new celestial objects, study RNA critical to human brain development, and investigate other important research questions.
This talk will examine data-intensive, workflow-based applications and their characteristics, the execution environments that scientists use for their work, and the challenges that these applications face. The talk will also discuss the Pegasus Workflow Management System and how it approaches the execution of data-intensive workflows in distributed, heterogeneous environments, including HPC systems, HTCondor pools, and clouds.