Enabling model and simulation provenance and data transparency with Provena

Dr. Jonathan Yu1, Mr. Peter Baker2, Mr. Ross Petridis1, Ms. Linda Thomas4, Mr. Parth Kulkarni1, Mr. Peter Fitch2, Ms. Sharon Tickell3, Ms. Xinyu Hou2

1CSIRO, Melbourne, Australia, 2CSIRO, Canberra, Australia, 3CSIRO, Brisbane, Australia, 4CSIRO, Hobart, Australia

Biography:

Dr. Jonathan Yu is a data architect with the Environmental Informatics group in CSIRO. He has expertise in data and information systems, data integration (e.g. Linked Data), and data analytics. He and his teams research innovative approaches to enhance decision making and understanding through improved information access across the environmental domain. Dr. Yu and his team have developed Provena (provena.io), a Data and Provenance Information System supporting the Reef Restoration and Adaptation Program (RRAP) Modelling and Decision Support subprogram.

Abstract:

Provena is a cloud-based provenance management system that enables users to register, manage, and publish knowledge artefacts (e.g. datasets, software) used and generated within a scientific or modelling workflow (see https://provena.io). It captures each simulation or model run, its results, and its associated data (including preceding model run records and existing datasets) in an integrated, linked and consistent manner, which enables traceability across a portfolio of modelling activities. Provenance captured in Provena is structured using an extension to the PROV Ontology (PROV-O), a W3C standard for describing provenance. Provena is being applied in the Modelling and Decision Support (MDS) subprogram of the Reef Restoration and Adaptation Program (RRAP) and in the National Bushfire Intelligence Capability (NBIC). Both MDS and NBIC require provenance capture to make modelling results transparent and to support data reproducibility. In this talk, we give an overview of how Provena is being applied to manage data and their lineage across an ensemble of models and decision support tools in both MDS (for advising on Great Barrier Reef intervention deployments) and NBIC (for developing bushfire hazard and risk maps). We also show how Provena can be applied to modelling and simulation activities with similar needs in other domains.
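As a minimal sketch of the kind of lineage described above, the following Python snippet records two chained model runs as PROV-O-style triples and walks upstream from a result to everything it depends on. Only the `prov:` terms (`prov:Activity`, `prov:used`, `prov:wasGeneratedBy`) come from the W3C PROV-O vocabulary; the `ex:` identifiers, the triple list, and the helper functions are hypothetical and are not Provena's API or its extended schema.

```python
from collections import defaultdict

# Hypothetical records: run-2 consumes the output of run-1.
# Only the prov: predicates and classes are real PROV-O terms.
triples = [
    ("ex:run-1", "rdf:type", "prov:Activity"),
    ("ex:run-1", "prov:used", "ex:input-dataset"),
    ("ex:output-1", "prov:wasGeneratedBy", "ex:run-1"),
    ("ex:run-2", "rdf:type", "prov:Activity"),
    ("ex:run-2", "prov:used", "ex:output-1"),
    ("ex:output-2", "prov:wasGeneratedBy", "ex:run-2"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def upstream(node, index):
    """Return every activity and entity that `node` depends on, nearest first."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        # An entity points to the activity that generated it;
        # an activity points to the entities it used.
        for predicate in ("prov:wasGeneratedBy", "prov:used"):
            for parent in index.get((current, predicate), []):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
    return seen


index = build_index(triples)
lineage = upstream("ex:output-2", index)
print(lineage)
```

Starting from `ex:output-2`, the walk passes through both model runs and ends at the original input dataset, which is the traceability property the abstract refers to. A real deployment would typically store such records in a graph or triple store rather than an in-memory list.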

 
