Dr Carsten Friedrich1, Lesley Wyborn2, David Lescinsky3, Ryan Fraser1, Geoff Squire1, Stuart Woodman1, Peter Sienkowski3
1CSIRO, Acton, Australia,
2National Computational Infrastructure, Canberra, Australia,
3Geoscience Australia, Canberra, Australia
Much has happened in the evolution of the Virtual Geophysics Laboratory (VGL) and the Scientific Software Solution Centre (SSSC) in 2016-2017. For VGL, we have added support for the Amazon Cloud and the NCI Raijin supercomputer, improved provenance support, simplified and improved the user interface, enhanced the result visualization capabilities, added support for new science codes and data repositories, and started applying the VGL platform to other science domains. For the associated SSSC, we have added support for HPC environments such as NCI Raijin, user authentication and authorization, publication and peer review processes for entries, and digital signing of entries and reviews. This presentation describes these improvements and new features.
A virtual laboratory comprises three stages: selection of input data, selection of a tool to process that data, and selection of the compute infrastructure on which to run the selected tool on the selected data. VGL currently allows users to browse and visualize large repositories of datasets hosted at NCI and Geoscience Australia. After selecting a target dataset and region, users can select from the SSSC an analytical code or model to run on the dataset (e.g., magnetic or gravity inversion for the purpose of understanding what lies beneath the observable surface in an area). The analytic model is then submitted to an infrastructure provider of choice (e.g., Amazon cloud, NeCTAR, or the NCI HPC facility), and results are made available to the user after completion. The SSSC was designed to be an app store for analytic code and models that can be automatically discovered and executed in the cloud by any virtual laboratory. Researchers can submit science code they have developed to the SSSC; once the software has been reviewed and approved for release, it is automatically discoverable and executable by SSSC clients such as the Virtual Geophysics Laboratory.
VGL Enhancements in 2016-2017
1. Amazon AWS support
Users can now run VGL jobs on the Amazon Cloud, using Amazon EC2 instances for execution and AWS S3 for storing results. To protect VGL operators from large Amazon bills for executing user jobs and storing results, VGL can be configured to require users to provide their own Amazon accounts via AWS cross-account authorization, so that users are billed directly for job execution and data storage.
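As an illustration of the cross-account pattern in general, the user creates a role in their own AWS account that the operator's account is allowed to assume. The sketch below only builds the trust policy document for such a role. How VGL itself configures this is not described here; the function name, the account ID, and the external ID are hypothetical placeholders.

```python
import json


def build_trust_policy(operator_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy for a role in the user's own account.

    operator_account_id: 12-digit AWS account ID of the service operator
        (hypothetical value in the example below).
    external_id: shared value the operator must present when assuming
        the role (hypothetical value in the example below).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The operator's account is trusted to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{operator_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Only requests carrying the agreed external ID are accepted.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy("123456789012", "example-external-id")
    print(json.dumps(policy, indent=2))
```

Because the role lives in the user's account, the EC2 and S3 usage incurred under it is charged to that account rather than to the operator.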
1. NCI Raijin support
VGL now also supports running jobs on the NCI Raijin supercomputer. Users can select Raijin as an option, provided the necessary dependencies are available on that platform and they can supply their NCI user name, key, and a valid project code. VGL then generates the required PBS scripts and schedules the job for execution. VGL also supports monitoring job progress, as well as preview and retrieval of results. Since the technologies we use for this, namely SSH and PBS, are widely used in the HPC community, it would be relatively easy to support other HPC facilities in the future.
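Generating a PBS script from job parameters can be sketched as follows. This is a minimal illustration and not VGL's actual generator: the `HpcJob` fields, the queue name, the module name, and the command are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HpcJob:
    project: str        # project code supplied by the user
    queue: str          # PBS queue name (assumed value in the example)
    ncpus: int          # number of CPU cores requested
    mem_gb: int         # memory requested, in gigabytes
    walltime: str       # maximum run time, HH:MM:SS
    modules: List[str]  # environment modules the science code depends on
    command: str        # command line that runs the science code


def render_pbs_script(job: HpcJob) -> str:
    """Render a PBS batch script as text, ready to be passed to qsub."""
    lines = [
        "#!/bin/bash",
        f"#PBS -P {job.project}",
        f"#PBS -q {job.queue}",
        f"#PBS -l ncpus={job.ncpus}",
        f"#PBS -l mem={job.mem_gb}GB",
        f"#PBS -l walltime={job.walltime}",
        "",
    ]
    # Load dependencies before running the code.
    lines += [f"module load {name}" for name in job.modules]
    lines.append(job.command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job = HpcJob(
        project="ab12",              # hypothetical project code
        queue="normal",              # hypothetical queue name
        ncpus=16,
        mem_gb=32,
        walltime="02:00:00",
        modules=["escript"],         # hypothetical module name
        command="run-escript inversion.py",  # hypothetical command
    )
    print(render_pbs_script(job))
```

The rendered text would then be copied to the HPC login node over SSH and submitted with `qsub`; that transport step is omitted here.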
2. Improved provenance reporting
VGL now captures complete provenance for every job it executes. If configured, the provenance information will be automatically submitted to a PROV-O compliant provenance server such as PROMS.
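To give a sense of what a PROV-O description of a job can look like, the sketch below renders one job as a `prov:Activity` in Turtle, with the code and inputs it used and the outputs it generated. The function name and all URIs are hypothetical, and the record VGL actually sends to a provenance server may be structured differently.

```python
from datetime import datetime, timezone
from typing import Iterable


def job_provenance_turtle(job_uri: str, user_uri: str, code_uri: str,
                          input_uris: Iterable[str], output_uris: Iterable[str],
                          started: datetime, ended: datetime) -> str:
    """Render a PROV-O description of one job run as a Turtle document."""

    def fmt(moment: datetime) -> str:
        # xsd:dateTime in UTC; pass timezone-aware datetimes.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    used = [code_uri] + list(input_uris)
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{job_uri}> a prov:Activity ;",
        f'    prov:startedAtTime "{fmt(started)}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{fmt(ended)}"^^xsd:dateTime ;',
        f"    prov:wasAssociatedWith <{user_uri}> ;",
        "    prov:used " + ", ".join(f"<{uri}>" for uri in used) + " .",
        "",
        f"<{user_uri}> a prov:Agent .",
    ]
    # The code and the input datasets are entities the job used.
    lines += [f"<{uri}> a prov:Entity ." for uri in used]
    # Each output is an entity generated by the job.
    lines += [f"<{uri}> a prov:Entity ; prov:wasGeneratedBy <{job_uri}> ."
              for uri in output_uris]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(job_provenance_turtle(
        "http://example.org/job/42",
        "http://example.org/user/alice",
        "http://example.org/code/gravity-inversion/3",
        ["http://example.org/data/gravity-grid"],
        ["http://example.org/result/42/model.nc"],
        datetime(2017, 3, 1, 9, 0, tzinfo=timezone.utc),
        datetime(2017, 3, 1, 11, 30, tzinfo=timezone.utc),
    ))
```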
3. Improved User Interface and AAF Support
We have made some major and many small improvements to the Web user interface. Major improvements include faceted search of datasets based on keywords, spatial bounds, service types, publication dates, and other criteria. We have also improved the job results page, making it much easier to organize jobs in folders, monitor job progress, and preview and download job results. Further, users are now able to log into VGL using the Australian Access Federation (AAF), and VGL can be configured to require AAF login to access NeCTAR resources for job execution.
Figure 1 New Data Discovery Page with faceted search capability
Figure 2 New Job monitoring and result page with enhanced result preview functionality
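The faceted search described above amounts to applying several optional filters in combination. The following sketch illustrates the idea over an in-memory list; the `Dataset` fields and function names are hypothetical and not taken from VGL, and bounding boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass
class Dataset:
    title: str
    keywords: Set[str]
    bbox: BBox
    service_types: Set[str]
    published: date


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def faceted_search(datasets: Iterable[Dataset],
                   keywords: Optional[Set[str]] = None,
                   bbox: Optional[BBox] = None,
                   service_type: Optional[str] = None,
                   published_after: Optional[date] = None) -> List[Dataset]:
    """Return datasets matching every facet that was supplied."""
    results = []
    for ds in datasets:
        if keywords and not keywords <= ds.keywords:
            continue  # dataset lacks at least one requested keyword
        if bbox and not bboxes_intersect(bbox, ds.bbox):
            continue  # dataset lies outside the region of interest
        if service_type and service_type not in ds.service_types:
            continue  # dataset is not offered through this service type
        if published_after and ds.published < published_after:
            continue  # dataset is older than requested
        results.append(ds)
    return results
```

Facets that are left unset impose no restriction, so a user can narrow the result list one criterion at a time.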
4. Support for new Science Codes and Data Repositories
We have registered more science codes in the SSSC and thus made them available for job execution in VGL, including support for escript on NCI Raijin, pyGPlates on NeCTAR and AWS, and execution of CSIRO Workspace workflows on NeCTAR and AWS. For data, VGL users now have access to the Australian National Geophysics Data Collection on the NCI National Environmental Research Data Interoperability Platform (NERDIP – http://nci.org.au/data-collections/nerdip/).
Application of VGL in other science domains
The virtual laboratory (VL) platform underlying VGL is generic and thus lends itself to application beyond the geoscience domain. For example, in the Earth Observation space it can be repurposed to apply algorithms in areas such as crop monitoring, carbon accounting, and algal bloom monitoring, based on satellite images from the CEOS DataCube.
SSSC Developments in 2016-2017
1. Support for HPC environments such as NCI Raijin
We have extended and optimized the SSSC data model to cater for a wider range of execution environments. In particular, we now support code dependencies on PBS-based HPC facilities such as NCI Raijin.
2. User authentication and authorization
Published entries remain readable by anonymous users, but creating or modifying entries now requires users to be registered, logged in, and properly authorized. Users can currently register with a valid email address. Once the email address has been verified, they can start creating content and participating in SSSC community activities, such as applying for publication of entries or reviewing entries by other users.
3. Digital signing of entries
The SSSC now supports digital signing of entries by content creators, which gives users enhanced assurance about content authorship and integrity. By verifying the signature, an end user can confirm that an entry was signed by the actual author and that it has not been corrupted or otherwise modified. The SSSC currently allows users to register their own public signature key with the SSSC and sign entries with the corresponding private signature key.
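A signature is computed over a fixed byte representation of an entry, so signer and verifier must serialize the entry identically. The sketch below shows only that preparatory step, a deterministic serialization and its SHA-256 digest, using the Python standard library. The signing and verification with the private and public keys would be done by a cryptographic library and are not shown. The entry fields are hypothetical, and the serialization and signature scheme the SSSC actually uses are not stated here.

```python
import hashlib
import json


def canonical_bytes(entry: dict) -> bytes:
    """Serialize an entry deterministically: sorted keys, no extra whitespace."""
    text = json.dumps(entry, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def entry_digest(entry: dict) -> str:
    """Hex SHA-256 digest of the canonical form; this is what would be signed."""
    return hashlib.sha256(canonical_bytes(entry)).hexdigest()


if __name__ == "__main__":
    original = {"name": "gravity-inversion", "version": 3, "source": "run.py"}
    reordered = {"source": "run.py", "version": 3, "name": "gravity-inversion"}
    tampered = {"name": "gravity-inversion", "version": 3, "source": "evil.py"}

    # Key order does not matter, but any change in content does.
    print(entry_digest(original) == entry_digest(reordered))
    print(entry_digest(original) == entry_digest(tampered))
```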
4. Automatic Versioning
Previously, any modification to an SSSC entry would overwrite and replace the previous version, which is not ideal for reproducibility, provenance reporting, or backward compatibility. The SSSC now preserves all previous versions. When an entry is modified, its version number is automatically incremented and the new version becomes the current version of that entry. Older versions remain accessible by explicitly referring to their version number.
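The versioning behaviour described above can be sketched with a small in-memory store. The class and method names are hypothetical and say nothing about how the SSSC stores entries internally.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Keeps every saved version of an entry; versions are numbered from 1."""

    def __init__(self) -> None:
        self._history: Dict[str, List[dict]] = {}

    def save(self, entry_id: str, content: dict) -> int:
        """Store content as a new version and return its version number."""
        versions = self._history.setdefault(entry_id, [])
        versions.append(dict(content))  # copy so later edits do not leak in
        return len(versions)

    def get(self, entry_id: str, version: Optional[int] = None) -> dict:
        """Return the requested version, or the current one if none is given."""
        versions = self._history[entry_id]
        if version is None:
            return dict(versions[-1])
        if not 1 <= version <= len(versions):
            raise KeyError(f"{entry_id} has no version {version}")
        return dict(versions[version - 1])


if __name__ == "__main__":
    store = VersionedStore()
    store.save("inversion", {"script": "run_v1.py"})
    store.save("inversion", {"script": "run_v2.py"})
    print(store.get("inversion"))             # current version
    print(store.get("inversion", version=1))  # older version, still available
```

A job that recorded the version number it ran with can therefore be reproduced later, even after the entry has been modified.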
5. Publication and peer review processes
The SSSC now supports a configurable review and publication process for entries, and it can be configured to require an explicit publication step. When users want to make one of their entries available to other users, they can request publication of that entry in the SSSC. Users with the appropriate authorizations can review entries for which publication has been requested. If they are satisfied that the entry is compatible with the content and quality guidelines for that SSSC instance, they can release the entry as published. Once published, an entry becomes discoverable and accessible by other users, either directly in the web interface or through third-party clients such as VGL.
The SSSC also now supports creating and browsing reviews of entries. Moderators can use reviews as part of a peer-review workflow to decide on publication requests, and reviews also serve more generally as a community tool to provide feedback and recommendations to other users. Optionally, reviews can be digitally signed by the reviewer, which gives end users increased trust in the authorship and validity of the review.
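The publication process above can be read as a small state machine: an entry starts as a draft, its owner requests publication, and an authorized reviewer releases it. The sketch below illustrates that reading; the state names, class, and method names are assumptions and not the SSSC data model.

```python
from enum import Enum
from typing import Set


class Status(Enum):
    DRAFT = "draft"
    PUBLICATION_REQUESTED = "publication_requested"
    PUBLISHED = "published"


class WorkflowError(Exception):
    """Raised when a step is not allowed for this user or in this state."""


class EntryWorkflow:
    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.status = Status.DRAFT

    def request_publication(self, user: str) -> None:
        """The owner asks for the entry to be published."""
        if user != self.owner:
            raise WorkflowError("only the owner can request publication")
        if self.status is not Status.DRAFT:
            raise WorkflowError("publication can only be requested for a draft")
        self.status = Status.PUBLICATION_REQUESTED

    def release(self, reviewer: str, authorized_reviewers: Set[str]) -> None:
        """An authorized reviewer releases the entry as published."""
        if reviewer not in authorized_reviewers:
            raise WorkflowError("user is not authorized to release entries")
        if self.status is not Status.PUBLICATION_REQUESTED:
            raise WorkflowError("publication has not been requested")
        self.status = Status.PUBLISHED

    def is_discoverable(self) -> bool:
        """Only published entries are visible to other users and clients."""
        return self.status is Status.PUBLISHED


if __name__ == "__main__":
    entry = EntryWorkflow(owner="alice")
    entry.request_publication("alice")
    entry.release("bob", authorized_reviewers={"bob"})
    print(entry.is_discoverable())
```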