Performance Improvements with GPUs for Marine Biodiversity: A Cross-Tasman Collaboration

Lev Lafayette1, Mitch Turnbull2, Mark Wilcox3, Eric A. Treml4

1University of Melbourne, Parkville, Australia, lev.lafayette@unimelb.edu.au

2Nyriad, Cambridge, New Zealand, mark.wilcox@nyriad.com

3Nyriad, Cambridge, New Zealand, mitch.turnbull@nyriad.com

4Deakin University, Geelong, Australia, e.treml@deakin.edu.au

 

Identifying probable dispersal routes and for marine populations is a data and processing intensive task of which traditional high performance computing systems are suitable, even for single-threaded applications. Whilst processing dependencies between the datasets exist, a large level of independence between sets allows for use of job arrays to significantly improve processing  time. Identification of bottle-necks within the code base suitable for GPU optimisation however had led to additional performance improvements which can be coupled with the existing benefits from job arrays. This small example offers an example of how to optimise single-threaded applications suitable for GPU architectures for significant performance improvements. Further development is suggested with the expansion of the GPU capability of the University of Melbourne’s “Spartan” HPC system.

University of Melbourne HPC and Marine Spatial Ecology With Job Arrays

From 2011-2016, the University of Melbourne provided general researcher access to a medium-sized HPC cluster system called “Edward”, designed in a traditional fashion. As “Edward” was being retired an analysis of actual job metrics indicated that the overwhelming majority of jobs were single node or even single core, especially as job arrays.  The successor system, “Spartan”, was therefore designed more with a view of high throughput rather than high performance. A small traditional HPC system with a high-speed interconnect was partitioned from a much larger partition built on OpenStack virtual machines from the NeCTAR research cloud. This proved to a highly efficient and optimised method both in terms of finances and throughput [1].

A specific example of large number of computational tasks that are designed for single-threaded applications with modest memory requirements is that for research in the marine biodiversity and population connectivity, which has significant implications for the design of marine protected areas. In particular there is a lack of quantitative methods to incorporate, for example, larval dispersal via ocean currents, population persistence, impact on fisheries etc. The Marine Spatial Ecology and Conservation (MSEC) laboratory at the University of Melbourne has been engaging in several research projects to identify the probable dispersal routes and spatial population structure for marine species, and integrate these connectivity estimates into marine conservation planning [2].

Code Review for GPGPU Optimisation

There are a number of architectural constraints on GPUs. They are, to a very large extent, independent of their host system. Object code needs to be compiled for the GPU (e.g., using OpenCL or nvcc). There is no shared memory between the GPU and CPU and any unprocessed data must be transferred to the GPGPU environment and then back to the CPU environment when completed. This said, GPUs typically only have small amounts of cached memory, if at all, replacing the need with GPU pipelining and ensuring very high memory transfer between the GPU and the host [2].

During the first half of 2017 Nyriad reviewed the HPC infrastructure, existing MATLAB(R) source code and sample data, and wrote a test suite designed to run the CPU and GPU versions at the same time. There were two review stages; the first for optimisation of the existing MATLAB (R) code base, followed by identification of functions that could be distribution and rewritten for GPUs.

Performance Improvements

Nyriad code review identified bottlenecks that were available for GPGPU workloads. On the University of Melbourne HPC system, “Spartan”, using a single GPU, a 90x performance improvement was achieved over the original code and a 3.75x improvement over the CPU version with 12 threads available for the 4.6 GB Atlantic Model simulating 442 reefs. The simulation, previously taking 8 days to complete on one of the most powerful nodes (i.e. GPU or physical), could be completed in 2 hours. On the other hand, for the 4 MB South Africa Benguela Region dataset the GPU version is faster than the original code, but slower than the improved CPU implementation.

If the code is refactored to process reefs in parallel we anticipate that utilisation of the node would improve on a per-GPU and multi-GPU level, significantly reducing the single simulation time by fully utilising the Spartan GPU node on which it is run. With this change we predict a performance improvement of over 5x compared to the existing GPU code on meaning while using more resources on a node the execution time of a single simulation would greatly reduce. Smaller datasets would also likely achieve some improvement as per-GPU utilisation would increase. Demonstrated in Figure 2. is the performance increase of the current two versions, and the predicted performance of the multithreaded GPU version, when running a single simulation on the Atlantic data set of 442 reefs over 100 days.

Further Developments

Nyriad’s review found that there is significant opportunity in the use of data integrity and mathematical equivalence algorithmic techniques for enabling porting of code to GPUs with minimal impact to the research workflow. With notable performance improvements to a range of job profiles, a significant expansion of Spartan’s GPGPU capacity has just been implemented. The partition, funded by Linkage Infrastructure, Equipment and Facilities (LIEF) grants from the Australian Research Council is composed of 68 nodes and 272 nVidia P100 GPGPU cards The major usage of the new system will be for turbulent flows, theoretical and computational chemistry, and genomics, representative of the needs of major participants.

The University of Melbourne and Nyriad will continue their research collaborations, especially in the GPGPU environment for data integrity and mathematical equivalence, scalability testing and hybrid clusters to enable more scientific programming users to progressively scale their work up to larger systems.

REFERENCES

  1. Lev Lafayette, Greg Sauter, Linh Vu, Bernard Meade, “Spartan : Performance and Flexibility: An HPC-Cloud Chimera”, OpenStack Summit, Barcelona, October 27, 2016
  2. For example, Keyse, J., Treml, EA., Huelsken, T., Barber, P., DeBoer, T., Kochzuis, M., Muryanto, A., Gardner, J., Liu, L., Penny, S.,  Riginos, C.  (2018),  Journal of Biogeography, February 2018
  3. Shigeyoshi Tsutsui, Pierre Collet (eds), (2013), Massively Parallel Evolutionary Computation on GPGPUs, Springer-Verlag

Biography:

Lev Lafayette is the Senior HPC Support and Training Officer at the University of Melbourne, where he has been since 2015. Prior to that he worked at the Victorian Partnership for Advanced Computing in a similar role for eight years.

Fostering an organisation-wide accelerated computing strategy

Jake Carroll1

1The University of Queensland, Brisbane, Australia, jake.carroll@uq.edu.au

 

Background

The use of accelerators (GPU, ASIC, FPGA) in research computing has become more prevalent as hardware/software ecosystems have matured. To complement this, frameworks from vendors such as nVidia and AMD have become fully-featured. As a result of a series of significant ARC/NHMRC grants – an unprecedented amount of scientific imaging infrastructure is being commissioned on the University of Queensland St Lucia campus. To leverage scientific outputs and process the data that this new infrastructure would generate, UQ procured its first tranche of accelerated computing capability in late 2017. This presentation discusses the user-engagement strategy behind UQ’s accelerated computing deployment, how it worked, why it worked and why it was a novel approach in the sector.

WIENER

In late 2017, after an extensive benchmarking, analysis and design process, the Wiener supercomputer was procured to enable near real time deconvolution and deskew from imaging infrastructure, such as UQ’s new Latice Light Sheet Microscope (LLSM)[1]. This platform was the first in the Asia Pacific to feature the nVidia Volta V100 GPU and only the fourth production deployment in the world. The Wiener supercomputer was the largest investment in GPU/accelerated supercomputing that the state had ever made. The initial intention of Wiener was to provide a powerful means of deconvolution [2] to the LLSM, but it was quickly realised that with this many GPU’s connected tightly in a dedicated supercomputing deployment, the platform would serve as UQ’s launchpad for a general accelerated computing strategy.

basis of advanced computing strategy

UQ, as with several of its contemporaries has a significant investment in supercomputing. UQ’s strategy differs somewhat from its equivalent national and sister-state facilities in that it provides different pillars of supercomputing for different workloads in dedicated infrastructure.

 

Table 1: UQ’s Supercomputing Infrastructure load-out

Platform name Machine domain focus Workload characterisation Expected user demand Actual user demand
Tinaroo Multi-discipline MPI, tightly coupled shared memory, massively parallel High High
Awoonga Multi-discipline Loosely coupled, MPI-slack, high latency, cloud-like. Medium Medium
FlashLite Multi-discipline High throughput, high memory High Low
Wiener Multi-discipline GPU, ML, DL, CNN and imaging specific. Low High

 

UQ misjudged the user demand for both FlashLite and Wiener, but for different reasons, which strategic discussion in this presentation will explain and articulate.

Fostering an accelerated computing community

In the initial, as can be seen in Table 1, UQ made some assumptions about where it thought the most user demand would be, which proved incorrect. This lead to initial interest in Wiener being far more profound than first anticipated. UQ expected that Wiener would cater to a niche subset of imaging workloads, but what was unanticipated was the level of sophistication and understanding of application of convolutional neural networks, deep learning and machine learning techniques in the domain of imaging itself. An example was our overt expectation that deconvolution algorithms would run against the GPU infrastructure using codes such as Microvolution and SVI’s Huygens. The truth was, researchers were already considering using machine vision techniques and TensorFlow at scale to characterise and train image sets for more accurate detection of cells, cancers and viruses. [3]

At this point, UQ rationalised that it needed to take a more direct approach in engagement and collaboration with end users to effectively liberate the capability of this new platform. A core tenant of this was a personal and one on one approach to each workload. Whilst this is an  administrative burden, it has been demonstrated that it delivers significantly better outcomes. Thus, the general ‘onboarding’ process to Wiener, from an early point of production state became the following process:

  1. User approaches RCC with a request for compute time on accelerator based HPC.
  2. A subject matter (computer science, HPC) expert will then make an appointment to meet with the researcher or research group in order to better understand the science.
  3. A longer discussion takes place, to learn about the workload type, the potential hardware/software and computing environment impact. At this point the researcher and subject matter expert work towards a defined job-layout which is both optimal for the workload and best fit for infrastructure.

The initial consultation process generally takes between two to three hours.

UQ has empirical and measured evidence to suggest this method of personal interaction to breed a stronger capability in accelerated computing creates a far more efficient use of infrastructure, than the generally accepted process of providing a user a set of documents, readme’s and how-to instructions at a distance.

conclusion

Early analysis suggests that there is a correlation between the employment of direct consultation and scientific discussion between a domain expert (in the scientific research domain) and a research computing specialist and the quality of the computational run or input in these accelerated computing platforms. This now forms the basis of the operating procedures of the Wiener supercomputing facility.

REFERENCES

  1. UQ IMB ARC/NHMR Lattice Light Sheet Microscopy installation. Retrieved from https://imb.uq.edu.au/article/2016/11/45-million-imb-led-discovery-research, accessed June 8th, 2018
  2. Deconvolution Definition, Retrieved from https://en.wikipedia.org/wiki/Deconvolution, accessed June 8th, 2018.
  3. HPC Wiener harnessed for automating skin cancer diagnosis, Retrieved from https://rcc.uq.edu.au/article/2018/05/hpc-wiener-harnessed-automating-skin-cancer-diagnosis, accessed June 8th, 2018.

Biography:

Jake is currently the Associate Director of Research Computing for UQ’s three large scientifically intensive research institutes – the Australian Institute for Bioengineering and Nanotechnology, the Institute for Molecular Bioscience and the Queensland Brain Institute.

Jake has spent the last 12 years in scientific computing, working on everything from building supercomputers to managing the strategy and complexity that comes with scientific endeavour.

Jake spends his time working to make scientific computing platforms, technology and infrastructure as good as it can be, such that world class research can be conducted, unencumbered.

Jake’s background is in both computer science and business leadership – constantly fighting with himself, trying to accommodate both (very different concepts) in his working life – ultimately to try and make them work together.

High-level Cloud Application Description and Management

Gabor Terstyanszky1Gab Pierantoni2, Tamas Kiss3

1University of Westminster, London, United Kingdom, terstyg@gmail.com

2University of Westminster, London, United Kingdom, G.Pierantoni@westminster.ac.uk

3University of Westminster, London, United Kingdom, T.Kiss@westminster.ac.uk

 

Introduction

Cloud computing has successfully and steadily addressed issues of how to run applications on complex distributed computing infrastructures. However, it must address specific deployment, scalability and security requirements. Nowadays, Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) solutions are widely used in academia, business and public sector to manage applications in the Cloud. At one hand, on-demand access to the Cloud in a flexible and elastic way could result in significant cost savings due to more efficient and convenient utilization. It can also replace large investment costs with long-term operational costs. On the other hand, however, the efficient and dynamic utilization of the Cloud to run applications is not trivial. The take up of cloud computing in some application areas is still relatively low due to limited application-level flexibility and shortages in cloud specific skills. As a result, the move to the Cloud has been somehow slower and more cautious in these areas due to both application- and infrastructure-level complexity.

To enable the execution of a large variety of applications in the Cloud in a cost effective, flexible, seamless and secure way, applications must be deployed, launched, executed and removed through a framework that hides cloud specific details. To manage applications in the Cloud it needs information, such as their architecture, resources and services they need, and QoS parameters they have to meet. Application description languages can define the application architecture, specify where to deploy and run applications, how to achieve their cost effective execution, and how to provide the required security to protect data.

TOSCA-BASEd HIGH-LEVEL Application description and execution

The Cloud Orchestration at the Level of Application (COLA) project [1], funded by H2020, aims at fostering the adoption of applications to the Cloud for public sector organisations and SMEs. COLA is elaborating a generic and pluggable framework, called Microservices-based Cloud Application-level Dynamic Orchestrator (MiCADO) [2], to support the optimal and secure deployment and run-time orchestration of cloud applications. Application Developers can describe applications including their Quality of Service (QoS) parameters related to deployment (flexibility), economic viability (costs), performance (scalability) and security (data protection and privacy) and submit this description to the MiCADO framework. This framework is based on existing low-level cloud container technologies (e.g. Docker Swarm [3], management and orchestration solutions (e.g. Occopus [4]), MiCADO is generic in the sense that its services are not restricted to particular technologies and can be implemented using different existing technologies and services.

We are focusing on application description and management in the Cloud. There are three major applications description approaches: cloud platform (Amazon, Microsoft Azure, etc.) and cloud orchestration tool dependent approaches (Chef, Heat etc.); and platform independent applications description languages (Camp and TOSCA). All these approaches properly describe the applications’ architecture specifying services they are composed of and how they are connected and artefacts and resources needed to run applications. Approaches used by cloud platforms and cloud orchestration tools are not based on standards and tied to specific implementations or platforms. As a result, it’s not easy to reuse their application descriptions in heterogeneous cloud environments. There are major differences in how these approaches specify and manage QoS properties. We use TOSCA [5] to describe applications that is emerging standard but it also has some limitations. TOSCA supports management of containers and virtual machines but these entities are assigned only node types not applications. TOSCA specification defines only abstract policy classes that cover only sub-set of QoS properties. Neither the original policy taxonomy nor the extended ones contains all parameters required to manage wide range of policies. Currently there is no a platform independent solution to process TOSCA application descriptions and run the applications in the Cloud. Considering these limitations we addressed the following challenges:

  • how to describe and manage containerized applications with policies assigned to them,
  • how to extend the TOSCA policy hierarchy to manage wide range of QoS properties and how TOSCA policies can parametrized to support these policies, and
  • how to process and execute TOSCA specifications in a technology agnostic way.

To address these challenges we created three major contributions. First, to combine the flexibility offered by technology-oriented agnosticism with the expressiveness required to describe different properties of a large variety of applications we elaborated the Application Description Template (ADT) to specify two main aspects of applications: their architecture (application topology) and QoS properties (application policy). ADTs connect Application Developers to the application component. Each ADT contains a parameter section, a topology section with container and virtual image sub-sections, and a policies section. The first one holds the input and output parameters of the application. The topology section incorporates the container and virtual images sub-section. The policy section describes QoS parameters as TOSCA policies. As a second contribution we introduced a flexible policy hierarchy and extended the TOSCA policy hierarchy by adding a security policy with several sub-policies such as authentication, authorisation, data protection and further sub-policies to the deployment and scaling policy. We also defined a Policy Template to describe policy properties. Each template is divided into two main sections: description and properties section. The first one outlines in plain text to which service and when the policy is applied. The second one contains two types of parameters: common and specific properties. Finally, we extended the MiCADO framework with the MiCADO Submitter (Fig. 1) to process TOSCA descriptions. The ADT is submitted to the MiCADO Submitter and parsed and validated by the OpenStack TOSCA Parser and the MiCADO Validator. Next, the Mapper uses a key list to isolate information and pass it to adaptors that translate the information for the Container Orchestrator, which manages Docker containers, the Cloud Orchestrator, which handles Virtual Machines in which the containers are deployed and ran, the Policy Keeper, which manages all policies but security policies, and the Security Enforcer, which handles security policies, of the MiCADO framework.

Figure 1: MiCADO Submitter

To assess the applications descriptions and how applications are executed through the MiCADO framework, COLA tests its applicability using demonstrators and proof of concept case studies from four distinct application areas that include public sector organisations and SMEs. For example these use cases incorporate social media data analytics for local governments, simulation-based evacuation planning, data-intensive web applications, and simulation solutions for manufacturing and engineering.

This presentation will outline the MiCADO framework, the Application Description Template, the extended TOSCA policy architecture with the Policy Template and how ADTs are managed in the MiCADO framework. Further, it will present how a particular public sector organization’s application can be cost-effectively and efficiently executed through the MiCADO framework in the Cloud.

REFERENCES

  1. COLA Project – Cloud Orchestration at the Level of Application, [Online]. Available:http://www.project-cola.eu/cola-project/
  2. T. Kiss, P. Kacsuk, J. Kovacs, B. Rakoczi, A. Hajnal, A. Farkas, G. GesmierG. Terstyanszky.: MiCADO –Microservice-based Cloud Application-level Dynamic Orchestrator, Future Generation Computer Systems, 2017,https://doi.org/10.1016/j.future.2017.09.050
  3. Docker Swarm overview, [Online], Available: https://docs.docker.com/swarm/overview/
  4. Kovács J. and Kacsuk P.: Occopus: a Multi-Cloud Orchestrator to Deploy and Manage Complex Scientific Infrastructures, Journal of Grid Computing, March 2018, Volume 16, issue 1, pp 19–37
  5. OASIS: TOSCA – Simple Profile in YAML Version 1.0, [On-line], available at: http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.0/csd03/TOSCA-Simple-Profile-YAML-v1.0-csd03.html

Biography:

Prof. Dr Gabor Terstyanszky is a Professor in Distributed Computing at the University of Westminster. His research interests include distributed and parallel computing, cloud, cluster and Grid computing. He supervised several European projects, such as: COPERNICUS, COST, WINPAR, HPCTI, and SEPP as local coordinator. He had a leading role in the FP7 EDGeS, DEGISCO, EDGI, SHIWA, SCI-BUS, ER-flow and H2020 CloudSME research projects. Currently ha is working on the H2020 COLA and CloudFacturing project. He published more than 130 technical papers at conferences and journals. He was member of programme committees of several conferences and workshops.

Introduction to and Demonstration of Containers in the ARDC Nectar Research Cloud

Conveners: Dr Glenn Moloney5, Wilfred Brimblecombe1

Presenters: Andy Botting2, Sam Morrison3, Jake Yip4

1Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), wilfred@nectar.org.au
2Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), andrew.botting@unimelb.edu.au
3Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), sam.morrison@unimelb.edu.au
4Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), jake.yip@unimelb.edu.au
5Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), glenn.moloney@nectar.org.au

GENERAL INFORMATION

  • Half day workshop
  • Include a hands-on component
  • Maximum of 40 people

DESCRIPTION

Containers provide a solution to the problem of how to get software to run reliably when moved from one computing environment to another.  This workshop provides an introduction to this popular technology by briefly going over container concepts and then demonstrating containers in use on the ARDC Nectar Research Cloud.

The workshop will have a “hands on” component so please bring you laptop.

The following topics/activities will be covered:

  1. Introduction to container concepts and products
  2. Using Docker and Kubernetes technologies on the Research Cloud
  3. Exercise using a simple tool kit that can be deployed and experimented with at the workshop and may be extended for used after the workshop
  4. Seek feedback from participants on “tuning” the container offering on the Research Cloud to meet their needs.

This workshop will provide you with a useful introduction to Container technology and help ARDC determine the container offerings that may be supported on the ARDC Nectar Research Cloud.

WHO SHOULD ATTEND

Target Audience – researchers who may benefit from a lightweight easy to use Container service or are looking for an introduction into the area.  We are assuming current sophisticated heavy users of container technology already have set up their environments and will continue to want to do so.

eResearch staff who are interested in learning about container technologies and how they can be used on the Nectar Research Cloud or across multiple cloud services.

WHAT TO BRING

Bring your laptop.  Required prerequisite knowledge – Moderate to advanced understanding of Unix and cloud environments.  If you are not from an Australian or New Zealand University you will need an AAF account to gain access for the hands-on component.


BIOGRAPHIES

Andy Botting – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud.  I’m a cloud-native Systems Engineer with a background in Linux, HPC.  Specialities: Linux, Android, Puppet, OpenStack and AWS.

Wilfred Brimblecombe – ICT Manager at the Australian Research Data Commons (ARDC), Nectar Research Cloud, is an IT management veteran with over 20 years of leadership experience across various organisations.

Sam Morrison – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud.  Specialties: Linux system administration, Python/Django web programming, Security, Openstack cloud technologies.

Jake Yip – DevOps Engineer at Australian Research Data Commons (ARDC), Nectar Research Cloud. Specialities: Puppet, OpenStack, Networks, DevOps and Security.

 

ARDC Nectar Research Cloud 101 for Beginners

Conveners: Dr Glenn Moloney5, Wilfred Brimblecombe1
Presenters: Andy Botting2, Sam Morrison3, Jake Yip4

1Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), wilfred@nectar.org.au
2Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), andrew.botting@unimelb.edu.au
3Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), sam.morrison@unimelb.edu.au
4Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), jake.yip@unimelb.edu.au
5Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS), glenn.moloney@nectar.org.au

GENERAL INFORMATION

  • Half day workshop
  • Includes a hands-on component
  • Maximum of 30 people

DESCRIPTION

This workshop will provide an introduction to using the Australian Research Data Commons (ARDC) Nectar Research Cloud.  It  is for people who have no or limited experience in setting up and using Virtual Machines (VMs) in any cloud environment.

The workshop will have a “hands on” component so please bring your laptop.

The following topics/activities will be covered:

  1. What is the ARDC Nectar Research Cloud and how does it compare with commercial offerings (at technical usage level only – will not go into cost comparisons etc.) 20 minutes
  1. What is the Research Cloud good for? And what it is not good for? 10 minutes
  1. Access to the Research Cloud – How to get some of the Nectar Research Cloud. 20 minutes
    1. How to request access
    2. How to request resources
    3. How to request support to help you deploy and use your VM’s
  1. Exercise – Spin up a VM and install some software on it using Nectar Research Cloud. 120 minutes

WHO SHOULD ATTEND

Researchers or research support people who have perhaps heard of cloud computing or of the Nectar Research Cloud and would like an introduction.

WHAT TO BRING

Bring your laptop.  Required prerequisite knowledge – Moderate understanding of Unix or any other software development environment.  If you are not from an Australian or New Zealand University you will need an AAF account to gain access for the hands-on component.


BIOGRAPHIES

Andy Botting – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud.  I’m a cloud-native Systems Engineer with a background in Linux, HPC.  Specialities: Linux, Android, Puppet, OpenStack and AWS.

Wilfred Brimblecombe – ICT Manager at the Australian Research Data Commons (ARDC), Nectar Research Cloud, is an IT management veteran with over 20 years of leadership experience across various organisations.

Sam Morrison – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud.  Specialties: Linux system administration, Python/Django web programming, Security, Openstack cloud technologies.

Jake Yip – DevOps Engineer at Australian Research Data Commons (ARDC), Nectar Research Cloud. Specialities: Puppet, OpenStack, Networks, DevOps and Security.

 

Galaxy architecture and deployment experiences: a case study in how to build complex analysis systems for data-focussed science

Mr Simon Gladman1, Mr Derek Benson4, Dr Jeff Christiansen2, Dr Gareth Price3, A/Prof. Andrew Lonie1

1Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, simon.gladman@unimelb.edu.au
2Queensland Cyber Infrastructure Foundation, Brisbane, Australia, j.christiansen@uq.edu.au
3Queensland Facility for Advanced Bioinformatics, Brisbane, Australia, g.price@qfab.org
4Research Computing Centre, University of Queensland, Brisbane, Australia, d.benson.imb.uq.edu.au
5Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia, alonie@unimelb.edu.au

GENERAL INFORMATION

  • Half day Workshop
  • includes a hands-on component
  • Up to 20 attendees

DESCRIPTION

Galaxy (https://galaxyproject.org) is a widely used, highly capable bioinformatics analysis platform. It provides users with a large library of analysis and visualization tools, reference datasets, interfaces to global databases, and evolving workflow capabilities that provide provenance and reproducibility. Users build complex analysis jobs in a highly accessible interface, which are then deployed via a scheduler to underlying computational resources. Galaxy has a relatively sophisticated approach to managing user jobs to compute resources and can, for instance, be configured to schedule jobs to disparate HPC and/or cloud resources depending on the job characteristics.

In this workshop we will explore the architecture of Galaxy Australia (http://usegalaxy.org.au), understanding how it is architected to deploy jobs from a common front end to compute resources in Queensland and Victoria. Jobs have access to a common multi-hundred-terabyte reference dataset collection that is intelligently mirrored in real time from the US-based  Galaxy Main (http://usegalaxy.org) using the CernVM file system (https://cernvm.cern.ch/portal/filesystem). We will explore the technologies, cover our experiences of how they work in practice, and discuss the ambitions of a global Galaxy infrastructure network that can leverage the efforts of a global community to maintain and support critical data and software resources.

OUTLINE OF WORKSHOP CONTENT:

  1. Overview of Galaxy. Technical overview of the componentry of Galaxy as a software platform and as a workflow generation and deployment system30 minutes
  1. Galaxy Australia architecture. Overview of the Galaxy Australia archtictural and deployment model.30 minutes
  1. Underlying technologies. Detailed exploration of the job distribution and data sharing technologies being used for Galaxy Australia.90 minutes
  1. Galaxy ‘World’ – roadmap discussion. How can multiple instances of Galaxy make use of complex, high maintenance resources including a tool library which is dependency-free and growing global reference datasets, whilst appearing as a seamless experience to non-expert users?30 minutes

WHO SHOULD ATTEND

Research infrastructure staff  interested in complex, distributed software systems and cutting edge technologies for job and data distribution.

WHAT TO BRING

A laptop, no special software required. We hope to demonstrate some of the technologies being used in Galaxy.


BIOGRAPHY

Andrew Lonie is Director of the Melbourne Bioinformatics, Director of the EMBL Australia Bioinformatics Resource (EMBL-ABR: http://embl-abr.org.au), and an associate professor at the Faculty of Medicine, Dentistry and Health Sciences at the University of Melbourne, where he coordinates the MSc (Bioinformatics). Andrew directs a group of bioinformaticians, computational biologists and HPC specialists within the Melbourne Bioinformatics and EMBL-ABR to collaborate with and support life sciences researchers in a variety of research projects across Australia.

 

Pawsey Supercomputing Centre – Engaging for the Future

Dr Neil Stringfellow1, Dr Daniel Grimwood1

1Pawsey Supercomputing Centre, Kensington, Australia

 

ABSTRACT

The Pawsey Supercomputing Centre continues to evolve and grow.  Recent developments at Pawsey include the new advanced technology testbed called Athena, as well as the expansion of the Zeus commodity Linux cluster.  The Athena testbed includes Intel Xeon Phi “Knight’s Landing” processors as well as Nvidia “Pascal” GPUs.

I will also touch on the longer term vision for Pawsey and the Federal infrastructure roadmap.

 


Biography:

Dr Neil Stringfellow is the Executive Director of the Pawsey Supercomputing Centre.

Neil has led the Pawsey Supercomputing Centre since 2013, overseeing the operational launch of of the Magnus Cray supercomputer.  Neil joined Pawsey from the Swiss National Supercomputing Centre (CSCS) where he was involved in application and science support, the management of strategic partnerships and Switzerland’s flagship supercomputing systems.

How to Improve Fault Tolerance on HPC Systems?

Dr Ahmed Shamsul Arefin1

1Scientific Computing, Information Management & Technology, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australia

INTRODUCTION

HPC systems of today are complex systems made from hardware and software that were not necessarily designed to work together as one complete system. Therefore, in addition to regular maintenance related downtimes, hardware errors such as voltage fluctuation, temperature variation, electric breakdown, manufacturing defects as well as software malfunctions are common in supercomputers. There are several ways to achieve some fault tolerance which includes (but not limited to): checkpoint/ restart, programming model-based fault tolerance and algorithm-based theoretical solutions, but in real-life none of these cover all the situations mentioned above [1]. In this work, we have reviewed a few of the checkpoint/restart methods and we listed our experience with them including two practical solutions to the problem.

TEST DESCRIPTION

On a small cluster consisting only two nodes (master and compute) constructed using Bright Cluster Manager (BCM) ver. 7.3, we deployed a couple of existing methods (see below) that can support operating systems and software level fault tolerances. Our experience described below. In our model cluster, each node had 16 CPU cores in 2 x Intel Xeon CPU E5-2650 0 @ 2.00GHz, min 64GB RAM, 500GB local HDD, and Ethernet/ Infiniband connections for networking purposes. Slurm ver. 16.05 was installed as a part of BCM’s provisioning on the both nodes along with SLES 12Sp1 as the operating system. The master was set to act as both head and login nodes as well as the job scheduler node.

OUTCOMES

Our first test candidate was the BLCR (Berkeley Lab Checkpoint/Restart). It can allow programs running on Linux to be checkpointed i.e., written entirely to a file and then later restarted. BLCR can be useful for jobs killed unexpectedly due to power outages or exceeding run limits. It neither requires instrumentation or modification to user source code nor recompilation. We were able to install it on our test cluster and successfully checkpointing a basic Slurm job consisting only one statement: sleep 99. However, we faced the following difficulties during its pre and post-installation:

  • The BLCR website [2] lists an obsolete version of the software (ver. 0.8.5, last updated on 29 Jan 2013) which does not work with newer kernels such as what we get on the SLES12SP1. We had to collect a newer version along with a software patch from the BCLR team to get it installed on the test cluster. Slurm had to be recompiled and reconfigured several times following the BLCR installation, requiring ticket interactions with the Slurm, BCM and BLCR teams. Unfortunately, at the end the latest BLCR installation process did not work on the SLES12SP2 due to newer kernel version again (>4.4.21).
  • MPI incompatibility: The BLCR when installed along with Slurm can only work with MVAPICH2 and doesn’t support Infiniband network [2]. This MPI variant was not available in our HPC apps directory, therefore could not be tested.
  • Not available to interactive jobs: The BLCR + Slurm combination could not checkpoint/restart interactive jobs properly.
  • Not available to GPU jobs: The BLCR did not support GPU or Xeon Phi jobs.
  • Not available to licensed software: Talking to a license server, after a checkpoint/ restart did not work.

Our next candidate was the DMTCP (Distributed Multi Threaded Checkpointing). This application level tool claims to transparently checkpoint a single-host or distributed MPI computations in user-space (application level) – with no modifications to user code or to the O/S. As of today, Slurm cannot be integrated with DMTCP [2,4]. It was also noted that after a checkpoint/restart computation results can become inconsistent when compared against the non-checkpoint output, validated by a different CSIRO-IMT (User Services) team member.

Our next candidate was the CRIU (Checkpoint/Restore In Userspace). It claims to freeze a running application or at least a part of it and checkpoint to persistent storage as a collection of files. It works in user space (application level), rathe r than in the kernel. However, it provides no Slurm, MPI and Infiniband supports as of today [2,5]. We therefore have not attempted it on the test cluster.

PRACTICAL SOLUTIONS

After reviewing some of the publicly available tools that could potentially support the HPC fault tolerance, we decided propose the following practical solutions to the CSIRO HPC users.

CHECKPOINT AT THE APPLICATION LEVEL

Application level checkpointing: In this method, user’s application (based on [1]) will be set to explicitly read and write the checkpoints. Only the data needed for recovery was written down and checkpoints need taken at “good” times. However, this can result in a higher time overhead, such as several minutes to perform a single checkpoint on larger jobs (increase in program execution time) as well as user-level development time. Therefore, users may need further supports when applying this solution.

NO-CHECKPOINT, KILL-REQUEUE THE JOB

Slurm supports job preemption, by the act of stopping one or more “low-priority” jobs to let a “high-priority” job run. When a high-priority job has been allocated resources that have already been allocated to one or more low priority jobs, the low priority job(s) are pre-empted (suspended). The low priority job(s) can resume once the high priority job completes. Alternately, the low priority job(s) are killed, requeued and restarted using other resources. To validate this idea,  we’ve  deployed a  new  “killable job”  partition implementing a  “kill  and  requeue” policy.  This  solution was successfully tested by killing low priority jobs and can be further tuned to auto requeue lost jobs following a disaster, resulting an improved fault tolerance on our HPC systems.

CONCLUSION AND FUTURE WORKS

Using a test cluster, we investigated and evaluated some of the existing methods to provide a better fault tolerance on our HPC systems. Our observations suggests that the program level checkpointing would be the best way to safeguard a running code/ job, but at the cost of development times. However, we have not yet attempted a few other methods that could potentially provide an alternative solution, such as: SCR- Scalable Checkpoint/restart for MPI, FTI – Fault Tolerance Interface for hybrid systems Docker, Charm++ and so on, which remains as our future work.

REFERENCES

[1] DeBardeleben et al., SC 15 Tutorial: Practical Fault Tolerance on Today’s HPC Systems SC 2015

[2] Hargrove, P. H., & Duell, J. C. (2006). Berkeley lab checkpoint/restart (blcr) for linux clusters. In Journal of Physics: Conference Series (Vol. 46, No. 1, p. 494). IOP Publishing.

[3] Rodríguez-Pascual, M. et al. Checkpoint/restart in Slurm: current status and new developments, SLUG 2016

[4] Ansel, J., Arya, K., & Cooperman, G. (2009). DMTCP: Transparent checkpointing for cluster computations and the desktop. In IPDPS 2009. IEEE International Symposium on (pp. 1-12). IEEE.

[5] Emelyanov, P. CRIU: Checkpoint/Restore In Userspace, July 2011.


Biography:

Dr Ahmed Arefin works as an IT Advisor for HPC Systems at the Scientific Computing, IM&T, CSIRO. In the past, he was as a Specialist Engineer for the HPC systems at the University of Southern Queensland. He has done his PhD and Postdoc in the area of HPC & parallel data mining from the University of Newcastle and  published articles in PLOS ONE and Springer journals and IEEE sponsored conference proceedings. His primary research interest focuses on the application of high performance computing in data mining, graphs/networks and visualization.

https://orcid.org/0000-0002-9290-3551

ASCI, a Service Catalog for Docker

Mr John Marcou1, Robbie Clarken1, Ron Bosworth1, Andreas Moll1

1Australian Synchrotron, Melbourne, Victoria, robbie.clarken@synchrotron.org.au, john.marcou@synchrotron.org.au, ron.bosworth@synchrotron.org.au, andreas.moll@synchrotron.org.au

 

MOTIVATION

The Australian Synchrotron Computing Infrastructure (ASCI) is a platform to deliver users easy access to a remote desktop to process their experiment  data. Every experiment  station, or beamline, can start and connect to their own desktop environment with a web-browser, find their specific processing applications, and access their experiment data.

ASCI acts as a Service Catalog for Docker. It is used to start remote interactive desktops or processing services which run inside Docker containers.

KEY FEATURES

  • Remote Desktop instances
  • Web-browser based
  • User home folders
  • Desktop sharing feature between users
  • CUDA and OpenGL on NVIDIA support
  • Run multiple sessions on multiple nodes in parallel
  • Label-based scheduler to distribute the load on cluster

ARCHITECTURE

ASCI is a stateless  micro-services  solution.  Every  component  runs within  a Docker container.  The  infrastructure   is  based  on  the  CoreOS  Container  Linux  operating system, providing the main tooling to run containerized  applications.  This operating system is deployed using Terraform which is an infrastructure management tool supporting  multiple  providers,  and  allows  automated  machine  deployment.  With ASCI,  we  use  Terraform  manifests  to  deploy  CoreOS  Container  Linux  to  virtual machines on VMware vSphere cluster, or as stateless operating system on bare-metal, using CoreOS Matchbox.

THE CONTROL PLANE

The ASCI Control Plane provides two main components:

  • the ASCI Web UI contacts the ASCI API to interact with ASCI and start Desktops
  • the ASCI API contacts the Docker API on the compute nodes to schedule and manage new ASCI instance

All the proxy elements are used to route the request within the infrastructure in order to reach the Web-UI or a specific Desktop. The ASCI Admin UI is a convenient way to customize the scheduler and list the running ASCI instances.

THE DESKTO

The user connects to a web-interface which lists the environments available for creation. When a desktop is requested, the ASCI API schedules the Docker container on the cluster. When the user connects to a desktop, the ASCI API generates a one-time-password  and which is delivered to the user’s  web-browser  to establish  a VNC connection over WebSockets (noVNC).

The desktop  instance  is running  in a Docker  container.  A base image is built with standard tools and libraries, shared by  every  environment,   such  as  the  NVIDIA,  CUDA  and VirtualGL libraries, and the graphical environment (MATE).

This  image  is  use  as  parent  for  every  child  environment, which provides specialised scientific applications.

Users can store their documents in their home folder. The sharing feature allow users to share their Desktop with others.

 

 

                                                                                                                                                                             ASCI architecture

 

OPENGL SUPPORT

Supporting OpenGL is complex since the Xorg implementation doesn’t allow multiple desktops attached to the same GPU to process GLX instructions. A solution is the VirtualGL approach. Under this system there is a single GPU- attached Xorg server, called 3DX, and multiple non-GPU desktops, called 2DX. When an application started on a 2DX desktop needs to process a GLX instruction, the VirtualGL library catches and forwards the instruction to the 3DX server for processing on the GPU.

VirtualGL architecture


I
NFRASTRUCTURE SERVICES

ASCI relies on the following infrastructure services:

  • DNSmasq provides DNS resolution for the ASCI DNS sub-domain
  • BitBucket is the Git repository manager used is this environment
  • ASCI delivered applications are built as RPM packages which are stored on a local YUM Repository
  • Autobuild is an application which builds Docker image on new commit event, and push them to the Docker Registry
  • Docker Registry stores the Docker images. These images are downloaded as needed by the Docker hosts
  • Matchbox  is  the  PXE  server  to  provide  Boot-On-Lan.  It  is  configurable  via  API.  This  system  is  used  to  boot  ASCI workers on the network

 The monitoring solution is built with these components:

  • JournalBeat  and  MetricBeat  run  on  every  monitored  system  and  collect  log  and  metrics  to  send  to  the  logging database
  • HeartBeat monitors a list of services to report theirs state in the logging database
  • ElasticSearch is a search engine used to store and index logs and metrics
  • Kibana is a Web interface for ElasticSearch
  • ElastAlert, the alert manager, watches logs and metrics to trigger alerts and notifications based on rules

 


Biography:

I work at the Australian Synchrotron as DevOps/HPC engineer.

HPC at The University of Sydney: balancing user experience with system utilisation

Dr Stephen Kolmann1

1The University Of Sydney, Sydney, Australia stephen.kolmann@sydney.edu.au

The University of Sydney’s High Performance Computing cluster, called Artemis, first came online with 1344 cores of standard compute, 48 cores with high memory and 5 nodes with 2 K40 GPUs each. These resources were made available as one large resource pool, shared by all users, with job priority determined by PBS Professional’s fairshare algorithm.1 This, coupled with three-month maximum walltimes, led to high system utilisation. However, wait times were long, even for small, short jobs, resulting in sub-optimal end user experience.

To help cater for strong demand and improve the end user experience, we expanded Artemis to 4264 cores. We knew this expansion would help lower wait times, but we did not rely on this alone. In collaboration with our managed service provider (Dell Managed Services at the time, now NTT Data Services), we designed a queue structure that still caters for a heterogeneous workload, but lowers wait times for small jobs, at the expense of some system utilisation. Figure 1 shows how we partitioned compute resources on Artemis to achieve this balance.

Figure 1: Distribution of Artemis’s compute cores. The left pie chart shows the coarse division of all Artemis’s compute cores, and the right pie chart shows the nominal distribution of compute cores within the shared area of the left pie chart.

The cores are divided into three broad categories: condominiums, strategic allocations and shared cores.

  1. Condominiums are compute nodes that we manage on behalf of condominium owners
  • Strategic allocations are dedicated to research groups who won access via a competitive application process
  • Shared cores are available to any Sydney University researcher who wants Artemis access

The shared cores are further sub-divided into separate resource pools that cater for different sized jobs.  This division was made to segregate small, short jobs from large, long running jobs. The idea behind this partitioning is that short, small should start quickly, but larger, longer running jobs should be willing to tolerate longer wait times.

This poster will explore our experience with this queue structure and how it has impacted metrics such as job wait times, system utilisation and researcher adoption.

REFERENCES

1. PBS Professional Administrators Guide, p. 165. Available from:

http://www.pbsworks.com/documentation/support/PBSProAdminGuide12.pdf, accessed 1 Sep 2017.


Biography:

Stephen Kolmann is currently working at The University of Sydney as an HPC Specialist where he provides end-user HPC documentation and training and acts as a consultant for internal HPC-related projects. He completed a PhD in Computational Chemistry, where he made extensive use of HPC facilities both at The University of Sydney and NCI

12

Recent Comments

    About the conference

    eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.

    Conference Managers

    Please contact the team at Conference Design with any questions regarding the conference.

    © 2017 - 2018 Conference Design Pty Ltd