Contributing to the International HPC Certification Forum

Mr Lev Lafayette1, Dr Julian Kunkel2, Ms Weronika Filinger3, Dr Christian Meesters4, Ms Anja Gerbes5

1University of Melbourne
2University of Reading, Reading, United Kingdom
3University of Edinburgh, Edinburgh, United Kingdom
4Johannes Gutenberg-Universität Mainz, Mainz, Germany
5Goethe-Universität Frankfurt am Main, Frankfurt am Main, Germany

As datasets grow in size and complexity faster than personal computing devices can keep pace, more researchers are turning to HPC systems to solve their computational problems. However, many researchers are unfamiliar with the HPC environment and require training. The formal education curriculum has not yet responded sufficiently to this pressure, leaving HPC centres to provide basic training.

One proposed solution to this issue has been the International HPC Certification Forum, established in 2018 and developed from the Performance Conscious HPC (PeCoH) project in 2017 with the Hamburg HPC Competence Center (HHCC), which had the explicit goal of creating broad standards for an “HPC driving license”. Since its establishment, the Forum has developed a detailed skill tree across multiple branches (e.g., HPC Knowledge, HPC Use, Performance Engineering) and levels of competency (basic, intermediate, expert), in which each specific skill is tied to defined competencies. In addition, the Forum has developed a summative examination system and a PGP-signed certificate.

Whilst the Forum separates examination and certification from curriculum development and content delivery, it also requires a feedback mechanism from HPC education providers. Reviewing learning objectives and specific competencies, and developing branches in depth and breadth, all contribute to building a community ecosystem for the Forum's development and success. “HPC CF Endorsed Training” with certifiable content offers a clear avenue for HPC centres to contribute to the Forum, which this presentation will elaborate on with examples from current work.


Lev Lafayette is an HPC systems administrator and educator at the University of Melbourne, where he has been for the past five years. Prior to that, he held a similar role at the Victorian Partnership for Advanced Computing for eight years. He has also worked for the Ministry of Foreign Affairs (Timor-Leste) and the Parliament of Victoria, and has been active in Linux community development for over fifteen years. He collects postgraduate degrees for fun and profit and is currently studying at the University of Otago (his sixth degree) and the University of London, London School of Economics (his seventh).

I’ve got a dashboard for that! The Telegraf-Influx-Grafana (TIG) stack meets HPC at Monash University

Miss Kerri Wait1

1Monash eResearch Centre, Monash University, Clayton Campus, Australia

The Telegraf-Influx-Grafana (TIG) stack is a powerful tool to explore and visualise the state of High Performance Computing (HPC) environments and their auxiliary services. Join me for a demonstration of the metrics and dashboards we’ve found most useful at the Monash eResearch Centre (MeRC).

The TIG stack consists of:

– Telegraf, a plugin-driven metric collection agent

– InfluxDB, a time-series database

– Grafana, a visualisation and dashboard web UI

Because the HPC infrastructure at MeRC is software-defined, deploying the Telegraf agent to any number of different services is trivial. There’s no need to struggle with obscure configurations in vendor-specific web UIs. Telegraf is plugin-driven and configured using a text file: enable and configure the appropriate plugins for the service in question, and the Telegraf agent is ready to go. The data is stored in InfluxDB and queried using Grafana.
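Telegraf's InfluxDB output writes each metric as a record in InfluxDB line protocol (`measurement,tags fields timestamp`). The Python sketch below formats one such record by hand, purely to illustrate what lands in InfluxDB; it is not Telegraf's own code, and the measurement, tag and field names used in the example are hypothetical.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character in `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    """Render a field value using line-protocol typing rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one InfluxDB line-protocol record."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    for key in sorted(tags):
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in sorted(fields.items())
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line_protocol(
        "cpu",
        {"host": "node01", "cpu": "cpu-total"},
        {"usage_idle": 97.5, "usage_user": 1.25},
        1_700_000_000_000_000_000,
    ))
```

In practice Telegraf generates these records itself from its configured input plugins; a hand-rolled formatter like this is mainly useful for reading raw writes or pushing the occasional ad hoc metric.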

With TIG, I’m able to collect, store, and visualise spine switch hardware counters from the fabric, compute node health metrics like cpu and diskio, detailed jobstats for our Lustre storage, operations on our OpenLDAP servers, the utilisation of FlexLM tokens on license servers, as well as traffic on nginx and apache servers. I can monitor for specific disk usage patterns, alert particular team members via Slack, and troubleshoot user jobs.
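The source does not say how its disk-usage alerts are implemented. As a hypothetical illustration only, the sketch below flags sustained growth in a series of usage percentages and builds the JSON body for a Slack incoming webhook, which accepts a `text` field and uses `<@USER_ID>` to mention a user. The thresholds, path and user ID are made up, and the HTTP POST to the webhook URL is deliberately left out.

```python
import json
from typing import Optional, Sequence


def sustained_growth(samples: Sequence[float], window: int, min_increase: float) -> bool:
    """True if usage rose in each of the last `window` intervals and by at
    least `min_increase` percentage points in total."""
    if window < 1 or len(samples) < window + 1:
        return False
    recent = list(samples[-(window + 1):])
    strictly_rising = all(later > earlier for earlier, later in zip(recent, recent[1:]))
    return strictly_rising and recent[-1] - recent[0] >= min_increase


def slack_alert(path: str, samples: Sequence[float], user_id: str,
                window: int = 3, min_increase: float = 5.0) -> Optional[str]:
    """Return a JSON body for a Slack incoming webhook, or None if no alert is needed."""
    if not sustained_growth(samples, window, min_increase):
        return None
    start, end = samples[-(window + 1)], samples[-1]
    text = (f"<@{user_id}> disk usage on {path} rose from {start:.1f}% "
            f"to {end:.1f}% over the last {window} samples")
    return json.dumps({"text": text})


if __name__ == "__main__":
    print(slack_alert("/scratch", [70, 72, 75, 80], "U0000000"))
```

Requiring strictly rising samples rather than a single jump keeps one-off spikes (e.g. a large file written and then deleted) from paging anyone.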

Community maintained stacks like TIG provide rapid access to metrics, an interactive troubleshooting and exploration environment, as well as alerting and reporting functionality. They also allow you to antagonise your colleagues with the catchcry “I’ve got a dashboard for that!”


Kerri Wait is an HPC Consultant at Monash University. As an engineer, Kerri has a keen interest in pulling things apart and reassembling them in novel ways. She applies the same principles to her work in eResearch, and is passionate about making scientific research faster, more robust, and repeatable by upskilling user communities and removing entry barriers. Kerri is currently focused on monitoring and visualisation techniques for infrastructure at all levels of the Monash HPC platforms.

Spartan: From Experimental Hybrid towards a Petascale Future

Mr Lev Lafayette1, Mr Sean Crosby1, Mr Daniel Tosello1, Ms Jin Zhang1, Mr Naren Chinnam1, Mr Gregory Sauter1

1University of Melbourne, Parkville, Australia

Previous presentations to eResearch Australasia described the implementation of Spartan, the University of Melbourne’s general-purpose HPC system. Initially, this system was small but innovative, arguably even experimental. Features included extensive use of cloud infrastructure for compute nodes, OpenStack for deployment, Ceph for the file system, RoCE for the network, Slurm as the workload manager, EasyBuild and Lmod, etc.

Based on consideration of job workload and the basic principles of sunk, prospective and opportunity costs, this combination maximised throughput on a low budget and attracted international attention as a result. Flexibility in design also allowed the introduction of a large LIEF-supported GPGPU partition, the inclusion of older systems from Melbourne Bioinformatics, and departmental contributions. Early design decisions meant that Spartan has been able to provide performance and flexibility; as a result, it continues to show high utilisation and job completion (close to 20 million jobs), with overall metrics well above what would be expected of a “top 500” system. The inclusion of an extensive training programme based on andragogical principles has also helped significantly.

Spartan has very recently undergone significant architectural modifications, which this report describes and which will be of interest to other institutions. The adoption of the Spectrum Scale file system has further improved scalability, performance and reliability, along with the adoption of a pure HPC environment with a significant increase in core count, designed to respond to workload changes and especially to reduce queue times. Overall, these new developments are designed to integrate Spartan into the University’s Petascale Campus Initiative (PCI).


Lev Lafayette is an HPC systems administrator and educator at the University of Melbourne, where he has been for the past five years. Prior to that, he held a similar role at the Victorian Partnership for Advanced Computing for eight years. He has also worked for the Ministry of Foreign Affairs (Timor-Leste) and the Parliament of Victoria, and has been active in Linux community development for over fifteen years. He collects postgraduate degrees for fun and profit and is currently studying at the University of Otago (his sixth degree) and the University of London, London School of Economics (his seventh).

Bending the Rules of Reality for Improved Collaboration and Faster Data Access

David Hiatt

WekaIO, San Jose, CA, United States, Dave@Weka.IO


The popular belief is that research data is heavy; therefore, data locality is an important factor in designing the appropriate data storage system to support research workloads. The solution is often to locate data near compute and depend on a local file system or block storage for performance. This tactic results in a compromise that severely limits the ability to scale these systems with data growth or to provide shared access to data.

Advances in technology such as NVMe flash, virtualization, distributed parallel file systems and low-latency networks leverage parallelism to bend the rules of reality and provide faster-than-local file system performance with cloud scalability. For research, the impact is to greatly simplify HPC-class storage and reduce its cost, so that researchers spend less time waiting on results and more of their grant money goes to research rather than specialty hardware.


David Hiatt is the Director of Strategic Market Development at WekaIO, where he is responsible for developing business opportunities within the research and high-performance computing communities. Previously, Mr. Hiatt led market development activities in healthcare and life sciences at HGST’s Cloud Storage Business Unit and Violin Memory. He has been a featured speaker on data storage related topics at numerous industry events. Mr. Hiatt earned an MBA from the Booth School of Business at the University of Chicago and a BSBA from the University of Central Florida.

Gravitational Wave Astronomy in the Era of Big Data

Dr Kendall Ackley1

1Monash University/OzGrav, Murrumbeena, Australia


The uniquely sensitive Laser Interferometer Gravitational-Wave Observatory (LIGO) facilities have begun routinely detecting signal traces from distant massive black hole and neutron star mergers, some of which happened hundreds of millions of years ago. Extracting these minute gravitational-wave signatures from detector noise is a multi-layered data analysis problem, spanning real-time and offline analyses; with the aid of computing clusters around the world, it has become a reality.

On 17 August 2017, LIGO detected its first signal from less massive objects thought to be neutron stars, reinforced by the observation of a coincident weak gamma-ray burst by the Fermi satellite. Neither instrument has good spatial resolution, and because LIGO is an all-sky instrument, the challenge for astronomers of finding the single light-emitting source associated with a particular event amongst billions of objects in the sky is not to be understated. Thus began a race among astronomical facilities around the world to be the first to detect the electromagnetic counterpart signal of the event.

Detecting the source within hours of the first alert, on the very first occasion, established and validated the field of multi-messenger gravitational-wave astronomy, which had been a growing initiative, practically overnight. I will give insights into how this feat was accomplished; how, as we build larger and more sensitive telescopes, we plan to manage the massive influx of nightly data; and how we use machine learning to accomplish the most data-intensive tasks automatically.


Dr Kendall Ackley has been a member of the LIGO Scientific Collaboration since 2012. She joined the School of Physics and Astronomy at Monash University in 2017 as part of the ARC Centre of Excellence (OzGrav) working on identifying optical counterparts to gravitational-wave events with the Gravitational-wave Optical Transient Observer (GOTO) telescope. Her research interests include optimising follow-up studies for detecting coincident gravitational-wave and electromagnetic counterpart events, searches for gravitational waves from massive compact binaries, and utilising machine-learning algorithms to identify high-energy astrophysical transients which may accompany gravitational-wave events discovered with LIGO.

Improving Predictive Machine Learning Using Wavelet Reconstructions

Rakib Hassan1, John Wilford2

1Geoscience Australia, Canberra, Australia

2Geoscience Australia, Canberra, Australia


‘Uncover’ Machine Learning

Uncover Machine Learning is an initiative at Geoscience Australia to exploit recent advances in machine learning as a predictive analytics tool to support mineral exploration in Australia. Uncover-ML, a codebase developed in collaboration with CSIRO’s Data61, implements Bayesian regression models for supervised learning and leverages a suite of clustering and regression algorithms implemented in scikit-learn, a widely used, open-source machine learning library.

The Uncover-ML codebase can be divided logically into three sets of modules that make up its machine learning pipeline: (1) Preprocessing, (2) Training and Prediction, and (3) Output Generation. The Preprocessing modules implement a suite of algorithms for transforming, filtering and manipulating high-resolution (~90 m), continental-scale raster datasets representing, for example, topography, gravity and magnetics. The Training and Prediction modules expose machine learning algorithms that, during the training phase, consume raster and point datasets, known as covariates and targets, respectively. The last stage of the pipeline takes a trained model and generates probabilistic predictions, e.g., the likelihood of occurrence of a mineral of interest at a given location. The pipeline is highly parallelized and optimized for predictive modelling on large national datasets.
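To make the covariate/target pairing concrete, the sketch below builds a training matrix by reading each covariate raster at every target's pixel. It is a hypothetical helper, not Uncover-ML's API, and it assumes pixel row/column indices have already been computed from the targets' projected coordinates. Note that each target sees exactly one pixel per covariate, which is the limitation the next section addresses.

```python
from typing import Dict, List, Sequence, Tuple

Raster = List[List[float]]
Target = Tuple[int, int, float]  # (row, col, observed value)


def sample_covariates(
    covariates: Dict[str, Raster], targets: Sequence[Target]
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Pair each point target with the covariate values at its pixel.

    Returns the covariate names (column order), the feature matrix X and the
    target vector y, in the shape a scikit-learn style `fit(X, y)` expects.
    """
    names = sorted(covariates)  # fixed column order
    X: List[List[float]] = []
    y: List[float] = []
    for row, col, value in targets:
        X.append([covariates[name][row][col] for name in names])
        y.append(value)
    return names, X, y


if __name__ == "__main__":
    rasters = {
        "gravity": [[1.0, 2.0], [3.0, 4.0]],
        "magnetics": [[10.0, 20.0], [30.0, 40.0]],
    }
    names, X, y = sample_covariates(rasters, [(0, 1, 0.5), (1, 0, 0.9)])
    print(names, X, y)
```

Adding multiscale versions of a raster simply means adding more entries to `covariates`, i.e. more columns in `X`.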

Self-similarity of geophysical datasets

Many landscape and geophysical datasets, e.g., topography, drainage networks, magnetic intensity and earthquake epicenters, exhibit fractal patterns (Turcotte 1992). Fractal patterns show the same statistical properties at many different scales.

Figure 1: Drainage networks, as illustrated in this Landsat TM 8 image, are often used as an exemplar of fractal patterns.

However, machine learning algorithms are typically unable to exploit the self-similarity of input data sets at long wavelengths, such as the similarity of the branching patterns of the drainage system at different scales in Fig. 1. Targets and the corresponding covariate values used for training are point measurements/observations and invariably don’t take into account neighborhood relationships. We capture these neighborhood relationships by generating several multiscale versions of each covariate using 2D wavelet reconstructions (Kalbermatten et al. 2012). By including these multiscale versions of each raster in the input data, we enable machine learning algorithms to embed these relationships into a model during the training phase.

We use PyWavelets, an open-source Python package, for decomposing and reconstructing raster data based on dyadic wavelet transforms, as shown in Fig. 2. We apply the following steps to decompose and reconstruct each raster into progressively longer-wavelength representations while preserving the original pixel resolution, which is an essential requirement for the machine learning pipeline:

  1. Compute the 2D wavelet transform of the raster.
  2. Keep the low-pass filter coefficients and set the horizontal, vertical and diagonal high-pass filter coefficients to zero.
  3. Compute the 2D inverse wavelet transform from the coefficients in step 2.

The above steps produce a Level-1 representation of the original raster, with the spatial wavelength doubled. Applying the same procedure to the Level-1 raster yields a Level-2 representation of the original raster, with the spatial wavelength quadrupled. These steps are repeated to produce successively longer-wavelength versions of a given raster.
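The abstract does not say which wavelet is used. For the Haar wavelet the procedure has a simple closed form: keeping only the approximation coefficients down to level k and inverting reconstructs each pixel as the mean of its 2^k x 2^k block, at the original resolution. With Haar, reapplying the single-level step to a full-resolution Level-1 raster returns the same raster, so this sketch computes level k directly. The pure-Python function name `haar_lowpass` is hypothetical; with PyWavelets, the same result should be obtainable (up to floating-point rounding) by `pywt.wavedec2(raster, "haar", level=k)`, zeroing every detail tuple, then `pywt.waverec2(coeffs, "haar")`. Raster dimensions are assumed divisible by 2^k.

```python
from typing import List

Raster = List[List[float]]


def haar_lowpass(raster: Raster, level: int = 1) -> Raster:
    """Level-`level` Haar approximation, reconstructed at the original resolution.

    Zeroing all Haar detail coefficients down to `level` and inverting is
    equivalent to replacing each 2**level x 2**level block by its mean.
    """
    block = 2 ** level
    n_rows, n_cols = len(raster), len(raster[0])
    if n_rows % block or n_cols % block:
        raise ValueError(f"raster dimensions must be divisible by {block}")
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            cells = [raster[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            mean = sum(cells) / len(cells)
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    out[r][c] = mean
    return out


if __name__ == "__main__":
    magnetics = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
    multiscale = {f"magnetics_L{k}": haar_lowpass(magnetics, k) for k in (1, 2)}
    for name, raster in multiscale.items():
        print(name, raster[0])
```

For ~90 m pixels, levels 1, 2 and 3 average over blocks roughly 180 m, 360 m and 720 m across, while every output raster keeps the 90 m grid the pipeline expects.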

We have incorporated this multiscaling functionality into the Preprocessing module of Uncover-ML, which allows us to apply it selectively to continuous, non-categorical raster data. Preliminary prediction results obtained by including multiscale rasters in the training phase show improvements over those from standard models. With further testing and parameter tuning, we expect further improvements in predictive mapping capabilities.


  1. Turcotte, D. L. (1992). Fractals, chaos, self-organized criticality and tectonics. Terra Nova, 4, 4-12. doi:10.1111/j.1365-3121.1992.tb00444.x
  2. Kalbermatten, M., et al. (2012). Multiscale analysis of geomorphological and geological features in high resolution digital elevation models using the wavelet transform. Geomorphology, 138(1), 352-363.
  3. Mallat, S. (2000). Une exploration des signaux en ondelettes. Paris: Les Éditions de l'École polytechnique.

This paper is published with the permission of the CEO, Geoscience Australia


Dr Hassan has worked as a computational software developer in both industry and academia since 2004. He obtained a bachelor’s degree in applied physics from RMIT University in 2003, a master of geoscience from Macquarie University in 2009 and, more recently, a PhD in computational geophysics from the University of Sydney in 2016.

Performance Improvements with GPUs for Marine Biodiversity: A Cross-Tasman Collaboration

Lev Lafayette1, Mitch Turnbull2, Mark Wilcox3, Eric A. Treml4

1University of Melbourne, Parkville, Australia

2Nyriad, Cambridge, New Zealand

3Nyriad, Cambridge, New Zealand

4Deakin University, Geelong, Australia


Identifying probable dispersal routes for marine populations is a data- and processing-intensive task for which traditional high performance computing systems are suitable, even for single-threaded applications. Although processing dependencies exist between the datasets, a large degree of independence between sets allows job arrays to be used to significantly reduce processing time. In addition, identifying bottlenecks in the code base that were suitable for GPU optimisation has led to further performance improvements, which can be combined with the existing benefits of job arrays. This small case study shows how single-threaded applications suited to GPU architectures can be optimised for significant performance improvements. Further development is suggested with the expansion of the GPU capability of the University of Melbourne’s “Spartan” HPC system.

University of Melbourne HPC and Marine Spatial Ecology With Job Arrays

From 2011 to 2016, the University of Melbourne provided general researcher access to a medium-sized HPC cluster called “Edward”, designed in a traditional fashion. As “Edward” was being retired, an analysis of actual job metrics indicated that the overwhelming majority of jobs were single-node or even single-core, especially within job arrays. The successor system, “Spartan”, was therefore designed with a view to high throughput rather than high performance. It combined a small traditional HPC partition with a high-speed interconnect and a much larger partition built on OpenStack virtual machines from the NeCTAR research cloud. This proved to be a highly efficient and optimised approach in terms of both finances and throughput [1].

A specific example of a large number of computational tasks designed for single-threaded applications with modest memory requirements comes from research into marine biodiversity and population connectivity, which has significant implications for the design of marine protected areas. In particular, there is a lack of quantitative methods to incorporate, for example, larval dispersal via ocean currents, population persistence and impact on fisheries. The Marine Spatial Ecology and Conservation (MSEC) laboratory at the University of Melbourne has been engaged in several research projects to identify the probable dispersal routes and spatial population structure of marine species, and to integrate these connectivity estimates into marine conservation planning [2].
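Because the datasets are largely independent, each element of a Slurm job array (Slurm being Spartan's workload manager) can process one dataset. The sketch below shows one way a worker script might map the array index that Slurm exposes in the `SLURM_ARRAY_TASK_ID` environment variable to a dataset. The dataset names are hypothetical placeholders, and the simulation call itself is elided.

```python
import os
import sys
from typing import Optional, Sequence

# Hypothetical list of independent datasets, one per array task.
DATASETS = ["atlantic_reefs", "benguela_region"]


def select_dataset(datasets: Sequence[str], task_id: Optional[int] = None) -> str:
    """Map a Slurm array task index to the dataset that task should process."""
    if task_id is None:
        # Slurm sets SLURM_ARRAY_TASK_ID in each task of a job array.
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(datasets):
        raise IndexError(f"task {task_id} has no dataset (have {len(datasets)})")
    return datasets[task_id]


if __name__ == "__main__":
    dataset = select_dataset(DATASETS)
    print(f"processing {dataset}", file=sys.stderr)
    # ... run the single-threaded dispersal simulation on `dataset` here ...
```

Submitting a batch script that runs this file with `sbatch --array=0-1 <script>` would then launch one independent task per dataset, which the scheduler can place wherever a single core is free.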

Code Review for GPGPU Optimisation

GPUs have a number of architectural constraints. To a very large extent, they are independent of their host system. Object code must be compiled for the GPU (e.g., using OpenCL or nvcc). There is no shared memory between the GPU and CPU, so any unprocessed data must be transferred to the GPGPU environment and then back to the CPU environment when processing is complete. Moreover, GPUs typically have only small amounts of cache memory, if any; they compensate through pipelining and very high-bandwidth transfer between the GPU and the host [3].
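One simple way to reason about these constraints is to charge the GPU for moving data in and out, plus any fixed setup overhead, as well as for compute. The sketch below is an illustrative model of our own, not Nyriad's methodology, and every parameter value in it is hypothetical. It shows how a small dataset can end up slower on a GPU than on an optimised CPU code when fixed costs dominate.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    """Inputs for a back-of-the-envelope CPU vs GPU comparison (all hypothetical)."""
    cpu_seconds: float            # run time of the CPU implementation
    gpu_compute_seconds: float    # kernel time once data is resident on the GPU
    bytes_in: float               # data copied host -> GPU
    bytes_out: float              # results copied GPU -> host
    link_bytes_per_second: float  # host <-> GPU transfer bandwidth
    fixed_overhead_seconds: float = 0.0  # device setup, kernel launches, etc.

    def gpu_seconds(self) -> float:
        """Total GPU path time: overhead + transfers + compute."""
        transfer = (self.bytes_in + self.bytes_out) / self.link_bytes_per_second
        return self.fixed_overhead_seconds + transfer + self.gpu_compute_seconds

    def gpu_wins(self) -> bool:
        return self.gpu_seconds() < self.cpu_seconds


if __name__ == "__main__":
    large = OffloadEstimate(100.0, 10.0, 4e9, 0.0, 1e10, fixed_overhead_seconds=2.0)
    small = OffloadEstimate(1.0, 0.1, 4e6, 0.0, 1e10, fixed_overhead_seconds=2.0)
    print(large.gpu_seconds(), large.gpu_wins())
    print(small.gpu_seconds(), small.gpu_wins())
```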

During the first half of 2017, Nyriad reviewed the HPC infrastructure, the existing MATLAB(R) source code and sample data, and wrote a test suite designed to run the CPU and GPU versions at the same time. There were two review stages: the first optimised the existing MATLAB(R) code base, and the second identified functions that could be distributed and rewritten for GPUs.

Performance Improvements

Nyriad’s code review identified bottlenecks that were suitable for GPGPU workloads. On the University of Melbourne HPC system, “Spartan”, for the 4.6 GB Atlantic Model simulating 442 reefs, a single GPU achieved a 90x performance improvement over the original code and a 3.75x improvement over the CPU version running with 12 threads. The simulation, which previously took 8 days to complete on one of the most powerful nodes (GPU or physical), could be completed in 2 hours. On the other hand, for the 4 MB South Africa Benguela Region dataset, the GPU version is faster than the original code but slower than the improved CPU implementation.
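As a rough consistency check on these figures, assuming the quoted run times are rounded, the elapsed-time ratio for the Atlantic Model and the run time this implies for the 12-thread CPU version are:

```latex
\frac{8\ \text{days} \times 24\ \text{h/day}}{2\ \text{h}}
  = \frac{192\ \text{h}}{2\ \text{h}} = 96 \approx 90\times,
\qquad
t_{\text{CPU, 12 threads}} \approx 3.75 \times 2\ \text{h} = 7.5\ \text{h}
```

The ratio from rounded times sits close to the reported 90x; the 7.5 h value is inferred from the stated ratio rather than a measurement reported here.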

If the code is refactored to process reefs in parallel, we anticipate that node utilisation would improve at both the per-GPU and multi-GPU level, significantly reducing single-simulation time by fully utilising the Spartan GPU node on which it runs. With this change, we predict a performance improvement of over 5x compared to the existing GPU code: although more of a node's resources would be used, the execution time of a single simulation would be greatly reduced. Smaller datasets would also likely see some improvement, as per-GPU utilisation would increase. Figure 2 shows the performance of the current two versions, and the predicted performance of the multithreaded GPU version, when running a single simulation on the Atlantic dataset of 442 reefs over 100 days.
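One way to picture the proposed refactoring is to split the 442 reefs across the GPUs of a single node. Assuming 4 GPUs per node (272 P100 cards over 68 nodes in the LIEF partition described below), a round-robin split gives each GPU about 110 reefs. The sketch is hypothetical: `simulate` stands in for the per-reef dispersal computation, and the threads only dispatch work to each device rather than doing numerical work in Python.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def assign_reefs(n_reefs: int, n_gpus: int) -> List[List[int]]:
    """Round-robin reef indices over GPUs so per-GPU loads differ by at most one."""
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    buckets: List[List[int]] = [[] for _ in range(n_gpus)]
    for reef in range(n_reefs):
        buckets[reef % n_gpus].append(reef)
    return buckets


def run_partitioned(n_reefs: int, n_gpus: int,
                    simulate: Callable[[int, int], T]) -> List[T]:
    """Call `simulate(reef, gpu)` for every reef, with one worker per GPU."""
    buckets = assign_reefs(n_reefs, n_gpus)

    def worker(gpu: int) -> List[T]:
        return [simulate(reef, gpu) for reef in buckets[gpu]]

    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        per_gpu = list(pool.map(worker, range(n_gpus)))
    return [result for chunk in per_gpu for result in chunk]


if __name__ == "__main__":
    print([len(b) for b in assign_reefs(442, 4)])  # [111, 111, 110, 110]
```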

Further Developments

Nyriad’s review found significant opportunity to use data integrity and mathematical equivalence algorithmic techniques to port code to GPUs with minimal impact on the research workflow. With notable performance improvements across a range of job profiles, a significant expansion of Spartan’s GPGPU capacity has just been implemented. The partition, funded by Linkage Infrastructure, Equipment and Facilities (LIEF) grants from the Australian Research Council, is composed of 68 nodes and 272 nVidia P100 GPGPU cards. The major uses of the new system will be turbulent flows, theoretical and computational chemistry, and genomics, representative of the needs of major participants.

The University of Melbourne and Nyriad will continue their research collaborations, especially in the GPGPU environment for data integrity and mathematical equivalence, scalability testing and hybrid clusters to enable more scientific programming users to progressively scale their work up to larger systems.


  1. Lafayette, L., Sauter, G., Vu, L., Meade, B. (2016). Spartan: Performance and Flexibility: An HPC-Cloud Chimera. OpenStack Summit, Barcelona, 27 October 2016.
  2. For example, Keyse, J., Treml, E. A., Huelsken, T., Barber, P., DeBoer, T., Kochzius, M., Muryanto, A., Gardner, J., Liu, L., Penny, S., Riginos, C. (2018). Journal of Biogeography, February 2018.
  3. Tsutsui, S., Collet, P. (eds) (2013). Massively Parallel Evolutionary Computation on GPGPUs. Springer-Verlag.


Lev Lafayette is the Senior HPC Support and Training Officer at the University of Melbourne, where he has been since 2015. Prior to that he worked at the Victorian Partnership for Advanced Computing in a similar role for eight years.

Fostering an organisation-wide accelerated computing strategy

Jake Carroll1

1The University of Queensland, Brisbane, Australia



The use of accelerators (GPU, ASIC, FPGA) in research computing has become more prevalent as hardware/software ecosystems have matured. To complement this, frameworks from vendors such as nVidia and AMD have become fully featured. As a result of a series of significant ARC/NHMRC grants, an unprecedented amount of scientific imaging infrastructure is being commissioned on the University of Queensland St Lucia campus. To leverage scientific outputs and process the data that this new infrastructure would generate, UQ procured its first tranche of accelerated computing capability in late 2017. This presentation discusses the user-engagement strategy behind UQ’s accelerated computing deployment, how it worked, why it worked and why it was a novel approach in the sector.


In late 2017, after an extensive benchmarking, analysis and design process, the Wiener supercomputer was procured to enable near real-time deconvolution and deskew of data from imaging infrastructure such as UQ’s new Lattice Light Sheet Microscope (LLSM) [1]. This platform was the first in the Asia-Pacific to feature the nVidia Volta V100 GPU and only the fourth production deployment in the world. The Wiener supercomputer was the largest investment in GPU/accelerated supercomputing that the state had ever made. The initial intention of Wiener was to provide a powerful means of deconvolution [2] for the LLSM, but it was quickly realised that, with this many GPUs tightly connected in a dedicated supercomputing deployment, the platform would serve as UQ’s launchpad for a general accelerated computing strategy.

Basis of advanced computing strategy

Like several of its contemporaries, UQ has a significant investment in supercomputing. UQ’s strategy differs somewhat from that of equivalent national and sister-state facilities in that it provides different pillars of supercomputing, in dedicated infrastructure, for different workloads.


Table 1: UQ’s Supercomputing Infrastructure load-out

Platform name | Machine domain focus | Workload characterisation                              | Expected user demand | Actual user demand
Tinaroo       | Multi-discipline     | MPI, tightly coupled shared memory, massively parallel | High                 | High
Awoonga       | Multi-discipline     | Loosely coupled, MPI-slack, high latency, cloud-like   | Medium               | Medium
FlashLite     | Multi-discipline     | High throughput, high memory                           | High                 | Low
Wiener        | Multi-discipline     | GPU, ML, DL, CNN and imaging specific                  | Low                  | High


UQ misjudged user demand for both FlashLite and Wiener, but for different reasons, which the strategic discussion in this presentation will explain.

Fostering an accelerated computing community

Initially, as Table 1 shows, UQ made some assumptions about where user demand would be greatest, which proved incorrect. As a result, initial interest in Wiener was far greater than anticipated. UQ expected that Wiener would cater to a niche subset of imaging workloads; what was unanticipated was the researchers' level of sophistication in, and understanding of, applying convolutional neural networks, deep learning and machine learning techniques within the imaging domain itself. For example, we expected deconvolution algorithms to run on the GPU infrastructure using codes such as Microvolution and SVI’s Huygens. In fact, researchers were already considering using machine vision techniques and TensorFlow at scale to characterise and train image sets for more accurate detection of cells, cancers and viruses. [3]

At this point, UQ reasoned that it needed a more direct approach to engagement and collaboration with end users to unlock the capability of this new platform. A core tenet of this was a personal, one-on-one approach to each workload. Although this is an administrative burden, it has been shown to deliver significantly better outcomes. Thus, from early in production, the general ‘onboarding’ process for Wiener became the following:

  1. User approaches RCC with a request for compute time on accelerator based HPC.
  2. A subject matter (computer science, HPC) expert will then make an appointment to meet with the researcher or research group in order to better understand the science.
  3. A longer discussion takes place, to learn about the workload type, the potential hardware/software and computing environment impact. At this point the researcher and subject matter expert work towards a defined job-layout which is both optimal for the workload and best fit for infrastructure.

The initial consultation process generally takes two to three hours.

UQ has empirical, measured evidence suggesting that this method of personal interaction, used to build stronger capability in accelerated computing, results in far more efficient use of infrastructure than the generally accepted process of providing users with a set of documents, READMEs and how-to instructions at a distance.


Early analysis suggests that there is a correlation between the employment of direct consultation and scientific discussion between a domain expert (in the scientific research domain) and a research computing specialist and the quality of the computational run or input in these accelerated computing platforms. This now forms the basis of the operating procedures of the Wiener supercomputing facility.


  1. UQ IMB ARC/NHMRC Lattice Light Sheet Microscopy installation. Retrieved from, accessed June 8th, 2018
  2. Deconvolution Definition, Retrieved from, accessed June 8th, 2018.
  3. HPC Wiener harnessed for automating skin cancer diagnosis, Retrieved from, accessed June 8th, 2018.


Jake is currently the Associate Director of Research Computing for UQ’s three large scientifically intensive research institutes – the Australian Institute for Bioengineering and Nanotechnology, the Institute for Molecular Bioscience and the Queensland Brain Institute.

Jake has spent the last 12 years in scientific computing, working on everything from building supercomputers to managing the strategy and complexity that comes with scientific endeavour.

Jake spends his time working to make scientific computing platforms, technology and infrastructure as good as it can be, such that world class research can be conducted, unencumbered.

Jake’s background is in both computer science and business leadership – constantly fighting with himself, trying to accommodate both (very different concepts) in his working life – ultimately to try and make them work together.

High-level Cloud Application Description and Management

Gabor Terstyanszky1, Gab Pierantoni2, Tamas Kiss3

1University of Westminster, London, United Kingdom

2University of Westminster, London, United Kingdom

3University of Westminster, London, United Kingdom



Cloud computing has successfully and steadily addressed issues of how to run applications on complex distributed computing infrastructures. However, it must address specific deployment, scalability and security requirements. Nowadays, Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) solutions are widely used in academia, business and the public sector to manage applications in the Cloud. On the one hand, flexible and elastic on-demand access to the Cloud can yield significant cost savings through more efficient and convenient utilization, and can replace large investment costs with long-term operational costs. On the other hand, using the Cloud efficiently and dynamically to run applications is not trivial. The uptake of cloud computing in some application areas is still relatively low due to limited application-level flexibility and shortages in cloud-specific skills. As a result, the move to the Cloud has been somewhat slower and more cautious in these areas, owing to both application- and infrastructure-level complexity.

To enable a large variety of applications to execute in the Cloud in a cost-effective, flexible, seamless and secure way, applications must be deployed, launched, executed and removed through a framework that hides cloud-specific details. To manage applications in the Cloud, such a framework needs information such as their architecture, the resources and services they need, and the QoS parameters they have to meet. Application description languages can define the application architecture, specify where to deploy and run applications, how to achieve their cost-effective execution, and how to provide the required security to protect data.
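The kind of information such a framework needs can be pictured as a small data model. The sketch below is a hypothetical Python model, not TOSCA syntax and not a COLA or MiCADO API: it records each component's artefact and dependencies plus a dictionary of QoS targets, reports dangling references, and derives deployment and removal order from the dependencies using the standard-library `graphlib` (Python 3.9+), which raises `CycleError` for circular dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List


@dataclass
class Component:
    name: str
    artefact: str  # e.g. a container image reference (hypothetical)
    requires: List[str] = field(default_factory=list)  # components this one connects to


@dataclass
class ApplicationDescription:
    components: List[Component]
    qos: Dict[str, float] = field(default_factory=dict)  # e.g. {"max_cost_per_hour": 5.0}

    def validate(self) -> List[str]:
        """List references to components that are not part of the description."""
        known = {c.name for c in self.components}
        return [f"{c.name} requires unknown component {dep!r}"
                for c in self.components for dep in c.requires if dep not in known]

    def deployment_order(self) -> List[str]:
        """Order in which to deploy components: dependencies first."""
        graph = {c.name: c.requires for c in self.components}
        return list(TopologicalSorter(graph).static_order())

    def removal_order(self) -> List[str]:
        """Tear down in reverse deployment order."""
        return list(reversed(self.deployment_order()))


if __name__ == "__main__":
    app = ApplicationDescription(
        [Component("web", "example/web:1.0", ["db"]), Component("db", "example/db:1.0")],
        {"max_cost_per_hour": 5.0},
    )
    print(app.validate(), app.deployment_order(), app.removal_order())
```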

TOSCA-based High-level Application Description and Execution

The Cloud Orchestration at the Level of Application (COLA) project [1], funded by H2020, aims to foster the migration of applications to the Cloud by public sector organisations and SMEs. COLA is developing a generic and pluggable framework, called Microservices-based Cloud Application-level Dynamic Orchestrator (MiCADO) [2], to support the optimal and secure deployment and run-time orchestration of cloud applications. Application Developers can describe applications, including their Quality of Service (QoS) parameters related to deployment (flexibility), economic viability (costs), performance (scalability) and security (data protection and privacy), and submit this description to the MiCADO framework. This framework builds on existing low-level cloud container technologies (e.g. Docker Swarm [3]) and management and orchestration solutions (e.g. Occopus [4]). MiCADO is generic in the sense that its services are not restricted to particular technologies and can be implemented using different existing technologies and services.

We focus on application description and management in the Cloud. There are three major approaches to application description: cloud-platform-dependent approaches (Amazon, Microsoft Azure, etc.), cloud-orchestration-tool-dependent approaches (Chef, Heat, etc.), and platform-independent application description languages (CAMP and TOSCA). All these approaches adequately describe an application's architecture: the services it is composed of, how they are connected, and the artefacts and resources needed to run it. The approaches used by cloud platforms and cloud orchestration tools are not based on standards and are tied to specific implementations or platforms. As a result, it is not easy to reuse their application descriptions in heterogeneous cloud environments. There are also major differences in how these approaches specify and manage QoS properties. We use TOSCA [5], an emerging standard, to describe applications, but it also has some limitations. TOSCA supports the management of containers and virtual machines, but these entities are modelled only as node types, not as applications. The TOSCA specification defines only abstract policy classes, which cover only a subset of QoS properties. Neither the original policy taxonomy nor its extensions contain all the parameters required to manage a wide range of policies. Currently there is no platform-independent solution to process TOSCA application descriptions and run the applications in the Cloud. Considering these limitations, we addressed the following challenges:

  • how to describe and manage containerized applications with policies assigned to them,
  • how to extend the TOSCA policy hierarchy to manage a wide range of QoS properties and how to parametrize TOSCA policies to support them, and
  • how to process and execute TOSCA specifications in a technology agnostic way.

To address these challenges, we made three major contributions. First, to combine the flexibility offered by technology agnosticism with the expressiveness required to describe the properties of a large variety of applications, we developed the Application Description Template (ADT), which specifies two main aspects of an application: its architecture (application topology) and its QoS properties (application policy). ADTs connect Application Developers to the application components. Each ADT contains a parameter section, a topology section and a policy section. The parameter section holds the input and output parameters of the application, the topology section has container and virtual machine image sub-sections, and the policy section describes QoS parameters as TOSCA policies.

Second, we introduced a flexible policy hierarchy, extending the TOSCA policy hierarchy with a security policy that has several sub-policies, such as authentication, authorisation and data protection, and with further sub-policies for the deployment and scaling policies. We also defined a Policy Template to describe policy properties. Each template is divided into two main sections. The description section states in plain text to which service the policy is applied and when. The properties section contains two types of parameters: common and specific properties.

Finally, we extended the MiCADO framework with the MiCADO Submitter (Fig. 1) to process TOSCA descriptions. The ADT is submitted to the MiCADO Submitter, where it is parsed and validated by the OpenStack TOSCA Parser and the MiCADO Validator. Next, the Mapper uses a key list to isolate the relevant information and passes it to adaptors that translate it for the components of the MiCADO framework: the Container Orchestrator, which manages Docker containers; the Cloud Orchestrator, which handles the virtual machines in which the containers are deployed and run; the Policy Keeper, which manages all policies except security policies; and the Security Enforcer, which handles security policies.
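To make the ADT structure and the Submitter's routing step more concrete, here is a minimal Python sketch. It is not MiCADO's code or schema: the dictionary keys (`parameters`, `topology`, `containers`, `virtual_images`, `policies`), the dotted policy type names such as `security.authentication`, and the functions `validate_adt` and `map_adt` are all hypothetical. A real ADT is a TOSCA YAML document parsed by the OpenStack TOSCA Parser; here a plain dictionary stands in for the parsed result, and the Mapper's key list is reduced to a lookup on section names and on the top-level policy category.

```python
# Illustrative sketch only: section names, policy type strings and the
# validate_adt/map_adt helpers are hypothetical, not MiCADO's actual schema.

REQUIRED_SECTIONS = ("parameters", "topology", "policies")

CONTAINER_ORCHESTRATOR = "Container Orchestrator"
CLOUD_ORCHESTRATOR = "Cloud Orchestrator"
POLICY_KEEPER = "Policy Keeper"
SECURITY_ENFORCER = "Security Enforcer"


def validate_adt(adt):
    """Return a list of problems; an empty list means the ADT looks well-formed."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in adt]
    topology = adt.get("topology", {})
    known_nodes = set(topology.get("containers", {})) | set(topology.get("virtual_images", {}))
    for policy in adt.get("policies", []):
        name = policy.get("type")
        # Policy Template: a plain-text description plus common/specific properties.
        if "description" not in policy:
            problems.append(f"policy {name!r} has no description")
        props = policy.get("properties", {})
        for part in ("common", "specific"):
            if part not in props:
                problems.append(f"policy {name!r} lacks {part} properties")
        # Every policy must target a node declared in the topology section.
        for target in policy.get("targets", []):
            if target not in known_nodes:
                problems.append(f"policy {name!r} targets unknown node {target!r}")
    return problems


def map_adt(adt):
    """Split a validated ADT into the pieces each component would receive."""
    routed = {CONTAINER_ORCHESTRATOR: [], CLOUD_ORCHESTRATOR: [],
              POLICY_KEEPER: [], SECURITY_ENFORCER: []}
    topology = adt["topology"]
    routed[CONTAINER_ORCHESTRATOR].extend(topology.get("containers", {}).items())
    routed[CLOUD_ORCHESTRATOR].extend(topology.get("virtual_images", {}).items())
    for policy in adt["policies"]:
        # Security policies go to the Security Enforcer; all others to the Policy Keeper.
        if policy["type"].split(".")[0] == "security":
            routed[SECURITY_ENFORCER].append(policy)
        else:
            routed[POLICY_KEEPER].append(policy)
    return routed


example_adt = {
    "parameters": {"inputs": {"min_replicas": 1, "max_replicas": 5}, "outputs": {}},
    "topology": {
        "containers": {"web_app": {"image": "example/web-app:1.0"}},
        "virtual_images": {"worker_vm": {"image_id": "hypothetical-image-id"}},
    },
    "policies": [
        {
            "type": "scaling.cpu",
            "targets": ["web_app"],
            "description": "Applied to web_app at run time: scale on CPU load.",
            "properties": {"common": {"stage": "runtime"},
                           "specific": {"cpu_threshold_percent": 80}},
        },
        {
            "type": "security.authentication",
            "targets": ["web_app"],
            "description": "Applied to web_app at deployment: require authenticated access.",
            "properties": {"common": {"stage": "deployment"},
                           "specific": {"method": "hypothetical-token"}},
        },
    ],
}

if __name__ == "__main__":
    issues = validate_adt(example_adt)
    if issues:
        raise SystemExit("\n".join(issues))
    for component, items in map_adt(example_adt).items():
        print(f"{component}: {len(items)} item(s)")
```

Running the script routes one item to each component: the container to the Container Orchestrator, the virtual machine image to the Cloud Orchestrator, the scaling policy to the Policy Keeper and the authentication policy to the Security Enforcer. Keeping validation separate from mapping mirrors the order described above, in which the ADT is validated before the Mapper runs.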

Figure 1: MiCADO Submitter

To assess the application descriptions and how applications are executed through the MiCADO framework, COLA tests its applicability using demonstrators and proof-of-concept case studies from four distinct application areas involving public sector organisations and SMEs. For example, these use cases include social media data analytics for local governments, simulation-based evacuation planning, data-intensive web applications, and simulation solutions for manufacturing and engineering.

This presentation will outline the MiCADO framework, the Application Description Template, the extended TOSCA policy architecture with the Policy Template and how ADTs are managed in the MiCADO framework. Further, it will present how a particular public sector organization’s application can be cost-effectively and efficiently executed through the MiCADO framework in the Cloud.


  1. COLA Project – Cloud Orchestration at the Level of Application, [Online]. Available:
  2. T. Kiss, P. Kacsuk, J. Kovacs, B. Rakoczi, A. Hajnal, A. Farkas, G. Gesmier, G. Terstyanszky: MiCADO – Microservice-based Cloud Application-level Dynamic Orchestrator, Future Generation Computer Systems, 2017.
  3. Docker Swarm overview, [Online]. Available:
  4. J. Kovács, P. Kacsuk: Occopus: a Multi-Cloud Orchestrator to Deploy and Manage Complex Scientific Infrastructures, Journal of Grid Computing, March 2018, Volume 16, Issue 1, pp. 19–37.
  5. OASIS: TOSCA – Simple Profile in YAML Version 1.0, [Online]. Available:


Prof. Dr Gabor Terstyanszky is a Professor in Distributed Computing at the University of Westminster. His research interests include distributed and parallel computing, and cloud, cluster and Grid computing. As local coordinator, he supervised several European projects, such as COPERNICUS, COST, WINPAR, HPCTI and SEPP. He had a leading role in the FP7 EDGeS, DEGISCO, EDGI, SHIWA, SCI-BUS and ER-flow projects and the H2020 CloudSME research project. Currently he is working on the H2020 COLA and CloudiFacturing projects. He has published more than 130 technical papers in conferences and journals, and has been a member of the programme committees of several conferences and workshops.

Introduction to and Demonstration of Containers in the ARDC Nectar Research Cloud

Conveners: Dr Glenn Moloney5, Wilfred Brimblecombe1

Presenters: Andy Botting2, Sam Morrison3, Jake Yip4

1Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS)
2Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS)
3Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS)
4Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS)
5Australian Research Data Commons (ARDC) (built from ANDS, Nectar, RDS)


  • Half-day workshop
  • Includes a hands-on component
  • Maximum of 40 people


Containers provide a solution to the problem of how to get software to run reliably when moved from one computing environment to another.  This workshop provides an introduction to this popular technology by briefly going over container concepts and then demonstrating containers in use on the ARDC Nectar Research Cloud.

The workshop will have a hands-on component, so please bring your laptop.

The following topics/activities will be covered:

  1. Introduction to container concepts and products
  2. Using Docker and Kubernetes technologies on the Research Cloud
  3. Exercise using a simple tool kit that can be deployed and experimented with at the workshop and may be extended for use after the workshop
  4. Seeking feedback from participants on “tuning” the container offering on the Research Cloud to meet their needs.

This workshop will provide you with a useful introduction to container technology and help ARDC determine which container offerings may be supported on the ARDC Nectar Research Cloud.


Target Audience – researchers who may benefit from a lightweight, easy-to-use container service or are looking for an introduction to the area. We assume that current sophisticated, heavy users of container technology have already set up their own environments and will want to continue doing so.

eResearch staff who are interested in learning about container technologies and how they can be used on the Nectar Research Cloud or across multiple cloud services.


Bring your laptop. Required prerequisite knowledge – moderate to advanced understanding of Unix and cloud environments. If you are not from an Australian or New Zealand university, you will need an AAF account to gain access for the hands-on component.


Andy Botting – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud. A cloud-native systems engineer with a background in Linux and HPC. Specialities: Linux, Android, Puppet, OpenStack and AWS.

Wilfred Brimblecombe – ICT Manager at the Australian Research Data Commons (ARDC), Nectar Research Cloud, is an IT management veteran with over 20 years of leadership experience across various organisations.

Sam Morrison – Senior Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud. Specialties: Linux system administration, Python/Django web programming, Security, OpenStack cloud technologies.

Jake Yip – DevOps Engineer at the Australian Research Data Commons (ARDC), Nectar Research Cloud. Specialities: Puppet, OpenStack, Networks, DevOps and Security.




    About the conference

    eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.

    Conference Managers

    Please contact the team at Conference Design with any questions regarding the conference.

    © 2018 - 2020 Conference Design Pty Ltd