Testing HPC Software Stack with Virtualization

Dr Ahmed Shamsul Arefin¹

¹CSIRO, Canberra, Australia


In this work, we present a simple and effective virtual cluster deployment process that provides a playground for sysadmins and helps to eliminate some HPC software stack bugs, such as kernel/software incompatibilities.


We used three main components: VMware, Bright Cluster Manager (free Easy8 license, SLES15 ISO) from Bright Computing, and a decommissioned compute node from our production HPC cluster with 16 CPU cores (2 x Intel Xeon CPU E5-2650 0 @ 2.00GHz), 128GB RAM and a 500GB local HDD. We started the deployment by installing a base operating system and the VMware virtualization tool on the physical hardware. We then created the head node VM from Bright's SLES15 ISO image, and created the compute node VMs with pre-allocated disk storage and MAC addresses but without installing an OS at this stage. Next, we used the Bright Cluster Manager admin portal to create the compute nodes, with the head node serving the OS image, IP addresses and hostnames `node[01-08]`. Finally, we tested the latest Slurm, MPI, compilers and some commercial software for compatibility with the latest OS kernel and libraries. We also checked the admin scripts, cron jobs, ssh keys, and security and firewall features.
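As an illustration of the node-planning step, the following minimal sketch expands a bracketed hostname range such as `node[01-08]` and pairs each hostname with an IP address and a pre-allocated MAC address. It is not taken from Bright Cluster Manager or VMware. The function names, the `10.141.0.0/16` network and the locally administered `02:00:00:00:00` MAC prefix are assumptions chosen for the example only.

```python
import ipaddress
import re


def expand_node_range(pattern):
    """Expand a bracketed range such as 'node[01-08]' into hostnames."""
    match = re.fullmatch(r"([A-Za-z][\w-]*)\[(\d+)-(\d+)\]", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, start, end = match.groups()
    width = len(start)  # preserve the zero padding of the lower bound
    return [f"{prefix}{i:0{width}d}" for i in range(int(start), int(end) + 1)]


def plan_nodes(pattern, network, mac_prefix="02:00:00:00:00"):
    """Pair each hostname with an IP and a MAC (hypothetical values)."""
    hosts = expand_node_range(pattern)
    if len(hosts) > 255:
        raise ValueError("this sketch only varies the last MAC octet")
    # hosts() yields usable addresses in order, skipping the network address
    addresses = ipaddress.ip_network(network).hosts()
    plan = []
    for index, (name, ip) in enumerate(zip(hosts, addresses), start=1):
        plan.append(
            {"hostname": name, "ip": str(ip), "mac": f"{mac_prefix}:{index:02x}"}
        )
    return plan


if __name__ == "__main__":
    # Assumed network; substitute the internal network of the actual cluster.
    for node in plan_nodes("node[01-08]", "10.141.0.0/16"):
        print(node["hostname"], node["ip"], node["mac"])
```

A table like this can be kept next to the VM definitions so that the MAC addresses assigned to the empty compute node VMs match the entries later registered on the head node.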


The virtual HPC cluster gave us a simple playground for testing software incompatibility issues, although not for evaluating actual HPC performance improvements. Overall, this work helped us deploy a new OS image to production, reducing bugs at later stages and enhancing the HPC user experience.


Dr Ahmed Arefin is a Computation Scientist working within the HPC Systems Team, Scientific Computing Platforms, CSIRO. He completed his PhD in Computer Science (Data-Parallel Computing & GPUs) from the University of Newcastle, Australia and worked as a Postdoctoral Researcher (Parallel Data Mining) at the Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine (CIBM), The University of Newcastle, Australia. His research interest focuses on the application of HPC in data mining, graphs and visualization.
