Securely Share Your Bare-Metal HPC and AI Cluster without Virtualisation: Modern Multi-tenancy Solutions

Dr Werner Scholz¹

¹XENON Systems, Springvale, Australia

Biography:

Dr. Werner Scholz is CTO and Head of R&D at XENON Systems, a consultancy, solutions and services provider for High Performance Computing, Deep Learning, Artificial Intelligence, data storage and data management solutions.

Werner leads a team of dedicated, experienced, and highly skilled solutions architects and engineers working on systems for some of the largest supercomputing, AI, cloud, and research institutes in Australia and the APAC region.

Werner graduated with Master's and PhD degrees in Physics from the Vienna University of Technology, Austria, where he developed a widely used MPI-parallel open-source simulation package for magnetic materials.

Before joining XENON Systems, Werner led a team of engineers at Seagate Technology in the US, where he developed Heat-Assisted Magnetic Recording technologies for next-generation hard disk drives and managed Seagate's HPC infrastructure.

Abstract:

Compute clusters provide efficient solutions for High Performance Computing (HPC) workloads, Artificial Intelligence (AI), and cloud services. The largest supercomputers, AI platforms, and cloud services in the world are all designed as clusters with thousands of compute nodes, fast networks, and centralised management, which allows relatively small teams of engineers to manage and maintain them. These clusters are designed as multi-user systems for hundreds of users and thousands of concurrent jobs. However, securely separating users, projects, and tenants, along with their workloads, and accommodating workloads of different security classifications remains a challenge. Typically, this is solved with virtualisation, which incurs significant overheads, performance penalties, and configuration challenges.

In this presentation, we will discuss alternative multi-tenancy solutions for bare-metal HPC and AI clusters that provide the security of hard network segregation while maintaining full bare-metal performance. In addition to the compute nodes and the Ethernet and InfiniBand network fabrics, appropriate storage systems also support multiple tenants. The result is a complete compute-network-storage cluster infrastructure that can be securely shared between different teams and organisations while maintaining strict separation of the tenants.

 

 
