Dr Gabriel Noaje¹
¹NVIDIA, Singapore, Singapore
The mission of the MLPerf consortium is to “build fair and useful benchmarks” that provide an unbiased training and inference performance reference for ML hardware, software, and services. MLPerf Training v0.7 is the third instantiation of the training benchmark, which continues to evolve to stay on the cutting edge.
This round consists of eight different workloads that cover a broad diversity of use cases, including vision, language, recommendation, and reinforcement learning.
In MLPerf Training v0.7, the new NVIDIA A100 Tensor Core GPU and the DGX SuperPOD-based Selene supercomputer set all 16 performance records across per-chip and max-scale workloads for commercially available systems. These breakthroughs resulted from the tight integration of hardware, software, and system-level technologies.
NVIDIA engineers have developed a host of innovations to achieve these levels of performance. This presentation details many of the optimizations used to deliver the outstanding scale and performance.
Many of these improvements have been made available on NGC, the hub for NVIDIA GPU-optimized software. The AI community can thus realize the benefits of these optimizations in real-world applications, not just in better benchmark scores.
Dr Gabriel Noaje is a Senior Solutions Architect at NVIDIA APAC South, specializing in HPC and DL. Gabriel has more than 12 years of experience in accelerator technologies and parallel computing. Prior to joining NVIDIA, Gabriel worked for large OEMs such as SGI and HPE, as well as for large HPC centers in Singapore and France.
Gabriel holds a PhD in Computer Sciences from the University of Reims Champagne-Ardenne, France, and a BSc and an MSc in Computer Sciences from the Polytechnic University of Bucharest, Romania.