Autoscaling Riva deployment with Kubernetes for conversational AI in production

Dr Maggie Xuemeng Zhang

NVIDIA, Melbourne, Australia

NVIDIA Riva is a GPU-accelerated conversational AI framework that provides automatic speech recognition (ASR), natural language understanding (NLU) and text-to-speech (TTS) capabilities for building expressive conversational AI agents. In this talk, we will share best practices for deploying Riva for conversational AI and for using Kubernetes to autoscale the number of Riva servers based on the inference requests coming from clients. This approach applies to production conversational AI both in the cloud, for example on AWS, and on-premises.
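For readers unfamiliar with request-based autoscaling, the sketch below shows the replica-count rule used by the Kubernetes Horizontal Pod Autoscaler (HPA): desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), with no change when the ratio falls within a tolerance band (0.1 by default) and the result clamped to the configured minimum and maximum. It is a minimal illustration under stated assumptions, not the talk's actual setup. It assumes the scaling metric is average inference requests per second per Riva pod, and the function name `desired_replicas`, its parameters and the bound values are all hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_rps_per_pod: float,
                     target_rps_per_pod: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Hypothetical helper: HPA-style replica count for Riva server pods.

    Assumption: the metric is the average inference requests per second
    per Riva pod; target_rps_per_pod must be positive.
    """
    # Ratio of observed load to the per-pod load we want each server to carry.
    ratio = current_rps_per_pod / target_rps_per_pod

    # Like the HPA, skip scaling when the ratio is close enough to 1.0,
    # which avoids flapping on small load fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Keep the result inside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods each seeing 30 req/s against a 10 req/s target -> scale out to 6.
    print(desired_replicas(2, 30.0, 10.0))
```

Scaling on a per-pod request rate rather than raw GPU utilization is one common choice for inference services, because it tracks client demand directly; the real target value would need to be chosen by load-testing the Riva models being served.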


Biography:

Maggie Xuemeng Zhang is a senior deep learning engineer at NVIDIA, working on deep learning frameworks and applications in computer vision and conversational AI. She received her PhD in computer science and engineering from the University of New South Wales in Australia, where she worked on GPU/CPU heterogeneous computing and compiler optimizations.
