Name: Inference in Progress… Please Monitor Responsibly - Gaurav Sharma, NVIDIA
Start: 2026-06-19T12:00:00+0530
End: 2026-06-19T12:30:00+0530

18-19 June

Friday June 19, 2026 12:00pm - 12:30pm IST

205 (Level 2)

Running GPU inference on Kubernetes is no longer exotic; it’s becoming the default for modern AI workloads. But while teams obsess over model latency and throughput, the real problems usually hide deeper: GPU under-utilization, memory fragmentation, node-level contention, noisy neighbours, and observability gaps that make debugging feel like guesswork.
In this talk, we’ll walk through a practical, field-tested monitoring approach for GPU inference workloads on Kubernetes. Attendees will learn how to instrument GPU nodes, collect and correlate GPU-specific metrics, build alerting around inference SLOs, and detect performance regressions before they disrupt production. We’ll also cover common anti-patterns and what “good” looks like for GPU observability in 2025.
If you're running (or planning to run) GPU inference at scale, this session will help you monitor responsibly — and keep your cluster healthy, efficient, and fast.
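
As a rough illustration of correlating GPU metrics with an inference SLO, here is a minimal Python sketch. It is not taken from the talk: the `Window` shape, the `evaluate` helper, and the thresholds (a 200 ms p95 latency target and a 30% utilization floor) are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class Window:
    """One evaluation window of samples for a single inference pod (hypothetical shape)."""
    pod: str
    latencies_ms: list[float]   # per-request latency samples
    gpu_util_pct: list[float]   # GPU utilization samples, 0-100


def p95(values: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20)[-1]


def evaluate(window: Window,
             slo_p95_ms: float = 200.0,      # assumed latency SLO
             min_util_pct: float = 30.0) -> list[str]:  # assumed utilization floor
    """Return a list of findings for one pod over one window."""
    findings = []
    if p95(window.latencies_ms) > slo_p95_ms:
        findings.append("latency_slo_breach")
    if mean(window.gpu_util_pct) < min_util_pct:
        findings.append("gpu_underutilized")
    return findings


if __name__ == "__main__":
    idle = Window("pod-a", [100.0] * 20, [10.0] * 20)
    slow = Window("pod-b", [500.0] * 20, [80.0] * 20)
    for w in (idle, slow):
        print(w.pod, evaluate(w))
```

Evaluating latency and utilization over the same window is what lets one finding explain the other: a pod that is both slow and under-utilized points to a different cause than one that is slow and saturated.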

Speakers

Gaurav Sharma

Engineering Manager, NVIDIA

Currently working as an Engineering Manager on the Reliability Engineering team for NVIDIA AI. In the past, I have been part of SRE teams for NVIDIA cloud gaming, Microsoft Azure Reliability, Adobe Analytics, and VMware Cloud Services.

Gaurav Sharma Inference Observability pdf

Observability

Content Experience Level: Beginner

KubeCon + CloudNativeCon India 2026
