The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Running GPU inference on Kubernetes is no longer exotic; it’s becoming the default for modern AI workloads. But while teams obsess over model latency and throughput, the real problems usually hide deeper: GPU under-utilization, memory fragmentation, node-level contention, noisy neighbours, and observability gaps that make debugging feel like guesswork. In this talk, we’ll walk through a practical, field-tested monitoring approach for GPU inference workloads on Kubernetes. Attendees will learn how to instrument GPU nodes, collect and correlate GPU-specific metrics, build alerting around inference SLOs, and detect performance regressions before they disrupt production. We’ll also cover common anti-patterns and what “good” looks like for GPU observability in 2025. If you're running (or planning to run) GPU inference at scale, this session will help you monitor responsibly and keep your cluster healthy, efficient, and fast.
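The abstract does not name specific tooling, so the following is only an illustrative sketch of what spotting GPU under-utilization from collected metrics might look like. It assumes metrics arrive in the Prometheus text exposition format and uses the `DCGM_FI_DEV_GPU_UTIL` gauge exposed by NVIDIA's dcgm-exporter. The sample payload, the label set, the helper names `parse_metric` and `underutilized`, and the 20% threshold are all hypothetical choices for illustration, not the speaker's method.

```python
import re

# Hypothetical scrape output; real dcgm-exporter payloads carry more labels.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a",pod="llm-0"} 12
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a",pod="llm-1"} 91
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b",pod="llm-2"} 8
"""

# Matches lines of the form: name{label="value",...} number
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metric(text, metric):
    """Return (labels, value) pairs for one metric name, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match and match.group("name") == metric:
            labels = dict(LABEL.findall(match.group("labels")))
            samples.append((labels, float(match.group("value"))))
    return samples


def underutilized(samples, threshold=20.0):
    """List (hostname, gpu) pairs whose utilization is below the threshold.

    The 20% default is an arbitrary illustrative value, not a recommendation.
    """
    return sorted(
        (labels.get("Hostname", "?"), labels.get("gpu", "?"))
        for labels, value in samples
        if value < threshold
    )


if __name__ == "__main__":
    samples = parse_metric(SAMPLE, "DCGM_FI_DEV_GPU_UTIL")
    for host, gpu in underutilized(samples):
        print(f"{host} gpu {gpu} is below the utilization threshold")
```

In practice a check like this would more likely live in an alerting rule evaluated over a time window rather than a single scrape; the point here is only to show how per-GPU labels let a low reading be traced back to a node and pod.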
I currently work as a Senior SRE for Nvidia AI. In the past, I have been part of SRE teams for Nvidia cloud gaming, Microsoft Azure Reliability, Adobe Analytics, and VMware Cloud Services.