Name: Keynote: Plug in and Scale: Serving LLM Models on Kubernetes Made Simple - Shrinidhi Venkataraman, AI Platform Engineer, AstraZeneca & Nithin R, AI Platform Engineer, AstraZeneca
Start: 2026-06-18T10:36:00+0530
End: 2026-06-18T10:46:00+0530

18-19 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in India Standard Time (UTC+5:30). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

Keynote: Plug in and Scale: Serving LLM Models on Kubernetes Made Simple - Shrinidhi Venkataraman, AI Platform Engineer, AstraZeneca & Nithin R, AI Platform Engineer, AstraZeneca

Thursday June 18, 2026 10:36am - 10:46am IST

Jasmine 2 (Level 3)

Our session introduces a cloud-native, in-house platform for modular LLM inference at enterprise scale. Built on Kubernetes, the architecture unifies open-source and vendor models via OpenAI-compatible APIs and supports distributed serving with popular inference runtimes like vLLM, SGLang, and Triton.

Powered by NVIDIA Dynamo, the system optimizes GPU fleets through intelligent scheduling, KV-aware routing, prefix caching, and NIXL-based GPU-to-GPU data transfer, further optimized by allocating fractions of a GPU using KAI-Scheduler. The platform delivers streaming, speculative decoding, quantization, and autoscaling to zero via KEDA.

We ensure comprehensive observability with Prometheus, Grafana, and ELK, all governed by GitOps principles using ArgoCD and secured with enterprise-grade practices. For end-user consumption, the platform integrates with Open WebUI via standard APIs. We’ll cover the architecture, key components, and cloudability-driven cost governance strategies that empower data science teams while accelerating safe, sustainable AI innovation across AstraZeneca.
Our session introduces a cloud-native, in-house platform for modular LLM inference at enterprise scale. Built on Kubernetes, the architecture unifies open-source and vendor models via OpenAI-compatible APIs and supports distributed serving with popular inference runtimes like vLLM, SGLang, and Triton.

Speakers

Shrinidhi

AI Platform Engineer, AstraZeneca India

AI Platform Engineer blending Kubernetes savvy with MLOps rigor. I design and run scalable, cost-aware GPU platforms for training and LLM inference—GitOps-driven, observable, and secure. Passionate about autoscaling-to-zero, fractional GPUs, and making model serving fast, reliable... Read More →

Nithin R

AI Platform Engineer, AstraZeneca

ML Platform Engineer | AstraZeneca

Building and scaling enterprise-grade machine learning platforms from the ground up is my passion. At AstraZeneca, I'm at the forefront of developing a robust, ML platform and LLM Inference Engine using a powerful suite of open-source technologies. This platform empowers our data scientists... Read More →

Serving and scalingV3 pdf

Thursday June 18, 2026 10:36am - 10:46am IST
Jasmine 2 (Level 3)

Keynote Sessions, AI + ML

Content Experience Level Any

KubeCon + CloudNativeCon India 2026

Shrinidhi

Nithin R

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Get help with the event