18-19 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in India Standard Time (UTC+5:30). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change, and session seating is available on a first-come, first-served basis.
Thursday June 18, 2026 10:29am - 10:39am IST
Our session introduces a cloud-native, in-house platform for modular LLM inference at enterprise scale. Built on Kubernetes, the architecture unifies open-source and vendor models via OpenAI-compatible APIs and supports distributed serving with popular inference runtimes like vLLM, SGLang, and Triton.

Powered by NVIDIA Dynamo, the system optimizes GPU fleets through intelligent scheduling, KV-aware routing, prefix caching, and NIXL-based GPU-to-GPU data transfer. GPU utilization is further improved by allocating fractions of a GPU using KAI-Scheduler. The platform supports streaming, speculative decoding, and quantization, and autoscales to zero via KEDA.

We ensure comprehensive observability with Prometheus, Grafana, and ELK, all governed by GitOps principles using Argo CD and secured with enterprise-grade practices. For end-user consumption, the platform integrates with Open WebUI via standard APIs. We'll cover the architecture, key components, and Cloudability-driven cost governance strategies that empower data science teams while accelerating safe, sustainable AI innovation across AstraZeneca.

Speakers
Shrinidhi

AI Platform Engineer, AstraZeneca India
AI Platform Engineer blending Kubernetes savvy with MLOps rigor. I design and run scalable, cost-aware GPU platforms for training and LLM inference: GitOps-driven, observable, and secure. Passionate about autoscaling-to-zero, fractional GPUs, and making model serving fast, reliable...
Nithin R

AI Platform Engineer, AstraZeneca
ML Platform Engineer | AstraZeneca

Building and scaling enterprise-grade machine learning platforms from the ground up is my passion. At AstraZeneca, I'm at the forefront of developing a robust ML platform and LLM Inference Engine using a powerful suite of open-source technologies. This platform empowers our data scientists...
Jasmine 2 (Level 3)
  Keynote Sessions, AI + ML
  • Content Experience Level: Any
