KubeCon + CloudNativeCon India 2026: Full Schedule

18-19 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are shown in India Standard Time (UTC+5:30). The schedule is subject to change and session seating is available on a first-come, first-served basis.

12:40pm IST

Beyond vLLM: Distributed LLM Inferencing With llm-d on Kubernetes - Ravindra Patil, Red Hat

Thursday June 18, 2026 12:40pm - 1:10pm IST

Jasmine 2 (Level 3)

As large language models (LLMs) continue to grow in size and demand, single-node inferencing quickly becomes a bottleneck for performance, scalability, and cost. While vLLM has become popular for efficient LLM serving on a single node, it does not fully address the challenges of distributed inferencing across multiple GPUs and nodes in Kubernetes environments.

This talk introduces llm-d, an emerging cloud-native project designed to enable distributed LLM inferencing on Kubernetes. We will cover why vLLM gained popularity and its limitations when scaling beyond a single node. We will explore how llm-d goes a step further by enabling multi-node, multi-GPU inferencing with cloud-native primitives.

Attendees will learn how llm-d fits into modern Kubernetes platforms and how it improves scalability and resource utilization. The session focuses on practical architecture, design trade-offs, and real-world use cases rather than theory, and includes a demo of how llm-d distributes load.

Speakers

Ravindra Patil

Principal Technical Support Engineer, Red Hat

I am an AI evangelist working on the AI team at Red Hat. I really like to learn and explore how the world can benefit from this AI revolution. I am also very keen on the evaluation of AI models, to make sure that LLMs are bias-free and responsible.

AI + ML

Content Experience Level Intermediate

5:30pm IST

Beyond Monolithic AI: Cloud Native Patterns for Dynamic Model Selection and Semantic Routing - Vincent Caldeira & Anindita Sinha Banerjee, Red Hat

Thursday June 18, 2026 5:30pm - 6:00pm IST

Lotus 3 (Level 3)

The era of the "one-size-fits-all" LLM is ending. We are shifting toward Compound AI Systems—complex meshes where the goal isn't just to query a model, but to dynamically select the best model for the specific task at hand. This shift creates a massive opportunity for cloud-native architectures: how do you govern non-deterministic routing at scale?

This session breaks down the infrastructure required to move from monolithic agents to multi-model orchestration. We will demonstrate how to implement Semantic Routing within an AI Gateway to act as a traffic controller, instantly analyzing user intent to route queries to the most capable (or cost-effective) model. You will learn patterns for "supervisor" workflows, where lightweight models handle routing and heavyweight models handle self-correction. Join us to discover how to build controlled AI systems on Kubernetes, ensuring your agents are not just powerful, but precise, effectively governed, and fundamentally safer.

Speakers

Vincent Caldeira

CTO APAC, Red Hat

Vincent Caldeira, Red Hat APAC CTO and Industry Visiting Scholar at Columbia University, drives tech strategy and emerging engineering. A Top 10 APAC CTO (2023) with 20+ years in finance IT, he is an authority on open source, cloud-native technologies and AI. Vincent holds leadership...

Anindita Sinha Banerjee

Data Scientist, Red Hat

With over a decade in Data and Decision Sciences, I design NLP and AI solutions that solve complex business challenges. Currently a Data Scientist at Red Hat and former researcher at Tata Research Development and Design Center, I have presented research at premier conferences and...

KubeCon India 2026 Beyond Monolithic AI Cloud Native Patterns for Dynamic Model Selection and Semantic Routing pdf

AI + ML

Content Experience Level Intermediate

12:00pm IST

When LLMs Hit Production: Why You Need an AI Gateway - Gavrish Prabhu, Nutanix

Friday June 19, 2026 12:00pm - 12:30pm IST

Lotus 1 (Level 3)

The surge of Generative AI and large language models (LLMs) is introducing new operational challenges for Kubernetes platforms. Unlike traditional APIs, GenAI traffic is highly variable, token-driven, and cost-sensitive, requiring new approaches to routing, security, and observability.

This session, presented by the project maintainers, examines Envoy AI Gateway, a CNCF-hosted open source project, and its role as a Kubernetes-native control plane for GenAI workloads. We’ll break down an end-to-end architecture that uses Envoy AI Gateway to manage and govern traffic across multiple LLM backends, both self-hosted and cloud-based, while enforcing policies such as token-aware rate limiting, authentication, and dynamic model selection.

Attendees will leave with practical insights into designing resilient, scalable GenAI platforms on Kubernetes, and an understanding of how AI-aware gateways fit into modern cloud-native infrastructure.

Speakers

Gavrish Prabhu

Technical Lead, Nutanix

Gavrish Prabhu is a Founding ML Engineer on the Nutanix Enterprise AI team with a background in distributed systems. He is active in open-source projects and is a maintainer of the KServe and Envoy AI Gateway projects. His key interests are systems involving the next generation of AI...

AI + ML

Content Experience Level Beginner

12:40pm IST

GPU Hunter: Architecting Global GPU Availability With MultiKueue - Kishore Jagannath & Ram J A, Google

Friday June 19, 2026 12:40pm - 1:10pm IST

Jasmine 2 (Level 3)

This talk tackles the global GPU shortage and a critical cloud-native reality in batch workload scheduling: "Kueue Quota reserved" does not equal "Capacity available". For production inference, this distinction determines whether workloads succeed or remain stuck indefinitely.

The session explores transcending regional silos using MultiKueue to manage a global federation of worker clusters. Moving beyond basic setup, the speakers address the "Quota vs. Datacenter Capacity" dilemma. They reveal a critical gap discovered through experiments: workloads becoming stranded in regions with exhausted capacity despite available quotas.

The speakers share their collaboration with the community to resolve these scheduling gaps (Issue #8089). This talk demonstrates how to configure Admission Checks and provisioning classes in MultiKueue to force the scheduler to "hunt" for actual capacity in worker clusters across regions rather than relying solely on user-provided quota.

Speakers

Kishore Jagannath

Cloud Engineer, Google

Kishore Jagannath serves as a Cloud Solutions Engineer at Google, where he focuses on cloud infrastructure, large-scale Kubernetes orchestration and AI Platform Infrastructure. Over the past year, he has been architecting global compute planes to address GPU scarcity for production-critical...

Ram J A

Solutions Architect, Google

Ram J A is an engineer who enjoys learning new things and using technology to solve problems. Their work primarily focuses on Cloud computing and Kubernetes, with a recent interest in AI Infrastructure and building LLM-based agents. Outside of work, Ram spends time catching up on...

KubeCon India 2026 pdf

KubeCon India 2026 pptx

AI + ML

Content Experience Level Intermediate

12:40pm IST

When Kubeflow Fights Cilium: Debugging 60% Idle GPUs in Kubernetes - Ramkumar Nagaraj & Bingi Narasimha Karthik, Adobe

Friday June 19, 2026 12:40pm - 1:10pm IST

Lotus 1 (Level 3)

We built a research testbed to validate ML workload scalability on Kubernetes with Kubeflow and Cilium. During 500-node stress tests, GPUs sat idle 60% of the time while pods waited for available resources.

Through controlled experiments, we isolated the cause: Kubeflow's pipeline scheduler and Cilium's network-aware pod placement make conflicting decisions. Kubeflow schedules pods without considering network topology. Cilium optimizes networking but can't move scheduled pods. Result? GPUs unused while the scheduler searches for placement that won't happen.

This talk shares our systematic investigation, diagnostic methodology, and scheduling constraints that resolved it. Lab tests show GPU utilization improved from 40% to 85%. You'll see the problem reproduced live, understand why it's hard to detect, and get tested Kubernetes configs. This matters for anyone planning distributed training with Kubeflow or similar orchestrators on network-optimized clusters.

Speakers

Bingi Narasimha Karthik

Senior Cloud Engineer, Adobe

Bingi, a Senior Cloud Engineer at Adobe, is certified in CKA, CKAD, KCNA, PCA, AWS Certified Solutions Architect - Associate, CSPO, and Machine Learning. He excels in simplifying Kubernetes metrics, transforming data into actionable insights. His innovative namespace metric delivery...

Ramkumar Nagaraj

Sr Computer Scientist, Adobe

I currently work at Adobe Systems Pvt Ltd as a Senior Computer Scientist. I hold the Golden Kubestronaut badge.

kubecon india 2026 kubeflow cilium pptx

AI + ML

Content Experience Level Intermediate

2:30pm IST

Run Your Own AI Cluster on a DGX Spark: Kubernetes, GPUs, and DRA - Janakiram MSV, Janakiram & Associates & Shreyas Mocherla, Nirmata

Friday June 19, 2026 2:30pm - 3:00pm IST

Jasmine 2 (Level 3)

AI engineers often choose between laptops that cannot keep up and expensive shared cloud clusters, which slows iteration. This session shows how to turn a single NVIDIA DGX Spark into a personal AI cluster using Kubernetes, GPU tooling, and Dynamic Resource Allocation.

Starting from a minimal single-node cluster, we add GPU enablement, an AI app stack, and DRA-based GPU sharing. You will learn how the NVIDIA GPU Operator and DRA drivers expose GPU capabilities, how ResourceClasses and ResourceClaims work, and how to enable multiple workloads to share a single device with predictable behavior. A live demo runs an interactive chat or multimodal app beside a background job on the same GPU, using different DRA policies and observing enforcement at runtime.

You leave with a clear mental model, reusable YAML, and patterns that remain portable from a desk-side Spark to multi-node cloud clusters, plus basics for utilization visibility, capacity planning, and multi-tenancy.

Speakers

Janakiram MSV

Principal Analyst, Janakiram & Associates

https://janakiram.com/profile

Shreyas Mocherla

Software Engineer, Nirmata

Shreyas Mocherla is a Software Engineer at Nirmata working on AI Platform Engineering. He built the Kyverno MCP Server, one of the early Model Context Protocol implementations for Kubernetes, and is a CNCF Kubestronaut, among the youngest globally to earn the distinction. Shrey specializes...

AI + ML

Content Experience Level Intermediate

4:10pm IST

So You Want To Run AI Agents on Kubernetes: A 101 Guide - Rajas Kakodkar, Broadcom

Friday June 19, 2026 4:10pm - 4:40pm IST

Jasmine 2 (Level 3)

Your platform team is trying to integrate AI workloads, your developers want to deploy agents, and your leadership expects 10x efficiency with AI. But when you dig into the specifics, the terminology becomes a maze: AI Agents, MCP Servers, and Dynamic Resource Allocation (DRA). Where do you even start?

This session centers on the problem each of these solves. In this hands-on 101 guide, I will start by demystifying how AI agents operate on Kubernetes: how they interact with workloads, the role of MCP servers in enabling multi-agent coordination, and the fundamentals of DRA for intelligent resource management. Then, I will bring it all together with a live demo showing how AI agents can dynamically tune DRA drivers to optimize scheduling and resource usage in real time. Whether you’re an engineer, researcher, or just Kubernetes-curious, this session will equip you with the foundational knowledge and confidence to start experimenting with agentic and adaptive systems on Kubernetes.

Speakers

Rajas Kakodkar

Software Engineer, Broadcom

Rajas is a staff software engineer at Broadcom, where he focuses on low level functions of Kubernetes nodes. He is a tech lead of the CNCF Technical Advisory Group, Workload Foundation and a Kubernetes contributor. He has been co-chairing Cloud Native AI Day, a co-located event at...

AI + ML

Content Experience Level Beginner

4:50pm IST

LLMs Behind Bars: Sandboxes at Scale for AI on a Short Leash - Prashanth Pai, CodeRabbit

Friday June 19, 2026 4:50pm - 5:20pm IST

205 (Level 2)

LLMs can write code - and sometimes running that code is the most direct way to deliver product value. The moment you do, you’ve effectively introduced a remote-code-execution surface: the code is untrusted by default, but the system still has to execute it to stay useful.

In this talk, we’ll share what it took to build and operate production sandboxes for LLM-generated code at scale. We’ll cover the isolation model (containers, least-privilege defaults, syscall/filesystem restrictions), the operational reality (startup latency, resource limits, cold starts, observability), and the guardrails that matter when code or users try to misbehave. We’ll also dig into data protection: locking down egress, blocking exfiltration paths, and keeping secrets out of reach.

We’ll cover what worked, what failed, and what we’d do differently - ending with a practical, vendor-agnostic mental model and checklist you can apply.

Speakers

Prashanth Pai

Principal Engineer, CodeRabbit

Prashanth Pai is a Principal Engineer at CodeRabbit, where he builds the infrastructure that powers safe, reliable execution for AI products in production.

He started his career at Red Hat and has been passionate about open source ever since.

LLMs Behind Bars Sandboxes at Scale pdf

AI + ML

Content Experience Level Beginner

4:50pm IST

The Hidden Cost of ML Data Lifecycles in Kubernetes - Yashasvi Misra, Pure Storage

Friday June 19, 2026 4:50pm - 5:20pm IST

Jasmine 2 (Level 3)

As ML workloads move onto Kubernetes, many teams unintentionally turn their clusters into data platforms, storing training data, features, and intermediate artifacts alongside compute. While convenient at first, this approach introduces hidden costs that surface over time.

This talk shares real-world lessons from operating ML pipelines on Kubernetes where data management, not models, became the primary source of failures. We’ll explore common anti-patterns involving PVCs, object storage mounts, and ephemeral volumes, and how they led to rising costs, broken reproducibility, and pipelines training on stale or incorrect data. Finally, we’ll discuss practical cloud-native patterns for managing ML data that respect data lifecycles, improve lineage, and keep Kubernetes focused on what it does best.

Speakers

Yashasvi Misra

Software Engineer, Everpure

Yashasvi Misra is a Software Engineer at Pure Storage and Chair of the NumFOCUS Code of Conduct Working Group. She has contributed to foundational projects like NumPy & Kubernetes and has been an active part of the Python community since her college days. Yashasvi is also a passionate...

KubeCon India ML Data Lifecycles pdf

AI + ML

Content Experience Level Intermediate
