18-19 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in India Standard Time (UTC+5:30). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
Type: AI + ML
Thursday, June 18
 

12:40pm IST

Beyond vLLM: Distributed LLM Inferencing With llm-d on Kubernetes - Ravindra Patil, Red Hat
Thursday June 18, 2026 12:40pm - 1:10pm IST
As large language models (LLMs) continue to grow in size and demand, single-node inferencing quickly becomes a bottleneck for performance, scalability, and cost. While vLLM has become popular for efficient LLM serving on a single node, it does not fully address the challenges of distributed inferencing across multiple GPUs and nodes in Kubernetes environments.

This talk introduces llm-d, an emerging cloud-native project designed to enable distributed LLM inferencing on Kubernetes. We will cover why vLLM gained popularity and its limitations when scaling beyond a single node. We will then explore how llm-d goes a step further by enabling multi-node, multi-GPU inferencing with cloud-native primitives.

Attendees will learn how llm-d fits into modern Kubernetes platforms and how it improves scalability and resource utilization. The session focuses on practical architecture, design trade-offs, and real-world use cases rather than theory, and includes a demo of how llm-d distributes load.
Speakers

Ravindra Patil

Principal Technical Support Engineer, Red Hat
I am an AI evangelist working on the AI team at Red Hat. I really like to learn and explore how the world can benefit from this AI revolution. I am also very keen on the evaluation aspect of AI models, to make sure that LLMs are free of bias and support responsible AI.
Jasmine 2 (Level 3)
  AI + ML

5:30pm IST

Beyond Monolithic AI: Cloud Native Patterns for Dynamic Model Selection and Semantic Routing - Vincent Caldeira & Anindita Sinha Banerjee, Red Hat
Thursday June 18, 2026 5:30pm - 6:00pm IST
The era of the "one-size-fits-all" LLM is ending. We are shifting toward Compound AI Systems—complex meshes where the goal isn't just to query a model, but to dynamically select the best model for the specific task at hand. This shift creates a massive opportunity for cloud-native architectures: how do you govern non-deterministic routing at scale?

This session breaks down the infrastructure required to move from monolithic agents to multi-model orchestration. We will demonstrate how to implement Semantic Routing within an AI Gateway to act as a traffic controller, instantly analyzing user intent to route queries to the most capable (or cost-effective) model. You will learn patterns for "supervisor" workflows, where lightweight models handle routing and heavyweight models handle self-correction. Join us to discover how to build controlled AI systems on Kubernetes, ensuring your agents are not just powerful, but precise, effectively governed, and fundamentally safer.
Speakers

Vincent Caldeira

CTO APAC, Red Hat
Vincent Caldeira, Red Hat APAC CTO and Industry Visiting Scholar at Columbia University, drives tech strategy and emerging engineering. A Top 10 APAC CTO (2023) with 20+ years in finance IT, he is an authority on open source, cloud-native technologies and AI. Vincent holds leadership...

Anindita Sinha Banerjee

Data Scientist, Red Hat
With over a decade in Data and Decision Sciences, I design NLP and AI solutions that solve complex business challenges. Currently a Data Scientist at Red Hat and former researcher at Tata Research Development and Design Center, I have presented research at premier conferences and...
Lotus 3 (Level 3)
  AI + ML
 
Friday, June 19
 

12:00pm IST

When LLMs Hit Production: Why You Need an AI Gateway - Gavrish Prabhu, Nutanix
Friday June 19, 2026 12:00pm - 12:30pm IST
The surge of Generative AI and large language models (LLMs) is introducing new operational challenges for Kubernetes platforms. Unlike traditional APIs, GenAI traffic is highly variable, token-driven, and cost-sensitive, requiring new approaches to routing, security, and observability.

This session, presented by the project maintainers, examines Envoy AI Gateway, a CNCF-hosted open source project, and its role as a Kubernetes-native control plane for GenAI workloads. We’ll break down an end-to-end architecture that uses Envoy AI Gateway to manage and govern traffic across multiple LLM backends, both self-hosted and cloud-based, while enforcing policies such as token-aware rate limiting, authentication, and dynamic model selection.

Attendees will leave with practical insights into designing resilient, scalable GenAI platforms on Kubernetes, and an understanding of how AI-aware gateways fit into modern cloud-native infrastructure.
Speakers

Gavrish Prabhu

Technical Lead, Nutanix
Gavrish Prabhu is a Founding ML Engineer on the Nutanix Enterprise AI team with a background in distributed systems. He is active in open-source projects and is a maintainer of the KServe and Envoy AI Gateway projects. His key interests are systems involving the next generation of AI...
Lotus 1 (Level 3)
  AI + ML

12:40pm IST

GPU Hunter: Architecting Global GPU Availability With MultiKueue - Kishore Jagannath & Ram J A, Google
Friday June 19, 2026 12:40pm - 1:10pm IST
This talk tackles the global GPU shortage and a critical cloud-native reality in batch workload scheduling: "Kueue Quota reserved" does not equal "Capacity available". For production inference, this distinction determines whether workloads succeed or remain stuck indefinitely.

The session explores transcending regional silos using MultiKueue to manage a global federation of worker clusters. Moving beyond basic setup, the speakers address the "Quota vs. Datacenter Capacity" dilemma. They reveal a critical gap discovered through experiments: workloads becoming stranded in regions with exhausted capacity despite available quotas.

The speakers share their collaboration with the community to resolve these scheduling gaps (Issue #8089). This talk demonstrates how to configure Admission Checks and provisioning classes in MultiKueue to force the scheduler to "hunt" for actual capacity in worker clusters across regions rather than relying solely on user-provided quota.
Speakers

Kishore Jagannath

Cloud Engineer, Google
Kishore Jagannath serves as a Cloud Solutions Engineer at Google, where he focuses on cloud infrastructure, large-scale Kubernetes orchestration and AI Platform Infrastructure. Over the past year, he has been architecting global compute planes to address GPU scarcity for production-critical...

Ram J A

Solutions Architect, Google
Ram J A is an engineer who enjoys learning new things and using technology to solve problems. Their work primarily focuses on cloud computing and Kubernetes, with a recent interest in AI infrastructure and building LLM-based agents. Outside of work, Ram spends time catching up on...
Jasmine 2 (Level 3)
  AI + ML

12:40pm IST

When Kubeflow Fights Cilium: Debugging 60% Idle GPUs in Kubernetes - Ramkumar Nagaraj & Bingi Narasimha Karthik, Adobe
Friday June 19, 2026 12:40pm - 1:10pm IST
We built a research testbed to validate ML workload scalability on Kubernetes with Kubeflow and Cilium. During 500-node stress tests, GPUs sat idle 60% of the time while pods waited for available resources.

Through controlled experiments, we isolated the cause: Kubeflow's pipeline scheduler and Cilium's network-aware pod placement make conflicting decisions. Kubeflow schedules pods without considering network topology. Cilium optimizes networking but can't move scheduled pods. Result? GPUs unused while the scheduler searches for placement that won't happen.

This talk shares our systematic investigation, diagnostic methodology, and scheduling constraints that resolved it. Lab tests show GPU utilization improved from 40% to 85%. You'll see the problem reproduced live, understand why it's hard to detect, and get tested Kubernetes configs. This matters for anyone planning distributed training with Kubeflow or similar orchestrators on network-optimized clusters.
Speakers

Bingi Narasimha Karthik

Senior Cloud Engineer, Adobe
Bingi, a Senior Cloud Engineer at Adobe, is certified in CKA, CKAD, KCNA, PCA, AWS Certified Solutions Architect - Associate, CSPO, and Machine Learning. He excels in simplifying Kubernetes metrics, transforming data into actionable insights. His innovative namespace metric delivery...

Ramkumar Nagaraj

Sr Computer Scientist, Adobe
I currently work at Adobe Systems Pvt Ltd as a Senior Computer Scientist. I hold the Golden Kubestronaut badge.
Lotus 1 (Level 3)
  AI + ML

2:30pm IST

Run Your Own AI Cluster on a DGX Spark: Kubernetes, GPUs, and DRA - Janakiram MSV, Janakiram & Associates & Shreyas Mocherla, Nirmata
Friday June 19, 2026 2:30pm - 3:00pm IST
AI engineers often choose between laptops that cannot keep up and expensive shared cloud clusters, which slows iteration. This session shows how to turn a single NVIDIA DGX Spark into a personal AI cluster using Kubernetes, GPU tooling, and Dynamic Resource Allocation (DRA).

Starting from a minimal single-node cluster, we add GPU enablement, an AI app stack, and DRA-based GPU sharing. You will learn how the NVIDIA GPU Operator and DRA drivers expose GPU capabilities, how ResourceClasses and ResourceClaims work, and how to enable multiple workloads to share a single device with predictable behavior. A live demo runs an interactive chat or multimodal app beside a background job on the same GPU, using different DRA policies and observing enforcement at runtime.

You will leave with a clear mental model, reusable YAML, and patterns that remain portable from a desk-side Spark to multi-node cloud clusters, plus the basics of utilization visibility, capacity planning, and multi-tenancy.
Speakers

Janakiram MSV

Principal Analyst, Janakiram & Associates
https://janakiram.com/profile

Shreyas Mocherla

Software Engineer, Nirmata
Shreyas Mocherla is a Software Engineer at Nirmata working on AI Platform Engineering. He built the Kyverno MCP Server, one of the early Model Context Protocol implementations for Kubernetes, and is a CNCF Kubestronaut, among the youngest globally to earn the distinction. Shrey specializes...
Jasmine 2 (Level 3)
  AI + ML

4:10pm IST

So You Want To Run AI Agents on Kubernetes: A 101 Guide - Rajas Kakodkar, Broadcom
Friday June 19, 2026 4:10pm - 4:40pm IST
Your platform team is trying to integrate AI workloads, your developers want to deploy agents, and your leadership expects 10x efficiency with AI. But when you dig into the specifics, the terminology becomes a maze: AI Agents, MCP Servers, and Dynamic Resource Allocation (DRA). Where do you even start?

This session is centered on the problem each of these solves. In this hands-on 101 guide, I will start by demystifying how AI agents operate on Kubernetes: how they interact with workloads, the role of MCP servers in enabling multi-agent coordination, and the fundamentals of DRA for intelligent resource management. Then, I will bring it all together with a live demo showing how AI agents can dynamically tune DRA drivers to optimize scheduling and resource usage in real time. Whether you’re an engineer, researcher, or just Kubernetes-curious, this session will equip you with the foundational knowledge and confidence to start experimenting with agentic and adaptive systems on Kubernetes.
Speakers

Rajas Kakodkar

Software Engineer, Broadcom
Rajas is a staff software engineer at Broadcom, where he focuses on low-level functions of Kubernetes nodes. He is a tech lead of the CNCF Technical Advisory Group, Workload Foundation and a Kubernetes contributor. He has been co-chairing Cloud Native AI Day, a co-located event at...
Jasmine 2 (Level 3)
  AI + ML

4:50pm IST

LLMs Behind Bars: Sandboxes at Scale for AI on a Short Leash - Prashanth Pai, CodeRabbit
Friday June 19, 2026 4:50pm - 5:20pm IST
LLMs can write code - and sometimes running that code is the most direct way to deliver product value. The moment you do, you’ve effectively introduced a remote-code-execution surface: the code is untrusted by default, but the system still has to execute it to stay useful.

In this talk, we’ll share what it took to build and operate production sandboxes for LLM-generated code at scale. We’ll cover the isolation model (containers, least-privilege defaults, syscall/filesystem restrictions), the operational reality (startup latency, resource limits, cold starts, observability), and the guardrails that matter when code or users try to misbehave. We’ll also dig into data protection: locking down egress, blocking exfiltration paths, and keeping secrets out of reach.

We’ll cover what worked, what failed, and what we’d do differently - ending with a practical, vendor-agnostic mental model and checklist you can apply.
Speakers

Prashanth Pai

Principal Engineer, CodeRabbit
Prashanth Pai is a Principal Engineer at CodeRabbit, where he builds the infrastructure that powers safe, reliable execution for AI products in production.

He started his career at Red Hat and has been passionate about open source ever since.
205 (Level 2)
  AI + ML

4:50pm IST

The Hidden Cost of ML Data Lifecycles in Kubernetes - Yashasvi Misra, Pure Storage
Friday June 19, 2026 4:50pm - 5:20pm IST
As ML workloads move onto Kubernetes, many teams unintentionally turn their clusters into data platforms, storing training data, features, and intermediate artifacts alongside compute. While convenient at first, this approach introduces hidden costs that surface over time.

This talk shares real-world lessons from operating ML pipelines on Kubernetes where data management, not models, became the primary source of failures. We’ll explore common anti-patterns involving PVCs, object storage mounts, and ephemeral volumes, and how they led to rising costs, broken reproducibility, and pipelines training on stale or incorrect data. Finally, we’ll discuss practical cloud-native patterns for managing ML data that respect data lifecycles, improve lineage, and keep Kubernetes focused on what it does best.
Speakers

Yashasvi Misra

Software Engineer, Pure Storage
Yashasvi Misra is a Software Engineer at Pure Storage and Chair of the NumFOCUS Code of Conduct Working Group. She has contributed to foundational projects like NumPy & Kubernetes and has been an active part of the Python community since her college days.

Yashasvi is also a passionate advocate for diversity and inclusion in tech. She has shared her work and insights at conferences around the world, including PyCon India, PyCon Europe, PyLadiesCon, and PyData Global...
Jasmine 2 (Level 3)
  AI + ML
 