Open Source Summit + Embedded Linux Conference North America...
May 18-20, 2026
Minneapolis, MN
Note: The schedule, session times, and room locations are subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Central Daylight Time (UTC-5). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."



Venue: 211A+B (Level Two)
Wednesday, May 20
 

11:00am CDT

KV-Cache Centric Inference: Building an Open Source LLM Serving Platform Around State - Martin Hickey, IBM Research
Wednesday May 20, 2026 11:00am - 11:40am CDT
We optimize LLM inference around compute—faster kernels, better batching, smarter parallelism. But in production, the real bottleneck is state. The KV‑cache holds precomputed attention data that turns a multi‑second prefill into a sub‑second cache hit. Lose it to eviction, isolate it on one node, or route away from it, and you pay the full compute cost again for work you already did.
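The cross-request reuse described here typically depends on giving KV blocks content-derived identities: two requests that share a token prefix must produce the same block IDs, on any replica. A minimal sketch of that idea, with hypothetical names and an illustrative block size (vLLM's actual prefix-caching scheme differs in detail):

```python
from hashlib import sha256

BLOCK_SIZE = 16  # tokens per KV block (illustrative, not vLLM's value)

def block_ids(token_ids: list[int]) -> list[str]:
    """Chain-hash full token blocks: each block's ID covers the block's
    tokens plus every block before it, so equal prefixes yield equal
    ID sequences across requests and replicas."""
    ids, prev = [], ""
    full = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for i in range(0, full, BLOCK_SIZE):
        block = token_ids[i:i + BLOCK_SIZE]
        prev = sha256((prev + ",".join(map(str, block))).encode()).hexdigest()
        ids.append(prev)
    return ids

# Two prompts sharing a 16-token prefix share the first block ID only.
a = block_ids(list(range(32)))
b = block_ids(list(range(16)) + list(range(100, 116)))
assert a[0] == b[0] and a[1] != b[1]
```

Because each ID is chained to its predecessor, a block is only reusable when its entire prefix is too, which is exactly the property cache lookup needs.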

llm-d is an open-source distributed inference platform, co-founded by Google, IBM Research, Red Hat, NVIDIA, and CoreWeave, that treats the KV‑cache as the core of the system rather than a byproduct. That enables tiered memory management—offloading KV blocks from GPU to CPU to shared storage—cross‑replica reuse so cached state computed anywhere is usable everywhere, and cache‑aware scheduling that routes requests to the replica most likely to hold their prefix.
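Cache-aware scheduling of this kind can be sketched as scoring each replica by how much of the request's block prefix it already holds. This is a simplified illustration, not llm-d's actual scorer; a production scheduler also weighs load, queue depth, and cache-index staleness:

```python
def prefix_match_len(request_blocks: list[str], cached: set[str]) -> int:
    """Count leading request blocks resident on a replica; reuse stops
    at the first miss because later KV state depends on earlier KV state."""
    n = 0
    for b in request_blocks:
        if b not in cached:
            break
        n += 1
    return n

def pick_replica(request_blocks: list[str], replicas: dict[str, set[str]]) -> str:
    """Route to the replica with the longest cached prefix (ties: first wins)."""
    return max(replicas, key=lambda r: prefix_match_len(request_blocks, replicas[r]))

replicas = {
    "replica-a": {"b0", "b1"},        # holds the first two blocks
    "replica-b": {"b0", "b1", "b2"},  # holds all three
    "replica-c": {"b2"},              # b2 alone is unusable without b0 and b1
}
assert pick_replica(["b0", "b1", "b2"], replicas) == "replica-b"
```

Note that replica-c scores zero despite holding a block the request needs: without the blocks before it, that cached state cannot be used, which is why prefix length rather than raw overlap is the natural score.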

This session walks through how llm-d and vLLM implement each layer of this stack, how they combine into a production system, and what the open‑source community can build on top. We’ll share benchmarks, Kubernetes deployment patterns, and practical guidance for operators running LLM workloads at scale.
Speakers

Martin Hickey

Senior Technical Staff Member, IBM Research
Martin Hickey is a STSM at IBM Research, focused on Open Source, Cloud Native Computing, and AI. Martin has notable contributions to open source projects like vLLM, LMCache, Kubernetes, Helm, OpenTelemetry and OpenStack. Martin is a core maintainer for LMCache and an emeritus core...
Track: Open AI & Data
Audience Experience Level: Any
