Open Source Summit + Embedded Linux Conference North America...
May 18-20, 2026
Minneapolis, MN
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2026 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Central Daylight Time (UTC-5). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.


LMCache supports tiered KV caching with CPU memory offloading, extending inference beyond GPU memory limits. But what happens when local CPU memory isn't enough? This session introduces the next tier: offloading KV cache to Amazon SageMaker HyperPod managed storage, expanding cache capacity for large-scale LLM inference.
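
As a rough, hypothetical sketch of how a remote storage tier can sit behind a KV cache, the example below shows what such a backend interface might look like; the class names, methods, and the assumption that HyperPod managed storage is reachable via a filesystem mount are illustrative, not the actual LMCache or HyperPod connector API.

```python
# Hypothetical remote KV-cache backend interface; names and the filesystem-mount
# assumption are illustrative only, not the actual connector API.
import os
from abc import ABC, abstractmethod
from typing import Optional

import torch


class RemoteKVBackend(ABC):
    """Cold tier sitting behind GPU (hot) and CPU (warm) memory."""

    @abstractmethod
    def put(self, key: str, kv_chunk: torch.Tensor) -> None:
        """Persist a cold KV-cache chunk to managed storage."""

    @abstractmethod
    def get(self, key: str) -> Optional[torch.Tensor]:
        """Fetch a KV-cache chunk back into host memory, or return None on a miss."""


class MountedStorageBackend(RemoteKVBackend):
    """Illustrative stand-in for a HyperPod-backed store reachable via a mount path."""

    def __init__(self, mount_path: str):
        self.mount_path = mount_path

    def _path(self, key: str) -> str:
        return os.path.join(self.mount_path, f"{key}.pt")

    def put(self, key: str, kv_chunk: torch.Tensor) -> None:
        torch.save(kv_chunk.cpu(), self._path(key))

    def get(self, key: str) -> Optional[torch.Tensor]:
        try:
            return torch.load(self._path(key), map_location="cpu")
        except FileNotFoundError:
            return None
```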

We'll cover the technical design of the SageMaker HyperPod connector contribution to LMCache. Hot entries stay in GPU memory, warm entries spill to CPU memory, and cold entries persist to HyperPod's managed storage. This three-tier architecture lets organizations cache far more context than local resources allow, reducing redundant computation for repeated prompts and long-context scenarios.
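
A minimal sketch of that hot/warm/cold flow (not LMCache's internal implementation) might look like the following, reusing the hypothetical RemoteKVBackend above; chunk-counted capacities and least-recently-used eviction are assumptions made for illustration.

```python
# Illustrative three-tier lookup and spill logic; capacities, keys, and LRU
# eviction are assumptions, not the LMCache connector's actual policy.
from collections import OrderedDict
from typing import Optional

import torch


class TieredKVCache:
    def __init__(self, backend, gpu_chunks: int, cpu_chunks: int, device: str = "cuda"):
        self.gpu_tier: OrderedDict[str, torch.Tensor] = OrderedDict()  # hot
        self.cpu_tier: OrderedDict[str, torch.Tensor] = OrderedDict()  # warm
        self.backend = backend                                         # cold (remote)
        self.gpu_chunks = gpu_chunks
        self.cpu_chunks = cpu_chunks
        self.device = device

    def get(self, key: str) -> Optional[torch.Tensor]:
        if key in self.gpu_tier:                      # hot hit
            self.gpu_tier.move_to_end(key)
            return self.gpu_tier[key]
        if key in self.cpu_tier:                      # warm hit: promote to GPU
            return self._promote(key, self.cpu_tier.pop(key))
        remote = self.backend.get(key)                # cold hit: fetch from storage
        if remote is not None:
            return self._promote(key, remote)
        return None                                   # full miss: engine recomputes prefill

    def put(self, key: str, chunk: torch.Tensor) -> None:
        self._promote(key, chunk)

    def _promote(self, key: str, chunk: torch.Tensor) -> torch.Tensor:
        # Spill least-recently-used hot entries to CPU memory when the GPU tier is full.
        while len(self.gpu_tier) >= self.gpu_chunks:
            old_key, old_chunk = self.gpu_tier.popitem(last=False)
            self._spill_to_cpu(old_key, old_chunk.cpu())
        self.gpu_tier[key] = chunk.to(self.device)
        return self.gpu_tier[key]

    def _spill_to_cpu(self, key: str, chunk: torch.Tensor) -> None:
        # Spill least-recently-used warm entries to the remote cold tier.
        while len(self.cpu_tier) >= self.cpu_chunks:
            old_key, old_chunk = self.cpu_tier.popitem(last=False)
            self.backend.put(old_key, old_chunk)
        self.cpu_tier[key] = chunk
```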

The session demonstrates the integration in action, showing cache hit rates, latency across tiers, and how the connector handles transitions between local and remote storage. We'll discuss key engineering decisions, including async prefetching and failure handling.
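
The async prefetching and failure handling mentioned above could look roughly like this; the timeout value, chunk keys, and the fall-back-to-recompute convention are assumptions for the sketch rather than details of the actual connector.

```python
# Hedged sketch: prefetch cold chunks off the critical path and degrade to a
# cache miss (prefill recompute) on timeout or storage errors.
import asyncio
import logging
from typing import Optional

import torch

log = logging.getLogger("kv_prefetch")


async def prefetch_chunk(backend, key: str, timeout_s: float = 0.5) -> Optional[torch.Tensor]:
    loop = asyncio.get_running_loop()
    try:
        # Run the blocking remote read in a worker thread, bounded by a timeout,
        # so a slow or failed fetch never stalls the decode loop.
        return await asyncio.wait_for(
            loop.run_in_executor(None, backend.get, key), timeout=timeout_s
        )
    except (asyncio.TimeoutError, OSError) as exc:
        log.warning("prefetch of %s failed (%s); treating as a miss", key, exc)
        return None  # caller recomputes the prefill for this chunk


async def prefetch_many(backend, keys: list[str]) -> dict[str, torch.Tensor]:
    # Issue prefetches for the next expected chunks concurrently.
    results = await asyncio.gather(*(prefetch_chunk(backend, k) for k in keys))
    return {k: r for k, r in zip(keys, results) if r is not None}
```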

Attendees will leave with practical knowledge of how managed cloud storage can extend open source caching frameworks for LLM inference infrastructure.
Speakers

Yihua Cheng

CTO, Tensormesh, Inc.
Yihua Cheng is co-founder and CTO of Tensormesh. He has a deep background in large language models, high-performance computing, and open-source development.
Yihua created LMCache and the vLLM production stack, open-source projects that have collectively earned over 9,000 GitHub...

Ziwen Ning

Open Source Contributor
Ziwen Ning is an open-source contributor to LMCache. He was previously a Senior Software Development Engineer at AWS, working on Amazon SageMaker HyperPod with a focus on building scalable ML infrastructure. Before that at Annapurna Labs, he enhanced the AI/ML experience through the...
Monday May 18, 2026 4:30pm - 5:10pm CDT
211A+B (Level Two)
  Open AI & Data
