KV Block Manager — NVIDIA Dynamo Documentation
Title: KV Block Manager — NVIDIA Dynamo Documentation
URL Source: https://docs.nvidia.com/dynamo/archive/0.5.1/architecture/kvbm_intro.html?userAgent=PromptingBot%2F1.0.0
Published Time: Tue, 14 Oct 2025 16:26:52 GMT
KV Block Manager
The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer for frameworks like vLLM, SGLang, and TRT-LLM.
It offers:
- A unified memory API that spans GPU memory, pinned host memory, remote RDMA-accessible memory, local or distributed pools of SSDs, and remote file, object, or cloud storage systems.
- Support for evolving block lifecycles (allocate → register → match) with event-based state transitions that storage can subscribe to.
- Integration with NIXL, a dynamic memory exchange layer used for remote registration, sharing, and access of memory blocks over RDMA/NVLink.
The Dynamo KV Block Manager serves as a reference implementation that emphasizes modularity and extensibility. Its pluggable design enables developers to customize components and optimize for specific performance, memory, and deployment needs.
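To make the allocate → register → match lifecycle and its event-based transitions more concrete, here is a minimal Python sketch. It is an illustration only, not the KVBM API: `ToyBlockPool`, `BlockState`, `subscribe`, and the listener signature are hypothetical names invented for this example, and the real meaning of each state in KVBM may be richer than what is modeled here.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class BlockState(Enum):
    """Hypothetical states mirroring allocate -> register -> match."""
    ALLOCATED = auto()
    REGISTERED = auto()
    MATCHED = auto()


# The only forward transitions this toy model allows.
_NEXT_STATE = {
    BlockState.ALLOCATED: BlockState.REGISTERED,
    BlockState.REGISTERED: BlockState.MATCHED,
}

# A listener receives (block_id, new_state) on every transition.
Listener = Callable[[int, BlockState], None]


class ToyBlockPool:
    """Illustrative stand-in for a block manager; NOT the KVBM API."""

    def __init__(self) -> None:
        self._states: Dict[int, BlockState] = {}
        self._listeners: List[Listener] = []
        self._next_id = 0

    def subscribe(self, listener: Listener) -> None:
        """Register a callback, e.g. a storage tier tracking block state."""
        self._listeners.append(listener)

    def _emit(self, block_id: int, state: BlockState) -> None:
        for listener in self._listeners:
            listener(block_id, state)

    def allocate(self) -> int:
        """Create a new block in the ALLOCATED state and announce it."""
        block_id = self._next_id
        self._next_id += 1
        self._states[block_id] = BlockState.ALLOCATED
        self._emit(block_id, BlockState.ALLOCATED)
        return block_id

    def _advance(self, block_id: int, target: BlockState) -> None:
        current = self._states[block_id]
        if _NEXT_STATE.get(current) is not target:
            raise ValueError(
                f"invalid transition {current.name} -> {target.name}"
            )
        self._states[block_id] = target
        self._emit(block_id, target)

    def register(self, block_id: int) -> None:
        self._advance(block_id, BlockState.REGISTERED)

    def match(self, block_id: int) -> None:
        self._advance(block_id, BlockState.MATCHED)


def demo() -> List[tuple]:
    """Walk one block through the lifecycle and collect emitted events."""
    events: List[tuple] = []
    pool = ToyBlockPool()
    pool.subscribe(lambda bid, st: events.append((bid, st.name)))
    block = pool.allocate()
    pool.register(block)
    pool.match(block)
    return events
```

Calling `demo()` returns `[(0, 'ALLOCATED'), (0, 'REGISTERED'), (0, 'MATCHED')]`. Any subscriber, such as a storage backend deciding when to persist or offload a block, sees the same ordered stream of transitions without the pool needing to know about it.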
| Feature | Support | Option |
|---|---|---|
| Backend | ✅ | Local |
| | ✅ | Kubernetes |
| LLM Framework | ✅ | vLLM |
| | ❌ | TensorRT-LLM |
| | ❌ | SGLang |
| Serving Type | ✅ | Aggregated |
| | ✅ | Disaggregated |
Copyright © 2024-2025, NVIDIA CORPORATION & AFFILIATES.