KV Block Manager — NVIDIA Dynamo Documentation
Title: KV Block Manager — NVIDIA Dynamo Documentation
URL Source: https://docs.nvidia.com/dynamo/archive/0.6.1/kvbm/kvbm_intro.html
Published Time: Sat, 08 Nov 2025 00:28:54 GMT
Markdown Content:
KV Block Manager
The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer for frameworks like vLLM, SGLang, and TRT-LLM.
It offers:
- A unified memory API that spans GPU memory (in the future), pinned host memory, remote RDMA-accessible memory, local or distributed pools of SSDs, and remote file, object, or cloud storage systems.
- Support for evolving block lifecycles (allocate → register → match) with event-based state transitions that storage can subscribe to.
- Integration with NIXL, a dynamic memory exchange layer used for remote registration, sharing, and access of memory blocks over RDMA/NVLink.
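The allocate → register → match lifecycle and its event-based transitions can be pictured with a small toy model. This is a minimal sketch under stated assumptions, not the KVBM API: `ToyBlockPool`, `Block`, `BlockState`, the event names, and the use of a content hash as the match key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class BlockState(Enum):
    ALLOCATED = auto()   # memory reserved, contents not yet published
    REGISTERED = auto()  # contents published under a lookup key


@dataclass
class Block:
    block_id: int
    state: BlockState = BlockState.ALLOCATED
    content_hash: Optional[str] = None


class ToyBlockPool:
    """Hypothetical illustration of a block lifecycle; not the KVBM API."""

    def __init__(self) -> None:
        self._next_id = 0
        self._by_hash: Dict[str, Block] = {}
        self._subscribers: List[Callable[[str, Block], None]] = []

    def subscribe(self, callback: Callable[[str, Block], None]) -> None:
        # Subscribers (for example a storage layer) are told about every transition.
        self._subscribers.append(callback)

    def _emit(self, event: str, block: Block) -> None:
        for callback in self._subscribers:
            callback(event, block)

    def allocate(self) -> Block:
        block = Block(block_id=self._next_id)
        self._next_id += 1
        self._emit("allocated", block)
        return block

    def register(self, block: Block, content_hash: str) -> None:
        if block.state is not BlockState.ALLOCATED:
            raise ValueError("only a freshly allocated block can be registered")
        block.content_hash = content_hash
        block.state = BlockState.REGISTERED
        self._by_hash[content_hash] = block
        self._emit("registered", block)

    def match(self, content_hash: str) -> Optional[Block]:
        # Returns the registered block for this key, or None on a miss.
        block = self._by_hash.get(content_hash)
        if block is not None:
            self._emit("matched", block)
        return block
```

In this sketch, a subscriber passed to `subscribe` receives `"allocated"`, `"registered"`, and `"matched"` events in order as a block moves through the lifecycle, while a lookup with an unknown key returns `None` and emits nothing.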
The Dynamo KV Block Manager serves as a reference implementation that emphasizes modularity and extensibility. Its pluggable design enables developers to customize components and optimize for specific performance, memory, and deployment needs.
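One way to picture a pluggable, unified memory layer is a single interface implemented by each storage tier. The sketch below is an assumption-laden illustration, not KVBM code: `StorageTier`, `DictTier`, and `TieredStore` are hypothetical names, and plain dictionaries stand in for real tiers such as pinned host memory or SSD.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class StorageTier(Protocol):
    """Hypothetical interface that every tier would implement."""

    name: str

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class DictTier:
    """In-memory stand-in for a real tier (pinned host memory, SSD, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blocks[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


class TieredStore:
    """Looks a block up across tiers, in the order the tiers were given."""

    def __init__(self, tiers: List[StorageTier]) -> None:
        self._tiers = tiers  # assumed to be ordered fastest to slowest

    def put(self, key: str, data: bytes, tier_name: str) -> None:
        for tier in self._tiers:
            if tier.name == tier_name:
                tier.put(key, data)
                return
        raise KeyError(f"unknown tier: {tier_name}")

    def get(self, key: str) -> Optional[Tuple[str, bytes]]:
        # Return the first hit together with the name of the tier that held it.
        for tier in self._tiers:
            data = tier.get(key)
            if data is not None:
                return tier.name, data
        return None
```

Because callers only see `TieredStore`, a new tier can be added by writing another class with the same `put` and `get` methods, which is the kind of customization a pluggable design is meant to allow.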
| Feature | Supported | Option |
|---|---|---|
| Backend | ✅ | Local |
| Backend | ✅ | Kubernetes |
| LLM Framework | ✅ | vLLM |
| LLM Framework | ✅ | TensorRT-LLM |
| LLM Framework | ❌ | SGLang |
| Serving Type | ✅ | Aggregated |
| Serving Type | ✅ | Disaggregated |
Copyright © 2024-2025, NVIDIA CORPORATION & AFFILIATES.