Dynamo NIXL Connect — NVIDIA Dynamo Documentation
URL Source: https://docs.nvidia.com/dynamo/archive/0.5.1/API/nixl_connect/README.html?userAgent=PromptingBot%2F1.0.0
Published Time: Tue, 14 Oct 2025 16:26:12 GMT
Markdown Content:
Dynamo NIXL Connect#
Dynamo NIXL Connect specializes in moving data between models and workers in a Dynamo graph, particularly in use cases where memory registration and memory regions need to be dynamic. It provides utilities for these use cases through a set of Python classes built on the NIXL-based I/O subsystem. This relaxed registration adds some performance overhead but simplifies integration. For larger data transfers, such as those between models in a multi-model graph, the overhead is marginal. Any application hosted in a Dynamo container can import the dynamo.nixl_connect library.
Note
Dynamo NIXL Connect picks the best data transfer method available to it. The available methods depend on the hardware and software configuration of the machines and network running the graph. GPU Direct RDMA operations require that both ends of the operation have:
- NIC and GPU capable of performing RDMA operations
- Device drivers that support GPU-NIC direct interactions (aka “zero copy”) and RDMA operations
- Network that supports InfiniBand or RoCE
If any of the above is not satisfied, GPU Direct RDMA will not be available to the graph’s workers, and less-optimal methods will be used to ensure basic functionality. For additional information, see the GPUDirect RDMA document.
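The requirements in the note above can be read as a simple "all conditions on both ends" check. The following is a minimal sketch of that logic only; the `EndpointCapabilities` class, its field names, and `select_transfer_method` are hypothetical names invented for illustration and are not part of the dynamo.nixl_connect API, which makes this choice internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointCapabilities:
    """Hypothetical capability flags for one end of a transfer."""

    rdma_capable_nic_and_gpu: bool
    gpu_nic_direct_drivers: bool
    infiniband_or_roce_network: bool

    def supports_gpu_direct_rdma(self) -> bool:
        # All three requirements from the note must hold on this end.
        return (
            self.rdma_capable_nic_and_gpu
            and self.gpu_nic_direct_drivers
            and self.infiniband_or_roce_network
        )


def select_transfer_method(
    local: EndpointCapabilities, remote: EndpointCapabilities
) -> str:
    """Return "gpu_direct_rdma" only when both ends qualify, else "fallback"."""
    if local.supports_gpu_direct_rdma() and remote.supports_gpu_direct_rdma():
        return "gpu_direct_rdma"
    return "fallback"


full = EndpointCapabilities(True, True, True)
no_drivers = EndpointCapabilities(True, False, True)
print(select_transfer_method(full, full))
print(select_transfer_method(full, no_drivers))
```

A single unmet requirement on either end is enough to select the fallback path, which matches the note's statement that basic functionality is preserved even without GPU Direct RDMA.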
import dynamo.nixl_connect
All operations using the NIXL Connect library begin with the Connector class and the type of operation required. There are four types of supported operations:
1. Register local readable memory:
Register local memory buffer(s) with the NIXL subsystem to enable a remote worker to read from them.
2. Register local writable memory:
Register local memory buffer(s) with the NIXL subsystem to enable a remote worker to write to them.
3. Read from registered, remote memory:
Read remote memory buffer(s), registered by a remote worker to be readable, into local memory buffer(s).
4. Write to registered, remote memory:
Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to be writable.
When available, high-throughput GPU Direct RDMA data transfers are completed by connecting correctly paired operations. Given the list above, the correct pairings are 1 & 3 or 2 & 4: one side is a “(read|write)-able” operation and the other is its matching “(read|write)” operation. Specifically, a read operation must be paired with a readable operation, and a write operation must be paired with a writable operation.
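The pairing rule can be expressed as a small lookup. This is a sketch for illustration only: the `OperationKind` enum and `is_valid_pairing` function are hypothetical and do not exist in the library; the enum values simply mirror the order of the four operations listed above.

```python
from enum import Enum


class OperationKind(Enum):
    """Hypothetical labels for the four supported operation types."""

    READABLE = 1  # register local memory for a remote worker to read from
    WRITABLE = 2  # register local memory for a remote worker to write to
    READ = 3      # read from remote memory registered as readable
    WRITE = 4     # write to remote memory registered as writable


# The only valid pairings: 1 & 3 and 2 & 4.
VALID_PAIRINGS = {
    frozenset({OperationKind.READABLE, OperationKind.READ}),
    frozenset({OperationKind.WRITABLE, OperationKind.WRITE}),
}


def is_valid_pairing(a: OperationKind, b: OperationKind) -> bool:
    """Return True when the two operations form a correct pair, in either order."""
    return frozenset({a, b}) in VALID_PAIRINGS


print(is_valid_pairing(OperationKind.READABLE, OperationKind.READ))
print(is_valid_pairing(OperationKind.READABLE, OperationKind.WRITE))
```

Using unordered sets means the check does not care which worker created its operation first, only that the two sides match.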
Examples#
Generic Example#
In the diagram below, Local creates a WritableOperation intended to receive data from Remote. Local then sends metadata about the requested operation to Remote. Remote then uses the metadata to create a WriteOperation which will perform the GPU Direct RDMA memory transfer, when available, from Remote’s GPU memory to Local’s GPU memory.
Note
When RDMA isn’t available, the NIXL data transfer will still complete using non-accelerated methods.
Multimodal Example#
In the case of the Dynamo Multimodal Disaggregated Example:
- The HTTP frontend accepts a text prompt and a URL to an image.
- The prompt and URL are then enqueued with the Processor before being dispatched to the first available Decode Worker.
- Decode Worker then requests a Prefill Worker to provide key-value data for the LLM powering the Decode Worker.
- Prefill Worker then requests that the image be processed and provided as embeddings by the Encode Worker.
- Encode Worker acquires the image, processes it, performs inference on the image using a specialized vision model, and finally provides the embeddings to Prefill Worker.
- Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV$) update for Decode Worker’s LLM and writes the update directly to the GPU memory reserved for the data.
- Finally, Decode Worker performs the requested inference.
Note
In this example, it is the data transfer between the Prefill Worker and the Encode Worker that utilizes the Dynamo NIXL Connect library. The KV Cache transfer between Decode Worker and Prefill Worker utilizes a different connector that also uses the NIXL-based I/O subsystem underneath.
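The call chain in the steps above can be traced with plain functions. This is a simplified sketch under stated assumptions: the function names, the placeholder embeddings, the example prompt and URL, and the trace messages are all hypothetical, and no real model, network call, or NIXL transfer is involved. Its only purpose is to show the order of the hops and which hop corresponds to the Dynamo NIXL Connect transfer.

```python
def encode_worker(image_url: str, trace: list) -> list:
    trace.append("encode: acquire image and run vision model")
    return [0.0, 0.0, 0.0]  # placeholder embeddings, not real model output


def prefill_worker(prompt: str, image_url: str, trace: list) -> dict:
    # This hop is the one that uses the Dynamo NIXL Connect library.
    trace.append("prefill: request embeddings from encode (NIXL Connect)")
    embeddings = encode_worker(image_url, trace)
    # The KV cache hop uses a different NIXL-based connector.
    trace.append("prefill: write KV cache update for decode (other connector)")
    return {"kv_entries": len(embeddings)}


def decode_worker(prompt: str, image_url: str, trace: list) -> str:
    trace.append("decode: request key-value data from prefill")
    prefill_worker(prompt, image_url, trace)
    trace.append("decode: perform inference")
    return f"response to {prompt!r}"


trace = []
answer = decode_worker("Describe the image", "https://example.com/cat.png", trace)
for step in trace:
    print(step)
```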
Code Examples#
See prefill_worker or decode_worker from our Multimodal example for how they coordinate directly with the Encode Worker: each creates a WritableOperation, sends the operation’s metadata via Dynamo’s round-robin dispatcher, and awaits the operation’s completion before using the transferred data.
See encode_worker from our Multimodal example for how the resulting embeddings are registered with the NIXL subsystem by creating a Descriptor, how a WriteOperation is created from the metadata provided by the requesting worker, and how the worker awaits completion of the data transfer before yielding a response.
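Python Classes#
- Connector
- Descriptor
- Device
- ReadOperation
- ReadableOperation
- WritableOperation
- WriteOperation
References#
- NVIDIA Dynamo
- NVIDIA Inference Transfer Library (NIXL)
- Dynamo Multimodal Example
- NVIDIA GPU Direct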