# Multinode Deployment Guide
This guide explains how to deploy Dynamo workloads across multiple nodes. Multinode deployments enable you to scale compute-intensive LLM workloads across multiple physical machines, maximizing GPU utilization and supporting larger models.
## Overview
Dynamo supports multinode deployments through the `multinode` section in resource specifications. This allows you to:

- Distribute workloads across multiple physical nodes
- Scale GPU resources beyond a single machine
- Support large models requiring extensive tensor parallelism
- Achieve high availability and fault tolerance
## Basic requirements

- Kubernetes Cluster: Version 1.24 or later
- GPU Nodes: Multiple nodes with NVIDIA GPUs
- High-Speed Networking: InfiniBand, RoCE, or high-bandwidth Ethernet (recommended for optimal performance)
## Advanced Multinode Orchestration
### Using Grove (default)
For sophisticated multinode deployments, Dynamo integrates with advanced Kubernetes orchestration systems:

- Grove: Network topology-aware gang scheduling and auto-scaling for AI workloads
- KAI-Scheduler: Kubernetes-native scheduler optimized for AI workloads at scale

These systems provide enhanced scheduling capabilities, including topology-aware placement, gang scheduling, and coordinated auto-scaling across multiple nodes.
Features Enabled with Grove:

- Declarative composition of AI workloads
- Multi-level horizontal auto-scaling
- Custom startup ordering for components
- Resource-aware rolling updates
Features Enabled with KAI-Scheduler:

- Gang scheduling
- Network topology-aware pod placement
- AI workload-optimized scheduling algorithms
- GPU resource awareness and allocation
- Support for complex scheduling constraints
- Integration with Grove for enhanced capabilities
- Performance optimizations for large-scale deployments
#### Prerequisites

- Grove installed on the cluster
- (Optional) KAI-Scheduler installed on the cluster with the default queue name `dynamo` created. You can use a different queue name by setting the `nvidia.com/kai-scheduler-queue` annotation on the DGD resource.

KAI-Scheduler is optional but recommended for advanced scheduling capabilities.
### Using LWS and Volcano
LWS (LeaderWorkerSet) is a simple multinode deployment mechanism that allows you to deploy a workload across multiple nodes.

- LWS: LWS Installation
- Volcano: Volcano Installation

Volcano is a Kubernetes-native scheduler optimized for AI workloads at scale. It is used in conjunction with LWS to provide gang scheduling support.
## Core Concepts
### Orchestrator Selection Algorithm
Dynamo automatically selects the best available orchestrator for multinode deployments using the following logic:
#### When Both Grove and LWS are Available

- Grove is selected by default (recommended for advanced AI workloads)
- LWS is selected if you explicitly set the `nvidia.com/enable-grove: "false"` annotation on your DGD resource
#### When Only One Orchestrator is Available

- The installed orchestrator (Grove or LWS) is automatically selected
#### Scheduler Integration

- With Grove: Automatically integrates with KAI-Scheduler when available, providing:
  - Advanced queue management via the `nvidia.com/kai-scheduler-queue` annotation
  - AI-optimized scheduling policies
  - Resource-aware workload placement
- With LWS: Uses the Volcano scheduler for gang scheduling and resource coordination
#### Configuration Examples

Default (Grove with KAI-Scheduler):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/kai-scheduler-queue: "gpu-intensive" # Optional: defaults to "dynamo"
spec:
  # ... your deployment spec
```

Force LWS usage:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  # ... your deployment spec
```
### The multinode Section

The `multinode` section in a resource specification defines how many physical nodes the workload should span:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
spec:
  # ... your deployment spec
  services:
    my-service:
      # ...
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "2" # 2 GPUs per node
```
### GPU Distribution

The relationship between `multinode.nodeCount` and `gpu` is multiplicative:

- `multinode.nodeCount`: number of physical nodes
- `gpu`: number of GPUs per node
- Total GPUs: `multinode.nodeCount × gpu`

Examples:

- `multinode.nodeCount: 2` + `gpu: "4"` = 8 total GPUs (4 GPUs per node across 2 nodes)
- `multinode.nodeCount: 4` + `gpu: "8"` = 32 total GPUs (8 GPUs per node across 4 nodes)
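For instance, the second example above corresponds to a spec like the following minimal sketch (the deployment and service names are placeholders, as in the other examples in this guide):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
spec:
  # ... your deployment spec
  services:
    my-service:
      # ...
      multinode:
        nodeCount: 4 # 4 physical nodes
      resources:
        limits:
          gpu: "8" # 8 GPUs per node -> 4 × 8 = 32 total GPUs
```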
### Tensor Parallelism Alignment

The tensor parallelism setting (`tp-size` or `--tp`) in your command/args must match the total number of GPUs.

Example: `multinode.nodeCount: 2` × 4 GPUs per node = 8 total GPUs, so the command args must use a tensor parallel size of 8:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
spec:
  # ... your deployment spec
  services:
    my-service:
      # ...
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "4"
      extraPodSpec:
        mainContainer:
          # ...
          args:
            - "--tp-size"
            - "8" # Must equal multinode.nodeCount × gpu
```
## Next Steps

For additional support and examples, see the working multinode configurations in:

- vLLM: components/backends/vllm/deploy/
- SGLang: components/backends/sglang/deploy/
- TensorRT-LLM: components/backends/trtllm/deploy/

These examples demonstrate proper usage of the `multinode` section with corresponding `gpu` limits and correct `tp-size` configuration.