# Deploying Dynamo Inference Graphs to Kubernetes using Helm
This guide describes how to deploy an inference graph created with the Dynamo SDK onto a Kubernetes cluster.
While this guide covers manual deployment of Dynamo inference graphs using Helm, the preferred method is the Dynamo cloud platform, which simplifies the deployment and management of inference graphs. It includes a set of components (an operator, Kubernetes Custom Resources, etc.) that work together to streamline the deployment and management process.
Once an inference graph is defined using the Dynamo SDK, it can be deployed onto a Kubernetes cluster with a single `dynamo deploy` command that orchestrates the following deployment steps:

- Building Docker images from inference graph components on the cluster
- Intelligently composing the encoded inference graph into a complete deployment on Kubernetes
- Enabling autoscaling, monitoring, and observability for the inference graph
- Easy administration of deployments via a UI
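The `dynamo deploy` workflow itself is documented separately; as a quick orientation (assuming the Dynamo SDK and CLI are installed locally), you can list the command's options directly:

```bash
# Show the available options for the deploy command
dynamo deploy --help
```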
## Helm Deployment Guide
### Setting up MicroK8s
Follow these steps to set up a local Kubernetes cluster using MicroK8s:
- Install MicroK8s:

  ```bash
  sudo snap install microk8s --classic
  ```

- Configure user permissions:

  ```bash
  sudo usermod -a -G microk8s $USER
  sudo chown -R $USER ~/.kube
  ```

  Important: Log out and log back in for the permissions to take effect.

- Start MicroK8s:

  ```bash
  microk8s start
  ```

- Enable required addons:

  ```bash
  # Enable GPU support
  microk8s enable gpu

  # Enable storage support
  # See: https://microk8s.io/docs/addon-hostpath-storage
  microk8s enable storage
  ```

- Configure kubectl:

  ```bash
  mkdir -p ~/.kube
  microk8s config >> ~/.kube/config
  ```
After completing these steps, you should be able to use the kubectl command to interact with your cluster.
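As a quick sanity check (not part of the original setup steps), you can confirm that the cluster is reachable and that the addons are enabled:

```bash
# The MicroK8s node should be listed and Ready
kubectl get nodes

# Shows cluster health and which addons (gpu, storage) are enabled
microk8s status --wait-ready
```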
### Installing Required Dependencies
Follow these steps to set up the namespace and install required components:
- Set environment variables:

  ```bash
  export NAMESPACE=dynamo-playground
  export RELEASE_NAME=dynamo-platform
  export PROJECT_ROOT=$(pwd)
  ```

- Install the NATS messaging system:

  ```bash
  # Navigate to the dependencies directory
  cd $PROJECT_ROOT/deploy/helm/dependencies

  # Add and update the NATS Helm repository
  helm repo add nats https://nats-io.github.io/k8s/helm/charts/
  helm repo update

  # Install NATS with custom values
  helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-nats nats/nats \
    --values nats-values.yaml
  ```

- Install the etcd key-value store:

  ```bash
  # Install etcd using the Bitnami chart
  helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-etcd \
    oci://registry-1.docker.io/bitnamicharts/etcd \
    --values etcd-values.yaml
  ```
After completing these steps, your cluster has the necessary messaging and storage infrastructure for running Dynamo inference graphs.
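To verify (an optional check, not part of the original steps), list the pods in the namespace and confirm that the NATS and etcd pods reach the Running state. If the Helm installs failed because the namespace does not exist yet, create it with `kubectl create namespace ${NAMESPACE}` and re-run them.

```bash
# NATS and etcd pods should appear and reach Running
kubectl -n ${NAMESPACE} get pods

# List the Helm releases installed in the namespace
helm list -n ${NAMESPACE}
```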
### Building and Deploying the Pipeline
Follow these steps to containerize and deploy your inference pipeline:
- Build and containerize the pipeline:

  ```bash
  # Navigate to the example directory
  cd $PROJECT_ROOT/examples/hello_world

  # Set the runtime image name
  export DYNAMO_IMAGE=<dynamo_base_image>

  # Build and containerize the Frontend service
  dynamo build --containerize hello_world:Frontend
  ```

- Push the container to a registry:

  ```bash
  # Tag the built image for your registry
  docker tag <BUILT_IMAGE_TAG> <TAG>

  # Push to your container registry
  docker push <TAG>
  ```

- Deploy using Helm:

  ```bash
  # Navigate to the deployment directory
  cd $PROJECT_ROOT/deploy/helm

  # Set the release name for Helm
  export HELM_RELEASE=hello-world-manual

  # Generate the Helm values file from the Frontend service
  dynamo get frontend > pipeline-values.yaml

  # Install/upgrade the Helm release
  helm upgrade -i "$HELM_RELEASE" ./chart \
    -f pipeline-values.yaml \
    --set image=<TAG> \
    --set dynamoIdentifier="hello_world:Frontend" \
    -n "$NAMESPACE"
  ```

- Test the deployment:

  ```bash
  # Forward the service port to localhost
  kubectl -n ${NAMESPACE} port-forward svc/${HELM_RELEASE}-frontend 3000:80

  # Test the API endpoint
  curl -X 'POST' 'http://localhost:3000/generate' \
    -H 'accept: text/event-stream' \
    -H 'Content-Type: application/json' \
    -d '{"text": "test"}'
  ```
### Using the Deployment Script
For convenience, you can use the deployment script at `deploy/helm/deploy.sh`, which automates all of these steps:
```bash
export DYNAMO_IMAGE=<dynamo_docker_image_name>
./deploy.sh <docker_registry> <k8s_namespace> <path_to_dynamo_directory> <dynamo_identifier> [<dynamo_config_file>]

# Example:
export DYNAMO_IMAGE=nvcr.io/nvidian/nim-llm-dev/dynamo-base-worker:0.0.1
./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/hello_world/ hello_world:Frontend

# Example with a pipeline config file:
./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/llm graphs.disagg_router:Frontend ../../../examples/llm/configs/disagg_router.yaml
```
This script handles:

- Building and pushing the Docker image
- Setting up the Helm values
- Installing/upgrading the Helm release
- Configuring the necessary Kubernetes resources
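After the script finishes, the resulting release can be inspected or removed with standard Helm commands (these are not part of the script itself):

```bash
# List the releases created in the target namespace
helm list -n <k8s_namespace>

# Remove a deployment when it is no longer needed
helm uninstall <release_name> -n <k8s_namespace>
```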