This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented in components/backends/vllm. Color-coded flows indicate different types of operations:

🔵 Main Request Flow (Blue)#

The primary user journey through the system:

Discovery (S1): Client discovers the service endpoint
Request (S2): HTTP client sends API request to Frontend (OpenAI-compatible server on port 8000)
Validate (S3): Frontend forwards request to Processor for validation and routing
Route (S3): Processor routes the validated request to appropriate Decode Worker

🟠 Decision and Allocation Flow (Orange)#

The system’s intelligent routing and resource allocation:

Query (S4): Decode Worker queries for prefix cache hits to optimize processing
Disagg Decision (S5): Based on prefill length and queue size, the system decides whether it needs remote prefill 5a. Allocate (S5a): Decode Worker pre-allocates KV cache blocks in its local GPU memory
Queue (S6): If remote prefill is required, the system puts the RemotePrefillRequest with block IDs into the PrefillQueue

🟢 Prefill Worker Flow (Green)#

The dedicated prefill processing pipeline:

NATS Pull (S7): PrefillQueue uses a NATS consumer group to distribute work to available PrefillWorkers
Load Metadata (S8): PrefillWorker loads NIXL metadata from ETCD to establish GPU communication
Prefill (S9): Worker executes the prefill computation on the input tokens
NIXL Transfer (S10): Direct GPU-to-GPU transfer writes the prefilled KV cache to the Decode Worker’s pre-allocated blocks

🟣 Completion Flow (Purple)#

The response generation and delivery:

Notify (S11): PrefillWorker sends completion notification to Decode Worker
Decode (S12): Decode Worker decodes from its local KV cache containing prefilled data
Response (S13): The system sends the generated response to the Processor for post-processing, then through the Frontend to the Client

🔗 Infrastructure Connections (Dotted lines)#

Coordination and messaging support:

ETCD Connections (Gray, dotted)#

Frontend, Processor, Planner: Service discovery and registration
Decode Worker, PrefillWorker: NIXL metadata storage for GPU communication setup

NATS Connections (Teal, dotted)#

PrefillQueue: JetStream consumer group for reliable work distribution
Processor: Load balancing across workers

Planning Connections (Gold, dotted)#

Frontend → Planner: Metrics collection for auto-scaling decisions
Planner → Workers: Resource scaling commands for both Decode Worker and PrefillWorker

Technical Implementation Details#

NIXL (NVIDIA Interchange Library):#

Enables high-speed GPU-to-GPU data transfers using NVLink/PCIe
Decode Worker publishes GPU metadata to ETCD for coordination
PrefillWorker loads metadata to establish direct communication channels
Block-based transfers (64–128 tokens per block) for efficient batching

Disaggregated KV Cache:#

Each Decode Worker maintains local KV cache in its GPU memory
No shared storage bottlenecks—all transfers are direct worker-to-worker
Pre-allocated blocks ensure deterministic memory layout and performance

previous High Level Architecture next Dynamo Disaggregation: Separating Prefill and Decode for Enhanced Performance

On this page

NVIDIA uses cookies to improve your experience on our web site. We and our third-party partners also use cookies and other tools to collect and record information you provide as well as information about your interactions with our websites for performance improvement, analytics, and to assist in marketing efforts. By continuing to use this site or by clicking one of the buttons below, you agree to the use of cookies and other tools as described in our Privacy Policy and Cookie Policy (subject to your settings) and accept our Terms of Service (which contains important waivers). Please see our Privacy Policy for more information on our privacy practices.

We have detected the Global Privacy Control (GPC) signal and have opted you out of all optional cookies on this site for this browser. You can manage your cookie settings by clicking on "Manage Settings". Please see our Cookie Policy for more information. To opt out of non-cookie personal information "sales" / "sharing" for targeted advertising purposes, please visit the NVIDIA Preference Center. Please see our Privacy Policy for more information on our privacy practices.

We have detected the Global Privacy Control Signal (GPC) and have opted you out of all optional cookies on this browser. You can manage your cookie settings by clicking on "Manage Settings". Please see our Cookie Policy for more information. We have also opted you out of "sharing"/"sales" of personal information outside of cookies. You can manage these settings in the NVIDIA NVIDIA Preference Center. Please see our Privacy Policy for more information.

We have detected the Global Privacy Control Signal (GPC) and have opted you out of all optional cookies on this browser. You can manage your cookie settings by clicking on "Manage Settings". Please see our Cookie Policy for more information. We have also opted you out of "sharing"/"sales" of personal information outside of cookies which overrides at least one of your previous settings. You can manage them in the NVIDIA Preference Center. Please see our Privacy Policy for more information.

Manage Settings

Turn Off Optional Cookies Agree

Image 7: NVIDIA Logo

Cookie Settings

We and our third-party partners (including social media, advertising, and analytics partners) use cookies and other tracking technologies to collect, store, monitor, and process certain information about you when you visit our website. The information collected might relate to you, your preferences, or your device. We use that information to make the site work, analyze performance and traffic on our website, provide a more personalized web experience, and assist in our marketing efforts.

Under certain privacy laws, you have the right to direct us not to "sell" or "share" your personal information for targeted advertising. To opt-out of the "sale" and "sharing" of personal information through cookies, you must opt-out of optional cookies using the toggles below. To opt out of the "sale" and "sharing" of data collected by other means (e.g., online forms) you must also update your data sharing preferences through the NVIDIA Preference Center.

Click on the different category headings below to find out more and change the settings according to your preference. You cannot opt out of Required Cookies as they are deployed to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, etc.). By clicking "Save and Accept" or "Decline All" at the bottom, you consent to the use of cookies and other tools as described in our Cookie Policy in accordance with your settings and accept our Terms of Service (which contains important waivers). For more information about our privacy practices, please see our Privacy Policy.

Required Cookies

Always Active

These cookies enable core functionality such as security, network management, and accessibility. These cookies are required for the site to function and cannot be turned off.

Cookies Details‎

Performance Cookies

Performance Cookies

These cookies are used to provide quantitative measures of our website visitors, such as the number of times you visit, time on page, your mouse movements, scrolling, clicks and keystroke activity on the websites; other browsing, search, or product research behavior; and what brought you to our site. These cookies may store a unique ID so that our system will remember you when you return. Information collected with these cookies is used to measure and find ways to improve website performance.

Cookies Details‎

Personalization Cookies

Personalization Cookies

These cookies collect data about how you have interacted with our website to help us improve your web experience, such as which pages you have visited. These cookies may store a unique ID so that our system will remember you when you return. They may be set by us or by third party providers whose services we have added to our pages. These cookies enable us to provide enhanced website functionality and personalization as well as make the marketing messages we send to you more relevant to your interests. If you do not allow these cookies, then some or all of these services may not function properly.

Cookies Details‎

Advertising Cookies

Advertising Cookies

These cookies record your visit to our websites, the pages you have visited and the links you have followed to influence the advertisements that you see on other websites. These cookies and the information they collect may be managed by other companies, including our advertising partners, and may be used to build a profile of your interests and show you relevant advertising on other sites. We and our advertising partners will use this information to make our websites and the advertising displayed on it, more relevant to your interests.

Cookies Details‎

Cookie List

Clear