Introduction
As a distributed cluster and data management component for large language models (LLMs), LLM-DataDist provides high-performance, zero-copy point-to-point key-value (KV) transmission capabilities, which are exposed to users via simple APIs.
This guide explains how to use the LLM-DataDist APIs to implement inter-cluster data transmission and build a disaggregated framework for LLM inference.
| Manual | Introduction |
|---|---|
| LLM-DataDist Development Guide (C++) | Describes how to perform link management and KV cache management with the C++ LLM-DataDist APIs. This scenario supports unilateral link establishment (that is, the client initiates the link to the server). Data transmission is limited to two operations: the Decode side pulls KV cache from the Prompt side, and the Prompt side pushes KV cache to the Decode side. Only D2D transmission is supported in this scenario. |
| LLM-DataDist Development Guide (Python) | Describes how to perform link management and KV cache management with the Python LLM-DataDist APIs in KvCacheManager mode. This scenario supports unilateral link establishment. Data transmission is limited to two operations: the Decode side pulls KV cache from the Prompt side, and the Prompt side pushes KV cache to the Decode side. Only D2D transmission is supported in this scenario. |