Process Description

An AIBrix-based vLLM inference job consists of routing pods and inference instance pods. Inference instance pods are further divided into prefill instance pods and decode instance pods. Routing pods do not require NPU resources. Depending on the configured inference service mode, AIBrix generates different workloads to create the corresponding inference instances, and the Router provides a unified inference service to external systems.
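The pod roles described above can be pictured with a small Python model. This is only an illustrative sketch: the names `PodRole`, `Pod`, `needs_npu`, and `split_job`, as well as the pod names, are hypothetical and do not come from AIBrix or MindCluster. It also assumes that every inference instance pod requests NPU resources, which the description implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum


class PodRole(Enum):
    """Hypothetical labels for the three pod kinds in the description."""
    ROUTING = "routing"
    PREFILL = "prefill"
    DECODE = "decode"


@dataclass(frozen=True)
class Pod:
    name: str
    role: PodRole

    @property
    def needs_npu(self) -> bool:
        # Routing pods do not require NPU resources.
        # Assumption: prefill and decode instance pods do.
        return self.role is not PodRole.ROUTING


def split_job(pods):
    """Group a job's pods into routing pods and inference instance pods."""
    routing = [p for p in pods if p.role is PodRole.ROUTING]
    instances = [p for p in pods if p.role is not PodRole.ROUTING]
    return routing, instances


# Illustrative job with one pod of each kind (names are made up).
job = [
    Pod("router-0", PodRole.ROUTING),
    Pod("prefill-0", PodRole.PREFILL),
    Pod("decode-0", PodRole.DECODE),
]
routing, instances = split_job(job)
```

In this sketch, `routing` holds the single routing pod and `instances` holds the prefill and decode pods, which are the only pods that would count against NPU capacity.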

For details about deploying AIBrix-based jobs, see the AIBrix documentation.

Procedure

Figure 1 shows how to use the MindCluster cluster scheduling components to deploy an AIBrix-based vLLM inference job from the command line.

Figure 1 Procedure