Before You Start

The MindCluster cluster scheduling components let users deploy vLLM inference jobs through AIBrix StormService, with support for job scheduling and rescheduling of faulty instances. This feature uses AIBrix v0.5.0 and requires vLLM-Ascend built from the main branch at commit ID 41fbc5e or later.

This section provides feature principles and configuration examples. You can refer to the configuration examples to deploy AIBrix-based vLLM inference jobs.
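The commit requirement above can be checked against a local vLLM-Ascend checkout. The following is a minimal sketch, not an official tool: it assumes that git is installed, that the checkout is a git clone of the main branch, and that "or later" means that commit 41fbc5e is an ancestor of (or equal to) the checked-out commit. The function names are hypothetical.

```python
import subprocess

# Minimum vLLM-Ascend commit stated in this section.
REQUIRED_COMMIT = "41fbc5e"


def ancestor_check_command(commit, ref="HEAD"):
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def checkout_includes_commit(repo_dir, commit=REQUIRED_COMMIT):
    """Return True if the checkout in repo_dir contains `commit` in its history.

    `git merge-base --is-ancestor` exits with 0 when the commit is an
    ancestor, 1 when it is not, and another code on errors (for example,
    when the commit is unknown to the repository).
    """
    result = subprocess.run(
        ancestor_check_command(commit),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git check failed: {result.stderr.strip()}")
```

Calling checkout_includes_commit with the path of the local clone (a placeholder you supply) returns True when the checkout meets the requirement.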

Prerequisites

Before deploying vLLM inference jobs, ensure that the following components have been installed. If they are not installed, install them by referring to Installation and Deployment.
  • Volcano
  • Ascend Device Plugin
  • Ascend Docker Runtime
  • ClusterD
  • (Optional) NodeD
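As a rough aid for the check above, the sketch below reports which required components have no matching pod in a list of pod names. The name fragments are illustrative assumptions, not documented pod names, so adjust them to match your cluster. Ascend Docker Runtime is omitted on the assumption that it is installed on the node rather than run as a pod, and the optional NodeD is not treated as required.

```python
# Hypothetical name fragments used to recognize each component's pods.
# These are assumptions for illustration; verify them against your cluster.
ASSUMED_FRAGMENTS = {
    "Volcano": "volcano",
    "Ascend Device Plugin": "device-plugin",
    "ClusterD": "clusterd",
}


def missing_components(pod_names, required_fragments):
    """Return the components for which no pod name contains the fragment."""
    lowered = [name.lower() for name in pod_names]
    return [
        component
        for component, fragment in required_fragments.items()
        if not any(fragment.lower() in name for name in lowered)
    ]
```

The pod names can be collected, for example, from the output of `kubectl get pods -A -o name`. An empty result means that every listed component matched at least one pod; it does not confirm that the pods are healthy.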

Supported Products

  • Atlas 800I A2 inference server
  • Atlas 800I A3 SuperPoD Server

Instructions

The MindCluster cluster scheduling components support containerized deployment of vLLM inference services, with rescheduling upon faults, in the ways listed below. This section covers only the CLI and one-click script deployment methods.

  • Using the CLI: Deploy a job through its YAML file.
  • One-click script deployment: Deploy a job using an automated script.
  • Use after integration: Integrate the cluster scheduling components into an existing third-party AI platform or an AI platform developed based on the cluster scheduling components.