Before You Start

MindCluster cluster scheduling components support deploying SGLang inference jobs through Open Model Engine (OME), providing job scheduling and rescheduling of faulty instances.

This section provides feature principles and configuration examples for deploying OME-based SGLang inference jobs.

Prerequisites

Before deploying SGLang inference services, ensure that the following components have been installed. If they are not installed, install them by referring to Installation and Deployment.

Volcano
Ascend Device Plugin
Ascend Docker Runtime
ClusterD
(Optional) NodeD
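The prerequisite list above distinguishes the required components from the optional NodeD. The Python sketch below models that distinction as a pre-deployment checklist. The function `check_prerequisites` and the idea of passing it a set of detected component names are illustrative assumptions. MindCluster does not provide this helper, and how you determine which components are installed depends on your environment.

```python
# Hypothetical pre-deployment checklist helper (not part of MindCluster).
# Component names mirror the prerequisite list; detecting what is actually
# installed on the cluster is left to the caller (an assumption).
REQUIRED = ("Volcano", "Ascend Device Plugin", "Ascend Docker Runtime", "ClusterD")
OPTIONAL = ("NodeD",)


def check_prerequisites(installed):
    """Return (missing_required, missing_optional) for an iterable of installed component names."""
    installed = set(installed)
    missing_required = [c for c in REQUIRED if c not in installed]
    missing_optional = [c for c in OPTIONAL if c not in installed]
    return missing_required, missing_optional


if __name__ == "__main__":
    # Example: NodeD is absent, which is acceptable because it is optional.
    found = {"Volcano", "Ascend Device Plugin", "Ascend Docker Runtime", "ClusterD"}
    missing, optional = check_prerequisites(found)
    if missing:
        print("Install before deploying:", ", ".join(missing))
    else:
        print("All required components present; optional missing:", optional)
```

Only a non-empty `missing_required` list should block deployment. A missing NodeD is reported separately so you can decide whether you need it.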

Supported Products

Atlas 800I A2 inference server
Atlas 800I A3 SuperPoD Server

Instructions

MindCluster cluster scheduling components support containerized deployment of SGLang inference services and fault rescheduling in the following ways. This section focuses on the CLI and one-click script deployment methods.

Using the CLI: Deploy a job through its YAML file.
One-click script deployment: Deploy a job using an automation script.
Use after integration: Integrate the cluster scheduling components into an existing third-party AI platform or into an AI platform built on the cluster scheduling components.
