Before You Start
The MindCluster cluster scheduling components support SGLang inference jobs deployed through Open Model Engine (OME), providing job scheduling and rescheduling of faulty instances.
This section explains how the feature works and provides configuration examples for deploying OME-based SGLang inference jobs.
Prerequisites
Before deploying SGLang inference services, ensure that the following components have been installed. If they are not installed, install them by referring to Installation and Deployment.
- Volcano
- Ascend Device Plugin
- Ascend Docker Runtime
- ClusterD
- (Optional) NodeD
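The checklist above can be expressed as a small helper for pre-deployment scripts. This is a minimal sketch, not part of MindCluster: the function name `check_prerequisites` is hypothetical, and it only compares a list of component names you supply against the list above. It does not query the cluster.

```python
# Component names are copied from the prerequisite list above.
REQUIRED = ("Volcano", "Ascend Device Plugin", "Ascend Docker Runtime", "ClusterD")
OPTIONAL = ("NodeD",)


def check_prerequisites(installed):
    """Return (missing_required, missing_optional) for the given component names.

    `installed` is a list compiled by hand or by your own tooling (assumption);
    the comparison ignores case and surrounding whitespace.
    """
    present = {name.strip().lower() for name in installed}
    missing_required = [c for c in REQUIRED if c.lower() not in present]
    missing_optional = [c for c in OPTIONAL if c.lower() not in present]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_prerequisites(
        ["Volcano", "ClusterD", "Ascend Device Plugin"]
    )
    if required:
        print("Install before deploying:", ", ".join(required))
    if optional:
        print("Optional, not installed:", ", ".join(optional))
```

Treating NodeD separately mirrors its optional status in the list: a missing optional component is reported but does not need to block deployment.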
Supported Products
- Atlas 800I A2 inference server
- Atlas 800I A3 SuperPoD Server
Instructions
The MindCluster cluster scheduling components support containerized deployment and fault rescheduling of SGLang inference services in the three ways listed below. This section focuses on the CLI and one-click script deployment methods.
- Using the CLI: Deploy a job through its YAML file.
- One-click script deployment: Deploy a job using an automation script.
- Use after integration: Integrate the cluster scheduling components into an existing third-party AI platform or an AI platform developed based on the cluster scheduling components.