Before You Start

Prerequisites

Before using full NPU scheduling or static vNPU scheduling on the CLI, make sure that the following components are installed. If any of them is missing, install it by referring to Installation and Deployment.

Volcano or other schedulers
Ascend Device Plugin
Ascend Docker Runtime
Ascend Operator
ClusterD
NodeD
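
The prerequisite list above can be treated as a simple checklist. The following is a minimal sketch, not an official tool: the function name and its inputs are hypothetical, and how you actually collect the set of installed components depends on your cluster.

```python
# Component names are copied from the prerequisite list above.
REQUIRED_COMPONENTS = [
    "Ascend Device Plugin",
    "Ascend Docker Runtime",
    "Ascend Operator",
    "ClusterD",
    "NodeD",
]


def missing_prerequisites(installed, scheduler_installed):
    """Return the prerequisites that are still missing, in list order.

    installed: names of components already deployed (hypothetical input).
    scheduler_installed: True if Volcano or another scheduler is deployed.
    """
    installed = set(installed)
    missing = [] if scheduler_installed else ["Volcano or other schedulers"]
    missing += [name for name in REQUIRED_COMPONENTS if name not in installed]
    return missing


if __name__ == "__main__":
    # Example: only two components are present, and a scheduler is deployed.
    print(missing_prerequisites({"Ascend Device Plugin", "NodeD"}, True))
```

An empty result means that every listed component is accounted for; any other result names what still needs to be installed by following Installation and Deployment.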

Instructions

Full NPU scheduling or static vNPU scheduling can be used in either of the following ways:

Use on the CLI: Install cluster scheduling components and enable full NPU scheduling through the CLI.
Use after integration: Integrate the cluster scheduling components into an existing third-party AI platform or an AI platform developed based on the cluster scheduling components.

Usage

Resource monitoring can be used together with all features in inference scenarios.
Multiple inference jobs can run in a cluster at the same time. Each job can use different features, but jobs that use static vNPUs cannot coexist with jobs that use dynamic vNPUs.
The inference card fault recovery feature must be used together with the full NPU scheduling feature. To enable fault recovery, set the -hotReset startup parameter of Ascend Device Plugin to 0 or 2. (The default value is -1, which indicates that fault recovery is not supported.)
Full NPU scheduling can deliver single-server jobs with a single replica or multiple replicas. Each replica works independently. Distributed jobs must be of the acjob type and can be deployed only on the inference server (equipped with Atlas 300I Duo inference cards), A200I A2 Box heterogeneous component, and Atlas 800I A2 inference server.
Static vNPU scheduling supports only single-server jobs with a single replica and does not support distributed jobs.
The static vNPU scheduling feature can be used in conjunction with the computing power virtualization feature. For detailed descriptions and operations of static virtualization, see Static Virtualization.
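
The usage rules above can be summarized as a small pre-submission check. This is a minimal sketch under stated assumptions: the InferenceJob class, the feature labels ("full_npu", "static_vnpu", "dynamic_vnpu"), and both function names are hypothetical and are not part of any cluster scheduling component. Only the -hotReset values 0, 2, and -1 come from the text; other values are not described here and are simply reported as not enabling recovery.

```python
from dataclasses import dataclass


@dataclass
class InferenceJob:
    name: str
    feature: str       # hypothetical labels: "full_npu", "static_vnpu", "dynamic_vnpu"
    servers: int = 1   # number of servers the job spans
    replicas: int = 1  # number of replicas


def fault_recovery_enabled(hot_reset: int = -1) -> bool:
    """True if -hotReset is 0 or 2; the default -1 means recovery is not supported."""
    return hot_reset in (0, 2)


def check_jobs(jobs):
    """Return a list of rule violations for jobs that would run in one cluster."""
    problems = []
    features = {job.feature for job in jobs}
    # Static vNPU jobs and dynamic vNPU jobs cannot coexist.
    if {"static_vnpu", "dynamic_vnpu"} <= features:
        problems.append("static vNPU and dynamic vNPU jobs cannot coexist")
    # Static vNPU scheduling: single server, single replica, no distributed jobs.
    for job in jobs:
        if job.feature == "static_vnpu" and (job.servers != 1 or job.replicas != 1):
            problems.append(
                f"{job.name}: static vNPU scheduling supports only "
                "single-server, single-replica jobs"
            )
    return problems


if __name__ == "__main__":
    jobs = [
        InferenceJob("job-a", "full_npu", replicas=4),
        InferenceJob("job-b", "static_vnpu", replicas=2),
    ]
    for problem in check_jobs(jobs):
        print(problem)
    print("fault recovery enabled:", fault_recovery_enabled(2))
```

The check is deliberately limited to the rules stated in this section; it does not verify product support or the acjob requirement for distributed jobs.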

Supported Products

Full NPU scheduling is supported by the following products:
- Inference server (equipped with Atlas 300I inference cards)
- Atlas inference product
- Atlas 800I A2 inference server
- A200I A2 Box heterogeneous component
- Atlas 800I A3 SuperPoD Server

Static vNPU scheduling is supported by the following products:
- Atlas inference product
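
The two product lists above can also be read as a lookup table. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical, and the product names are copied verbatim from the lists.

```python
# Product names copied from the Supported Products lists above.
SUPPORTED_PRODUCTS = {
    "full NPU scheduling": {
        "Inference server (equipped with Atlas 300I inference cards)",
        "Atlas inference product",
        "Atlas 800I A2 inference server",
        "A200I A2 Box heterogeneous component",
        "Atlas 800I A3 SuperPoD Server",
    },
    "static vNPU scheduling": {
        "Atlas inference product",
    },
}


def features_for(product):
    """Return the scheduling features listed for a product, sorted by name."""
    return sorted(
        feature
        for feature, products in SUPPORTED_PRODUCTS.items()
        if product in products
    )


if __name__ == "__main__":
    print(features_for("Atlas inference product"))
    print(features_for("Atlas 800I A2 inference server"))
```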

Usage Process

For details about how to use full NPU scheduling or static vNPU scheduling on the CLI, see Figure 1.

The CLI process is the same for Volcano and for other schedulers, with one difference: to use another scheduler, create the job YAML file by referring to Use on the CLI (Other Schedulers). All other operations are the same as those for Volcano. For details, see Use on the CLI (Volcano).

Figure 1 Usage process
