Full NPU Scheduling

Function Highlights

A training or inference job can be scheduled onto whole NPUs of a node and occupy them exclusively while it runs. This feature builds on the basic scheduling function supported by Kubernetes and works with Volcano or other schedulers to select suitable NPUs based on their physical topology. As a result, training and inference jobs get the most out of the NPUs assigned to them, and other resources are allocated more efficiently.

Volcano can implement switch affinity scheduling and Ascend AI processor-based affinity scheduling. As the scheduler, it uses the interconnection topology and processing logic of the Ascend AI processor to maximize the processor's computing performance. For details about both types of affinity scheduling, see Affinity Scheduling.
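
As a rough illustration of how a whole-NPU request can ride on standard Kubernetes scheduling, the sketch below builds a plain Pod manifest that asks for an integer number of NPUs as an extended resource and names the scheduler that should place it. The helper name build_full_npu_pod, the resource name huawei.com/Ascend910, the scheduler name volcano, and the image are assumptions made for illustration; actual jobs may use other workload kinds, labels, and resource names, so follow the usage guide referenced under Instructions for the real task YAML.

```python
import json


def build_full_npu_pod(name, image, npu_count,
                       resource_name="huawei.com/Ascend910",  # assumed resource name
                       scheduler_name="volcano"):             # assumed scheduler name
    """Return a Pod manifest (as a dict) that requests whole NPUs.

    Hypothetical helper for illustration only, not part of any component.
    """
    # Kubernetes extended resources are only allocated in whole units,
    # so a fractional or non-positive count is rejected up front.
    if isinstance(npu_count, bool) or not isinstance(npu_count, int) or npu_count < 1:
        raise ValueError("npu_count must be a positive integer (whole NPUs only)")

    npu = {resource_name: str(npu_count)}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Hand the Pod to the chosen scheduler instead of the default one.
            "schedulerName": scheduler_name,
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # For extended resources, requests and limits must be equal.
                "resources": {"requests": dict(npu), "limits": dict(npu)},
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder image; prints a JSON manifest that kubectl can also consume.
    pod = build_full_npu_pod("train-demo", "example.com/train:latest", 8)
    print(json.dumps(pod, indent=2))
```

Because the request is an integer count of an extended resource, each NPU granted to the container is not shared with other containers, which matches the exclusive occupation described above. Which NPUs are chosen on the node is left to the scheduler and the device plugin.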

Required Components

  • Volcano or other schedulers
  • Ascend Device Plugin
  • Ascend Docker Runtime
  • Ascend Operator
  • ClusterD
  • NodeD

Instructions

  1. Install the components as described in Installation and Deployment.
  2. Use the feature as described in Full NPU Scheduling/Static vNPU Scheduling (Training).