Working Principle

The scheduling workflow differs slightly depending on the type of inference job. To use static vNPU scheduling, create the required vNPUs in advance with the npu-smi tool.

acjob

Figure 1 shows how an acjob is scheduled.

Figure 1 acjob scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors to the node object.
   - Ascend Device Plugin periodically reports processor topology information. For full NPU scheduling, it reports the entire NPU information: the physical ID of the processor is reported to device-info-cm, and the total number of allocatable processors, the number of allocated processors, and basic processor information (device ip and super_device_ip) are reported to the node.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. ClusterD reads the information in device-info-cm and node-info-cm and writes it to cluster-info-cm.
3. The user submits an acjob through kubectl or a deep learning platform.
4. Ascend Operator creates a PodGroup for the job. For details about PodGroup, see the official Volcano documentation.
5. Ascend Operator creates a pod for the job and injects the environment variables required for collective communication into the container.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the pod annotation. For full NPU scheduling, it writes the entire NPU information.
7. When kubelet creates the container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin or volcano-scheduler writes the processor information to the pod annotation, and Ascend Docker Runtime assists in mounting the corresponding resources.
8. Ascend Operator reads the pod annotation and writes the information to hccl.json.
9. The container reads the environment variables or hccl.json, establishes a communication channel, and starts executing the inference job.

Currently, Ascend Operator can generate hccl.json only for PyTorch jobs.
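
The reporting stage described above, in which ClusterD combines device-info-cm and node-info-cm into cluster-info-cm, can be sketched as a simple merge. This is only an illustration: the field names (physical_ids, healthy, hardware_faults, dpc_storage_faults), the node names, and the function build_cluster_info are assumptions made for the example, not the actual ConfigMap schema or ClusterD code.

```python
import json

# Hypothetical payloads. The real ConfigMap schemas are not given in this
# document, so every field name below is an assumption for illustration.
device_info_cm = {
    "node-1": {"physical_ids": [0, 1, 2, 3]},
    "node-2": {"physical_ids": [0, 1]},
}
node_info_cm = {
    "node-1": {"healthy": True, "hardware_faults": [], "dpc_storage_faults": []},
    "node-2": {"healthy": False, "hardware_faults": ["example-fault"], "dpc_storage_faults": []},
}


def build_cluster_info(device_info, node_info):
    """Merge per-node device reports and health reports into one view."""
    cluster_info = {}
    # Cover every node that appears in either report.
    for node in sorted(set(device_info) | set(node_info)):
        cluster_info[node] = {
            "device": device_info.get(node, {}),
            "node": node_info.get(node, {}),
        }
    return cluster_info


cluster_info_cm = build_cluster_info(device_info_cm, node_info_cm)
print(json.dumps(cluster_info_cm, indent=2))
```

Keeping the device and node reports under separate keys mirrors the two sources named in the text, so a consumer can still tell which component reported each value.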

vcjob

Figure 2 shows how a vcjob is scheduled.

Figure 2 vcjob scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors to the node object.
   - Ascend Device Plugin periodically reports processor topology information.
     - For full NPU scheduling, it reports the entire NPU information: the physical ID of the processor is reported to device-info-cm, and the total number of allocatable processors and the number of allocated processors are reported to the node.
     - For static vNPU scheduling, it reports vNPU information to the node.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. ClusterD reads the information in device-info-cm and node-info-cm and writes it to cluster-info-cm.
3. The user submits a vcjob through kubectl or a deep learning platform.
4. volcano-controller creates a PodGroup for the job. For details about PodGroup, see the official Volcano documentation.
5. volcano-controller creates a pod for the job when cluster resources meet the job requirements.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the pod annotation.
7. When kubelet creates the container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin writes the processor information to the pod annotation, and Ascend Docker Runtime assists in mounting the corresponding resources.
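
The node selection stage above can be illustrated with a small first-fit sketch: pick a healthy node with enough free processors, then record the chosen processors in the pod annotation. This is a deliberate simplification and not the volcano-scheduler algorithm, which also takes processor topology into account. The annotation key, the node data layout, and the function names are hypothetical.

```python
# Hypothetical annotation key; the real key is not given in this document.
ANNOTATION_KEY = "example.com/selected-processors"


def select_node(nodes, requested):
    """Return (node_name, processor_ids) for the first healthy node that has
    at least `requested` free processors, or (None, []) if none fits."""
    for name in sorted(nodes):
        info = nodes[name]
        free = [p for p in info["allocatable"] if p not in info["allocated"]]
        if info["healthy"] and len(free) >= requested:
            return name, free[:requested]
    return None, []


def schedule(pod, nodes, requested):
    """Bind the pod to a node and write the selected processors to its annotation."""
    node, chosen = select_node(nodes, requested)
    if node is None:
        return False
    pod["spec"]["nodeName"] = node
    annotations = pod["metadata"].setdefault("annotations", {})
    annotations[ANNOTATION_KEY] = ",".join(str(p) for p in chosen)
    # Mark the processors as allocated so later pods cannot reuse them.
    nodes[node]["allocated"].extend(chosen)
    return True


nodes = {
    "node-a": {"healthy": True, "allocatable": [0, 1, 2, 3], "allocated": [0, 1, 2]},
    "node-b": {"healthy": True, "allocatable": [0, 1, 2, 3], "allocated": [0]},
}
pod = {"metadata": {"name": "infer-0"}, "spec": {}}
scheduled = schedule(pod, nodes, requested=2)
print(scheduled, pod["spec"].get("nodeName"), pod["metadata"].get("annotations"))
```

In this example node-a has only one free processor, so the pod lands on node-b. The annotation is what a later component would read to learn which processors to mount.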

deploy job

Figure 3 shows how a deploy job is scheduled.

Figure 3 deploy job scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors to the node object.
   - Ascend Device Plugin periodically reports processor topology information.
     - For full NPU scheduling, it reports the entire NPU information: the physical ID of the processor is reported to device-info-cm, and the total number of allocatable processors and the number of allocated processors are reported to the node.
     - For static vNPU scheduling, it reports vNPU information to the node.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. ClusterD reads the information in device-info-cm and node-info-cm and writes it to cluster-info-cm.
3. The user submits a deploy job through kubectl or a deep learning platform.
4. kube-controller creates a pod for the job.
5. volcano-controller creates a PodGroup for the job. For details about PodGroup, see the official Volcano documentation.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the pod annotation.
7. When kubelet creates the container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin writes the processor information to the pod annotation, and Ascend Docker Runtime assists in mounting the corresponding resources.
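
The three workflows differ mainly in which component creates the PodGroup and the pod, and in which of the two is created first. The lookup table below summarizes those differences as stated in the steps of this section. The JobFlow type and the describe helper are illustrative names only and do not correspond to any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobFlow:
    podgroup_creator: str  # component that creates the PodGroup
    pod_creator: str       # component that creates the pod
    first_created: str     # object that the steps in this section list first


# Values taken from the step descriptions for each job type.
FLOWS = {
    "acjob": JobFlow("Ascend Operator", "Ascend Operator", "PodGroup"),
    "vcjob": JobFlow("volcano-controller", "volcano-controller", "PodGroup"),
    "deploy": JobFlow("volcano-controller", "kube-controller", "pod"),
}


def describe(job_type):
    """Return a one-line summary of who creates what for a job type."""
    flow = FLOWS[job_type]
    return (
        f"{job_type}: {flow.first_created} is created first; "
        f"PodGroup by {flow.podgroup_creator}, pod by {flow.pod_creator}"
    )


for job_type in FLOWS:
    print(describe(job_type))
```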
