Working Principle

The working principle of this feature differs by training job type (acjob, vcjob, or deploy job). To use static vNPU scheduling, use the npu-smi tool to create the required vNPUs in advance.

acjob

Figure 1 shows the principle of acjob.

Figure 1 acjob scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors on the node object.
   - Ascend Device Plugin periodically reports the processor topology information.
     - Report the entire NPU information. The physical ID of the processor is reported to device-info-cm. The total number of allocatable processors, the number of allocated processors, and basic processor information (device ip and super_device_ip) are reported to the node for full NPU scheduling.
     - Report vNPU information to the node for static vNPU scheduling.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. After reading the information in device-info-cm and node-info-cm, ClusterD writes the information to cluster-info-cm.
3. A user submits an acjob through kubectl or a deep learning platform.
4. Ascend Operator creates a PodGroup for the job. For details about PodGroup, see the official Volcano open source documentation.
5. Ascend Operator creates a pod for the job and injects the environment variables required for collective communication into the container.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the annotation of the pod.
   - Write the entire NPU information for full NPU scheduling.
   - Write vNPU information for static vNPU scheduling.
7. When kubelet creates a container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin or volcano-scheduler writes the processor information to the annotation of the pod. Ascend Docker Runtime assists in mounting the corresponding resources.
8. Ascend Operator reads the annotation information of the pod and writes the information to hccl.json.
9. The container reads the environment variables or the hccl.json information, establishes a communication channel, and starts to execute the training job.
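
The last two steps above (collecting per-pod processor information and writing it to hccl.json) can be pictured with a minimal sketch. It is not the Ascend Operator implementation: the input structure (`node_ip`, `devices`, `id`, `ip`) is hypothetical, and the output field names are only loosely modeled on a rank table and are assumptions rather than the actual hccl.json schema.

```python
import json


def build_rank_table(pods):
    """Merge per-pod processor information into one rank-table-like dict.

    `pods` is a list of dicts in rank order. Each dict has a hypothetical
    "node_ip" string and a "devices" list of {"id": int, "ip": str}.
    """
    server_list = []
    next_rank = 0
    for pod in pods:
        devices = []
        for dev in pod["devices"]:
            devices.append({
                "device_id": str(dev["id"]),
                "device_ip": dev["ip"],
                "rank_id": str(next_rank),  # ranks run globally across all pods
            })
            next_rank += 1
        server_list.append({"server_id": pod["node_ip"], "device": devices})
    return {
        "status": "completed",
        "server_count": str(len(server_list)),
        "server_list": server_list,
    }


# Example input: two pods with two processors each (all values are made up).
pods = [
    {"node_ip": "10.0.0.1",
     "devices": [{"id": 0, "ip": "192.168.1.10"}, {"id": 1, "ip": "192.168.1.11"}]},
    {"node_ip": "10.0.0.2",
     "devices": [{"id": 0, "ip": "192.168.2.10"}, {"id": 1, "ip": "192.168.2.11"}]},
]
rank_table = build_rank_table(pods)
print(json.dumps(rank_table, indent=2))
```

The point of the sketch is only that every container ends up with the same global view of processors and ranks, which is what allows it to establish the communication channel.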

vcjob

Figure 2 shows the principle of vcjob.

Figure 2 vcjob scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors on the node object.
   - Ascend Device Plugin periodically reports the processor topology information.
     - Report the entire NPU information. The physical ID of the processor is reported to device-info-cm. The total number of allocatable processors and the number of allocated processors are reported to the node for full NPU scheduling.
     - Report vNPU information to the node for static vNPU scheduling.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. After reading the information in device-info-cm and node-info-cm, ClusterD writes the information to cluster-info-cm.
3. A user submits a vcjob through kubectl or a deep learning platform.
4. volcano-controller creates a PodGroup for the job. For details about PodGroup, see the official Volcano open source documentation.
5. volcano-controller creates a pod for the job when cluster resources meet the job requirements.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the annotation of the pod.
   - Write the entire NPU information for full NPU scheduling.
   - Write vNPU information for static vNPU scheduling.
7. When kubelet creates a container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin writes the processor information to the annotation of the pod. Ascend Docker Runtime assists in mounting the resources and mounts the hccl.json file into the container.
8. Ascend Operator obtains the annotation information of each pod and writes the information to hccl.json.
9. The container reads the hccl.json file, establishes a communication channel, and starts to execute the training job.
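
The node-selection step above can be illustrated with a small sketch. It assumes a simple best-fit rule (choose the node with the fewest free processors that still satisfies the request), which is a stand-in and not the actual volcano-scheduler policy. The annotation key `example.com/selected-processors` and the data layout are hypothetical.

```python
ANNOTATION_KEY = "example.com/selected-processors"  # hypothetical annotation key


def select_node(free_by_node, requested):
    """Return (node_name, processor_ids), or None if no node can fit the request.

    `free_by_node` maps a node name to the list of its free processor IDs.
    """
    fitting = [(len(ids), name) for name, ids in free_by_node.items()
               if len(ids) >= requested]
    if not fitting:
        return None
    _, name = min(fitting)  # best fit: fewest free processors, ties broken by name
    return name, sorted(free_by_node[name])[:requested]


def annotate_pod(pod, processor_ids):
    """Record the selected processor IDs in the pod's annotations."""
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[ANNOTATION_KEY] = ",".join(str(i) for i in processor_ids)
    return pod


# Example: free processors per node as reported by the cluster components.
free_by_node = {"node-a": [0, 1, 2, 3, 4, 5, 6, 7], "node-b": [2, 3, 6], "node-c": [5]}
choice = select_node(free_by_node, requested=2)
pod = {"metadata": {"name": "train-0"}}
if choice is not None:
    node_name, ids = choice
    pod = annotate_pod(pod, ids)
```

Writing the selection into the pod annotation is what lets the later steps (device mounting and hccl.json generation) act on the same processors the scheduler chose.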

deploy job

Figure 3 shows the principle of deploy jobs.

Figure 3 deploy scheduling principle

The description of each step is as follows:

1. Cluster scheduling components periodically report node and processor information. kubelet reports the number of processors on the node object.
   - Ascend Device Plugin periodically reports the processor topology information.
     - Report the entire NPU information. The physical ID of the processor is reported to device-info-cm. The total number of allocatable processors and the number of allocated processors are reported to the node for full NPU scheduling.
     - Report vNPU information to the node for static vNPU scheduling.
   - When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
2. After reading the information in device-info-cm and node-info-cm, ClusterD writes the information to cluster-info-cm.
3. A user submits a deploy job through kubectl or a deep learning platform.
4. kube-controller creates a pod for the job.
5. volcano-controller creates a PodGroup for the job. For details about PodGroup, see the official Volcano open source documentation.
6. volcano-scheduler selects a suitable node for the job based on the node and processor topology information and writes the selected processor information to the annotation of the pod.
   - Write the entire NPU information for full NPU scheduling.
   - Write vNPU information for static vNPU scheduling.
7. When kubelet creates a container, it calls Ascend Device Plugin to mount the processor. Ascend Device Plugin writes the processor information to the annotation of the pod. Ascend Docker Runtime assists in mounting the resources and mounts the hccl.json file into the container.
8. Ascend Operator obtains the annotation information of each pod and writes the information to hccl.json.
9. The container reads the hccl.json file, establishes a communication channel, and starts to execute the training job.
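
The reporting and aggregation steps at the start of the flow, which are common to all three job types, can be sketched as a merge of two per-node reports into one cluster-wide view. The dictionaries below merely stand in for device-info-cm, node-info-cm, and cluster-info-cm. Their field names (`processors`, `healthy`, `faults`) are hypothetical, and treating a node with no NodeD report as healthy is an assumption of this sketch.

```python
def aggregate_cluster_info(device_info, node_info):
    """Combine per-node processor and health reports into one cluster view.

    `device_info` maps node name -> list of processor IDs (device-info-cm stand-in).
    `node_info` maps node name -> {"healthy": bool, "faults": [...]}
    (node-info-cm stand-in); nodes without an entry are assumed healthy.
    """
    cluster_info = {}
    for node in sorted(set(device_info) | set(node_info)):
        report = node_info.get(node, {})
        cluster_info[node] = {
            "processors": list(device_info.get(node, [])),
            "healthy": report.get("healthy", True),
            "faults": list(report.get("faults", [])),
        }
    return cluster_info


# Example: node-b has reported a hardware fault, node-a has reported nothing.
device_info = {"node-a": [0, 1, 2, 3], "node-b": [0, 1]}
node_info = {"node-b": {"healthy": False, "faults": ["hardware"]}}
cluster_info = aggregate_cluster_info(device_info, node_info)
```

A single aggregated view of this kind means a consumer does not have to read each node's reports separately.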
