Working Principles

The steps are described as follows:
  1. The cluster scheduling components periodically report node and processor information. kubelet reports the number of processors on a node to the node object.
    • Ascend Device Plugin reports the processor memory and topology information.

      For a processor with on-chip memory, Ascend Device Plugin reports the following upon startup to support entire NPU scheduling: the processor memory details (node-label); the entire NPU information and the physical processor ID, both written to device-info-cm; and the total number of schedulable processors (allocatable), the number of used processors (allocated), and the basic processor information (device ip and super_device_ip), all reported to the node.

    • When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
  2. After reading the information in device-info-cm and node-info-cm, ClusterD integrates the information into cluster-info-cm.
  3. Use kubectl or a deep learning platform to deliver the MS Controller and MS Coordinator jobs, which do not require NPUs, and the MindIE Server jobs, which use NPUs.
  4. Ascend Operator creates a podGroup for the job. For details about podGroup, see the open source Volcano documentation.
  5. Ascend Operator creates a pod for the job and injects the environment variables required for starting the MindIE Server service. For details about the environment variables, see Table 2 Training environment variables injected by Ascend Operator.
  6. For MS Controller and MS Coordinator jobs, volcano-scheduler selects a suitable node based on the node memory, CPU, labels, and affinity. For MindIE Server jobs, volcano-scheduler also takes the processor topology information into account when selecting a node, and writes the selected processor information and the node hardware information to the pod annotation.
  7. When kubelet creates a container, it calls Ascend Device Plugin to mount processors for MindIE Server tasks. Ascend Device Plugin or volcano-scheduler writes the processor and node hardware information to the pod annotation, and Ascend Docker Runtime assists in mounting the corresponding resources.
  8. Ascend Operator reads the annotation information of each MindIE Server job pod, generates the corresponding collective communication file hccl.json, and stores the file in etcd as a ConfigMap.
  9. ClusterD watches the MS Controller and MS Coordinator job pods and the ConfigMap that corresponds to each hccl.json file, and regenerates global-ranktable in real time whenever any of them changes. For details about global-ranktable, see global-ranktable Description.
  10. After startup, MS Controller establishes communication with ClusterD and subscribes to global-ranktable changes through the gRPC interface.
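
The data flow in steps 2 and 8 to 10 can be illustrated with a small in-memory sketch. It is not the ClusterD implementation: the function merge_cluster_info, the class GlobalRankTable, and every dictionary key used here are hypothetical names chosen for illustration, and the real ConfigMap schemas and the gRPC interface are not reproduced. The sketch only shows the pattern of merging two per-node reports into one view and of notifying subscribers when a stored rank table changes.

```python
import copy
from typing import Callable, Dict, List


def merge_cluster_info(device_info: Dict[str, dict],
                       node_info: Dict[str, dict]) -> Dict[str, dict]:
    """Step 2 (sketch): combine per-node device and node reports.

    Both inputs map a node name to an arbitrary report dictionary.
    The output keys "devices" and "node_faults" are hypothetical.
    """
    merged = {}
    for name in sorted(set(device_info) | set(node_info)):
        merged[name] = {
            "devices": device_info.get(name, {}),
            # Assumption: a node without a report has no recorded faults.
            "node_faults": node_info.get(name, {}),
        }
    return merged


class GlobalRankTable:
    """Steps 8 to 10 (sketch): keep one rank table per job, notify on change."""

    def __init__(self) -> None:
        self._tables: Dict[str, dict] = {}
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        # Stands in for a subscription made through the gRPC interface.
        self._subscribers.append(callback)

    def snapshot(self) -> dict:
        # Return a copy so that subscribers cannot mutate internal state.
        return copy.deepcopy(self._tables)

    def on_configmap_change(self, job: str, hccl: dict) -> bool:
        """Store the parsed hccl.json for a job; return True if it changed."""
        if self._tables.get(job) == hccl:
            return False  # Unchanged content: no notification is sent.
        self._tables[job] = copy.deepcopy(hccl)
        for callback in self._subscribers:
            callback(self.snapshot())
        return True


if __name__ == "__main__":
    cluster = merge_cluster_info(
        {"node-a": {"allocatable": 8, "allocated": 2}},
        {"node-b": {"fault": "example"}},
    )
    print(cluster)

    table = GlobalRankTable()
    table.subscribe(lambda snap: print("updated:", snap))
    table.on_configmap_change("server-0", {"rank_ids": [0, 1]})
```

In this sketch a subscriber is notified only when the stored content actually changes, so repeated delivery of an identical hccl.json does not trigger redundant updates.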