Working Principles

  1. The cluster scheduling components periodically report node and processor information. kubelet reports the number of processors on a node to the node object.
    • Ascend Device Plugin reports the processor memory and topology information.

      For processors with on-chip memory, Ascend Device Plugin reports the processor memory details (node-label), the whole-NPU information, and the physical processor IDs to device-info-cm at startup. For whole-NPU scheduling, it also reports the total number of schedulable processors (allocatable), the number of used processors (allocated), and basic processor information (device ip and super_device_ip) to the node object.

    • When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
  2. After reading the information in device-info-cm and node-info-cm, ClusterD integrates the information into cluster-info-cm.
  3. Submit an OME SGLang inference job using kubectl or another deep learning platform. Based on the job configuration, OME generates Deployment or LeaderWorkerSet (LWS) sub-workloads, and these sub-workloads then create the inference job pods. For details about Deployment and LeaderWorkerSet, see the OME documentation.
  4. volcano-controller or LeaderWorkerSet creates a PodGroup for the job. For details about PodGroup, see the open-source Volcano documentation.
  5. For each SGLang inference job pod, volcano-scheduler selects a suitable node based on memory, CPU, labels, and affinity, and also takes processor topology into account. It records the selected processors and the node hardware information in the pod annotations.
  6. When kubelet creates the container for an OME-based SGLang inference job, it calls Ascend Device Plugin to mount the processors, and Ascend Device Plugin or volcano-scheduler writes the processor and node hardware information into the pod annotations. Ascend Docker Runtime assists in mounting the corresponding resources.
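The data flow in steps 2 and 5 above can be illustrated with a minimal, hypothetical Python sketch: it merges per-node device data and per-node health data into one cluster view (loosely mirroring how ClusterD builds cluster-info-cm), then picks a node for a pod that requests a number of whole processors and records the choice as annotations. The dataclasses, field names, label keys, and the `example.com/...` annotation keys are assumptions made for illustration; they are not the real ConfigMap or annotation formats. The best-fit rule (choose the matching node with the fewest free processors) is a simple swapped-in heuristic rather than volcano-scheduler's actual scoring, and processor topology is ignored apart from taking the lowest free IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical, simplified stand-in for one node's entry in device-info-cm.
@dataclass
class DeviceInfo:
    node: str
    free_npus: List[int]  # physical IDs of processors not yet allocated
    labels: Dict[str, str] = field(default_factory=dict)


# Hypothetical, simplified stand-in for one node's entry in node-info-cm.
@dataclass
class NodeInfo:
    node: str
    healthy: bool


def build_cluster_info(devices: List[DeviceInfo], nodes: List[NodeInfo]) -> Dict[str, dict]:
    """Merge device and health data per node (a loose analogue of cluster-info-cm).

    A node with no health report is treated as unhealthy.
    """
    health = {n.node: n.healthy for n in nodes}
    return {
        d.node: {
            "free_npus": list(d.free_npus),
            "labels": dict(d.labels),
            "healthy": health.get(d.node, False),
        }
        for d in devices
    }


def select_node(
    cluster_info: Dict[str, dict], npus_needed: int, required_labels: Dict[str, str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Filter nodes, pick one by best fit, and build hypothetical pod annotations."""
    # Keep healthy nodes whose labels match and that have enough free processors.
    candidates = [
        (len(info["free_npus"]), name)
        for name, info in cluster_info.items()
        if info["healthy"]
        and len(info["free_npus"]) >= npus_needed
        and all(info["labels"].get(k) == v for k, v in required_labels.items())
    ]
    if not candidates:
        return None
    # Best fit: fewest free processors that still satisfy the request (ties broken by name).
    _, name = min(candidates)
    chosen = sorted(cluster_info[name]["free_npus"])[:npus_needed]
    # Annotation keys are illustrative only, not the keys the real components use.
    annotations = {
        "example.com/selected-node": name,
        "example.com/selected-npus": ",".join(str(i) for i in chosen),
    }
    return name, annotations


if __name__ == "__main__":
    devices = [
        DeviceInfo("node-a", list(range(8)), {"npu-type": "demo"}),
        DeviceInfo("node-b", [2, 5], {"npu-type": "demo"}),
        DeviceInfo("node-c", [0, 1, 2], {"npu-type": "demo"}),
    ]
    nodes = [NodeInfo("node-a", True), NodeInfo("node-b", True), NodeInfo("node-c", False)]
    info = build_cluster_info(devices, nodes)
    print(select_node(info, 2, {"npu-type": "demo"}))
    # ('node-b', {'example.com/selected-node': 'node-b', 'example.com/selected-npus': '2,5'})
```

In the usage example, node-c is skipped because its health report marks it as faulty, and node-b wins over node-a because it fits the two-processor request exactly, which leaves node-a's eight free processors available for larger jobs.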