Working Principles

The cluster scheduling components periodically report node and processor information.
- kubelet reports the number of processors on a node to the node object.
- Ascend Device Plugin reports the processor memory and topology information.
  For a processor with on-chip memory, Ascend Device Plugin reports the memory details (node-label) upon startup. It also reports the entire NPU information and the physical processor ID to device-info-cm. For entire NPU scheduling, it reports the total number of schedulable processors (allocatable), the number of used processors (allocated), and basic processor information (device ip and super_device_ip) to the node.
- When a node is faulty, NodeD periodically reports the node health status, node hardware fault information, and node DPC shared storage fault information to node-info-cm.
After reading the information in device-info-cm and node-info-cm, ClusterD integrates the information into cluster-info-cm.
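The aggregation in this step can be pictured with a minimal Python sketch. The dictionary layout, the field names, and the function merge_cluster_info are hypothetical illustrations only; they are not the actual ConfigMap schema or ClusterD code.

```python
from typing import Any, Dict

Report = Dict[str, Dict[str, Any]]  # node name -> reported fields


def merge_cluster_info(device_info: Report, node_info: Report) -> Report:
    """Combine per-node device and node reports into one cluster-level view.

    Hypothetical model: a node that has reported nothing to one of the
    sources gets an empty entry for that source.
    """
    cluster: Report = {}
    for node in sorted(set(device_info) | set(node_info)):
        cluster[node] = {
            "device": device_info.get(node, {}),
            "node": node_info.get(node, {}),
        }
    return cluster


# Illustrative contents only; real ConfigMaps use their own keys.
device_info_cm = {
    "node-1": {"healthy_npus": [0, 1, 2, 3], "faulty_npus": []},
    "node-2": {"healthy_npus": [0, 1], "faulty_npus": [2]},
}
node_info_cm = {
    "node-2": {"node_status": "unhealthy", "faults": ["hypothetical-fault"]},
}

cluster_info_cm = merge_cluster_info(device_info_cm, node_info_cm)
```

In this model the scheduler only needs to read one merged structure instead of consulting two sources per node.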
Use kubectl or another deep learning platform to submit an AIBrix StormService inference job. aibrix-controller-manager generates RoleSet or PodSet sub-workloads based on the inference job configuration, and these sub-workloads then create the inference job pods. For details about RoleSet or PodSet, see the AIBrix documentation.
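The fan-out from one job to sub-workloads to pods can be sketched as follows. This is a simplified model, not the AIBrix API: the classes, the field names, the role names, and the pod naming scheme are all assumptions made for illustration. See the AIBrix documentation for the real resource definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Role:
    name: str      # illustrative role name, e.g. "prefill" or "decode"
    replicas: int  # pods created for this role in each sub-workload


@dataclass
class JobSpec:
    name: str
    replicas: int  # number of sub-workloads (RoleSet-like groups)
    roles: List[Role]


def expand_pods(spec: JobSpec) -> List[str]:
    """Return the pod names a job would fan out to in this toy model."""
    pods = []
    for group in range(spec.replicas):
        for role in spec.roles:
            for index in range(role.replicas):
                # Hypothetical naming scheme, chosen only for readability.
                pods.append(f"{spec.name}-roleset-{group}-{role.name}-{index}")
    return pods


job = JobSpec("vllm-demo", replicas=2, roles=[Role("prefill", 1), Role("decode", 2)])
pod_names = expand_pods(job)
```

With two sub-workloads and three pods per sub-workload, the model yields six pods in total.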
volcano-controller creates a PodGroup for the job. For details about PodGroup, see the open-source Volcano documentation.
volcano-scheduler selects a suitable node for each pod based on node memory, CPU, labels, and affinity, and writes the selected processor information and node hardware information to the pod annotation.
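A filter-then-pick selection of this kind can be sketched in Python. This is not the volcano-scheduler implementation: the data classes, the tightest-fit policy, and the annotation key example.com/selected-npus are assumptions, and affinity is reduced to a simple label match.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_cpu: float
    free_memory_gib: float
    free_npus: List[int]
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PodRequest:
    cpu: float
    memory_gib: float
    npus: int
    node_selector: Dict[str, str] = field(default_factory=dict)


def fits(pod: PodRequest, node: Node) -> bool:
    """Check CPU, memory, processor count, and required labels."""
    labels_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    return (labels_ok
            and node.free_cpu >= pod.cpu
            and node.free_memory_gib >= pod.memory_gib
            and len(node.free_npus) >= pod.npus)


def select_node(pod: PodRequest, nodes: List[Node]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Pick a node and build the annotation recording the chosen processors."""
    candidates = [n for n in nodes if fits(pod, n)]
    if not candidates:
        return None
    # Assumed policy: prefer the node with the fewest free processors.
    best = min(candidates, key=lambda n: len(n.free_npus))
    chosen = best.free_npus[:pod.npus]
    # Hypothetical annotation key, not the one used by the real components.
    annotation = {"example.com/selected-npus": ",".join(str(i) for i in chosen)}
    return best.name, annotation


nodes = [
    Node("node-1", 8, 64, [0, 1, 2, 3], {"accelerator": "npu"}),
    Node("node-2", 16, 128, [4, 5], {"accelerator": "npu"}),
    Node("node-3", 32, 256, [0, 1, 2, 3, 4, 5, 6, 7], {"accelerator": "gpu"}),
]
pod = PodRequest(cpu=4, memory_gib=32, npus=2, node_selector={"accelerator": "npu"})
result = select_node(pod, nodes)
```

Here node-3 is filtered out by its label, and node-2 is preferred over node-1 because it leaves fewer processors stranded.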
When kubelet creates a container, Ascend Device Plugin is called to mount processors. Ascend Device Plugin or volcano-scheduler writes the processor and node hardware information to the pod annotation. Ascend Docker Runtime assists in mounting corresponding resources.
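The hand-off from pod annotation to device mounting can be sketched as below. The annotation key and its comma-separated value format are hypothetical, and the /dev/davinciN device path pattern is assumed here for illustration; the real components may mount additional devices and files.

```python
from typing import Dict, List

# Hypothetical annotation key; the real key is defined by the components.
ANNOTATION_KEY = "example.com/selected-npus"


def devices_to_mount(annotations: Dict[str, str]) -> List[str]:
    """Translate the processor IDs in a pod annotation into device paths."""
    raw = annotations.get(ANNOTATION_KEY, "")
    ids = [int(part) for part in raw.split(",") if part.strip()]
    # Assumed device node naming: one /dev/davinci<ID> entry per processor.
    return [f"/dev/davinci{i}" for i in ids]


mounts = devices_to_mount({ANNOTATION_KEY: "4,5"})
```

A pod without the annotation yields an empty list, so no processors are mounted for it in this model.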
