Task Pending While Allocating an Eight-Card Job When the Number of Chips Corresponding to Allocatable.huawei.com/Ascend910 in the Node Information Is 8
Symptom
Run the kubectl describe node {node name} command to view the node information. The number of chips shown for huawei.com/Ascend910 under Allocatable is 8, as in the following output. However, after an eight-card job is submitted, the task remains in the Pending state.
Capacity:
  cpu: 72
  ephemeral-storage: 1843598940Ki
  huawei.com/Ascend910: 8
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 659447564Ki
  pods: 110
Allocatable:
  cpu: 72
  ephemeral-storage: 1699060780291
  huawei.com/Ascend910: 8
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 659345164Ki
  pods: 110
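The same check can be scripted when several nodes need to be inspected. The following is a minimal sketch, not part of the product tooling: it assumes kubectl is installed and configured for the cluster, and the helper names allocatable_chips and fetch_node_json are hypothetical. It reads the huawei.com/Ascend910 value from the status.allocatable field of the output of kubectl get node {node name} -o json.

```python
import json
import subprocess

# Resource name as it appears in the node information above.
RESOURCE = "huawei.com/Ascend910"


def allocatable_chips(node_json: str, resource: str = RESOURCE) -> int:
    """Return the allocatable count of `resource` from a node's JSON description.

    `node_json` is the text produced by `kubectl get node <name> -o json`.
    A node that does not report the resource is treated as having 0 chips.
    """
    node = json.loads(node_json)
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(resource, "0"))


def fetch_node_json(node_name: str) -> str:
    """Hypothetical helper: fetch the node description with kubectl.

    Assumes kubectl is on PATH and has access to the cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "node", node_name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, allocatable_chips(fetch_node_json("{node name}")) returns 8 for the node shown above. A value of 8 only confirms what the device plugin reports; it does not rule out faults that the plugin cannot detect.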
Cause Analysis
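The node may have common faults that Ascend Device Plugin does not detect. In that case, the chips can still be reported as allocatable even though the job cannot be scheduled onto them.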
Solution
Rectify the fault by referring to Obtaining Information About Available Devices in a Cluster.