Ascend Device Plugin

Perform the following steps on any node to verify the installation status of Ascend Device Plugin:

Procedure

  1. Run the following command to check the Ascend Device Plugin pod in the Kubernetes cluster. Ensure that STATUS of the pod is Running and READY is 1/1. If Ascend Device Plugin is installed on multiple nodes in the cluster, check the pod on each node.
    kubectl get pods -n kube-system -o wide | grep device-plugin

    Information similar to the following is displayed.

    ascend-device-plugin-daemonset-910-85p9v   1/1     Running   0          19h     192.168.185.251   ubuntu       <none>           <none>
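
The check in this step can also be scripted. The following is a minimal sketch and not part of Ascend Device Plugin: the function name check_plugin_pods is hypothetical, and the sketch assumes the column layout shown above (NAME, READY, and STATUS as the first three columns).

```shell
# check_plugin_pods (hypothetical helper): reads `kubectl get pods -o wide`
# lines on stdin and reports every device-plugin pod that is not
# Running with READY 1/1.
# Exit status: 0 = all pods healthy, 1 = at least one unhealthy pod,
#              2 = no device-plugin pod found in the input.
check_plugin_pods() {
    awk '
        /device-plugin/ {
            seen = 1
            # $1 = NAME, $2 = READY, $3 = STATUS (assumed column order)
            if ($2 != "1/1" || $3 != "Running") {
                print "NOT READY: " $1 " (READY=" $2 ", STATUS=" $3 ")"
                bad = 1
            }
        }
        END {
            if (!seen) { print "no device-plugin pod found"; exit 2 }
            exit bad + 0
        }
    '
}
```

A possible invocation is `kubectl get pods -n kube-system -o wide | check_plugin_pods`. Because the input contains one line per pod, this covers the multi-node case as well.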
    
  2. View the logs of Ascend Device Plugin in a Kubernetes cluster.
    kubectl logs -n kube-system {Name_of_the_Ascend_Device_Plugin_pod}

    If the following information is displayed, the component is normal.

    root@ubuntu:~# kubectl logs -n kube-system ascend-device-plugin-daemonset-910-85p9v 
    [INFO]     2022/11/21 11:20:04.534992 1       hwlog@v0.0.0/api.go:96    devicePlugin.log's logger init success
    [INFO]     2022/11/21 11:20:04.535750 1       main.go:127    ascend device plugin starting and the version is xxx_linux-x86_64
    [INFO]     2022/11/21 11:20:05.992823 1       K8stool@v0.0.0/self_K8s_client.go:116    start to decrypt cfg
    [INFO]     2022/11/21 11:20:06.002773 1       K8stool@v0.0.0/self_K8s_client.go:125    Config loaded from file: ****tc/mindx-dl/device-plugin/.config/config6
    [INFO]     2022/11/21 11:20:06.003751 1       main.go:153    init kube client success 
    [INFO]     2022/11/21 11:20:06.003923 1       device/ascendcommon.go:104    Found Huawei Ascend, deviceType: Ascend910, deviceName: Ascend910-4
    [INFO]     2022/11/21 11:20:06.003970 1       main.go:160    init device manager success
    [INFO]     2022/11/21 11:20:06.004157 21      device/manager.go:125    starting the listen device
    [INFO]     2022/11/21 11:20:06.004285 7       device/manager.go:206    Serve start
    [INFO]     2022/11/21 11:20:06.004970 7       server/server.go:88    device plugin (Ascend910) start serving.
    [INFO]     2022/11/21 11:20:06.007285 7       server/server.go:36    register Ascend910 to kubelet success.
    [INFO]     2022/11/21 11:20:06.007521 7       server/pod_resource.go:44    pod resource client init success.
    [INFO]     2022/11/21 11:20:06.007755 35      server/plugin.go:87    ListAndWatch resp devices: Ascend910-4 Healthy  # Processor reported to Kubernetes. The actual processor prevails.
    [INFO]     2022/11/21 11:20:11.063218 21      kubeclient/client_server.go:123    reset annotation success
    ...
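
To avoid reading the log by eye, the startup messages can be checked with a short script. This is a sketch only: the function name check_plugin_log is hypothetical, and the four message fragments are taken from the sample output above on the assumption that they mark a successful start. The exact wording may differ between Ascend Device Plugin versions.

```shell
# check_plugin_log (hypothetical helper): reads Ascend Device Plugin log text
# on stdin and reports which startup messages from the sample log are absent.
# Exit status: 0 = all messages present, 1 = at least one message missing.
check_plugin_log() {
    log=$(cat)
    missing=0
    # Message fragments copied from the sample log; assumed to be stable.
    for pattern in \
        "init kube client success" \
        "init device manager success" \
        "to kubelet success" \
        "ListAndWatch resp devices"
    do
        case "$log" in
            *"$pattern"*) ;;                              # fragment found
            *) echo "missing: $pattern"; missing=1 ;;     # fragment absent
        esac
    done
    return "$missing"
}
```

A possible invocation is `kubectl logs -n kube-system {pod_name} | check_plugin_log`, where {pod_name} is the pod name obtained in step 1.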
    
  3. Run the following command to view details about a node in the Kubernetes cluster. If the Capacity and Allocatable fields in the node details contain information about Ascend AI processors, Ascend Device Plugin has reported the processor information to Kubernetes and is running properly.
    kubectl describe node {Node_name_in_a_Kubernetes_cluster}
    • Take an Atlas 800 training server as an example. The command output is as follows:
      root@ubuntu:~# kubectl describe node ubuntu
      Name:               ubuntu
      Roles:              worker
      Labels:             accelerator=huawei-Ascend910
                          beta.kubernetes.io/arch=amd64
      ...
      CreationTimestamp:  Wed, 22 Dec 2021 20:10:04 +0800
      Taints:             <none>
      Unschedulable:      false
      ...
      Capacity:
        cpu:                      72
        ephemeral-storage:        479567536Ki
        huawei.com/Ascend910:     8  # Kubernetes has detected that the node has eight NPUs.
      ...
      Allocatable:
        cpu:                      72
        ephemeral-storage:        441969440446
        huawei.com/Ascend910:     8 # Kubernetes has detected that a total of eight NPUs can be allocated on the node.
      ...
      
    • The following uses a server (equipped with Atlas 300I inference cards) as an example. The number of processors on the node varies according to the actual situation.
      root@ubuntu:~# kubectl describe node ubuntu
      Name:               ubuntu
      Roles:              worker
      Labels:             accelerator=huawei-Ascend310
                          beta.kubernetes.io/arch=amd64
      ...
      CreationTimestamp:  Wed, 22 Dec 2021 20:10:04 +0800
      Taints:             <none>
      Unschedulable:      false
      ...
      Capacity:
        cpu:                       72
        ephemeral-storage:         163760Mi
        huawei.com/Ascend310:      4
      ...
      Allocatable:
        cpu:                       72
        ephemeral-storage:         154543324929
        huawei.com/Ascend310:      4
      ...
      
    • The following uses a server (equipped with Atlas 300I Pro inference cards) as an example. In non-mixed insertion mode, information similar to the following is displayed for a node that contains Atlas inference products. The number of processors on the node varies according to the actual situation.
      root@ubuntu:~# kubectl describe node ubuntu
      Name:               ubuntu
      Roles:              worker
      Labels:             accelerator=huawei-Ascend310
                          beta.kubernetes.io/arch=amd64
      ...
      CreationTimestamp:  Wed, 22 Dec 2021 20:10:04 +0800
      Taints:             <none>
      Unschedulable:      false
      ...
      Capacity:
        cpu:                      96
        ephemeral-storage:        95596964Ki
        huawei.com/Ascend310P:    3
      ...
      Allocatable:
        cpu:                      96
        ephemeral-storage:        88102161877
        huawei.com/Ascend310P:    3
      ...
      
    • The following uses a server (equipped with Atlas 300I Pro inference cards) as an example. In mixed insertion mode, information similar to the following is displayed for a node that contains Atlas inference products. The number of processors on the node varies according to the actual situation.
      root@ubuntu:~# kubectl describe node ubuntu
      Name:               ubuntu
      Roles:              worker
      Labels:             accelerator=huawei-Ascend310
                          beta.kubernetes.io/arch=amd64
      ...
      CreationTimestamp:  Wed, 22 Dec 2021 20:10:04 +0800
      Taints:             <none>
      Unschedulable:      false
      ...
      Capacity:
        cpu:                      96
        ephemeral-storage:        95596964Ki
        huawei.com/Ascend310P-IPro:    1
        huawei.com/Ascend310P-V:       1
        huawei.com/Ascend310P-VPro:    1
      ...
      Allocatable:
        cpu:                      96
        ephemeral-storage:        88102161877
        huawei.com/Ascend310P-IPro:    1
        huawei.com/Ascend310P-V:       1
        huawei.com/Ascend310P-VPro:    1
      ...
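
The Capacity and Allocatable check in step 3 can be automated in the same way for all of the server types above. The following is a sketch under stated assumptions: the function name ascend_resources is hypothetical, and it assumes that Ascend resources appear as `huawei.com/<name>: <count>` lines below the Capacity and Allocatable headings, as in the examples.

```shell
# ascend_resources (hypothetical helper): reads `kubectl describe node` output
# on stdin and prints "<section> <resource> <count>" for every huawei.com/*
# entry listed under Capacity or Allocatable.
# Exit status: 0 = at least one entry found, 1 = none found.
ascend_resources() {
    awk '
        /^[ \t]*Capacity:[ \t]*$/    { section = "Capacity"; next }
        /^[ \t]*Allocatable:[ \t]*$/ { section = "Allocatable"; next }
        # Any other heading line (ends with a colon) closes the section.
        /:[ \t]*$/                   { section = ""; next }
        section != "" && $1 ~ /^huawei\.com\/.*:$/ {
            name = $1
            sub(/:$/, "", name)      # drop the trailing colon from the key
            print section, name, $2
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

A possible invocation is `kubectl describe node {Node_name_in_a_Kubernetes_cluster} | ascend_resources`. In mixed insertion mode, one line is printed for each resource name, so each card type can be checked separately.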