Viewing Job Progress

Procedure

  1. Run the following command to check the running status of the pods:
    kubectl get pod --all-namespaces

    Command output:

    NAMESPACE        NAME                                       READY   STATUS    RESTARTS   AGE
    ...
    default          resnetinfer1-2-scpr5                      1/1     Running   0          8s
    ...
    
  2. View details about the node where the inference job is running.
    1. View the node name.
      kubectl get node -A
    2. View the node details based on the node name obtained in the previous step.
      kubectl describe node <nodename>

      Command output:

      ...
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource              Requests     Limits
        --------              --------     ------
        cpu                   4 (2%)       3500m (1%)
        memory                2140Mi (0%)  4040Mi (0%)
        ephemeral-storage     0 (0%)       0 (0%)
        huawei.com/npu-core  4            4
      Events:
        Type    Reason    Age   From                Message
        ----    ------    ----  ----                -------
        Normal  Starting  36m   kube-proxy, ubuntu  Starting kube-proxy.
      ...
      

      In the command output, find huawei.com/npu-core under Allocated resources. This value increases after the inference job is executed; the amount of the increase is the number of NPUs used by the inference job.
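
The check in step 2 can also be scripted. The following is a minimal sketch, not part of the original procedure: it assumes that awk is available and that the output of kubectl describe node has the layout shown above. The function name npu_core_requests is hypothetical and introduced here only for illustration. It reads the command output from standard input and prints the Requests value of huawei.com/npu-core found between the Allocated resources and Events headings.

```shell
# Hypothetical helper: print the Requests value of huawei.com/npu-core
# from the "Allocated resources" section of `kubectl describe node` output.
# Reads the command output from standard input.
npu_core_requests() {
  awk '
    # Enter the section at its heading, leave it at the Events heading.
    /^[[:space:]]*Allocated resources:/ { in_section = 1; next }
    /^[[:space:]]*Events:/              { in_section = 0 }
    # Inside the section, the second column of the resource row is Requests.
    in_section && $1 == "huawei.com/npu-core" { print $2 }
  '
}

# Usage (replace <nodename> with the name obtained from `kubectl get node`):
#   before=$(kubectl describe node <nodename> | npu_core_requests)
#   ... start the inference job ...
#   after=$(kubectl describe node <nodename> | npu_core_requests)
#   echo "NPUs used by the job: $((after - before))"
```

Restricting the match to the Allocated resources section avoids picking up other places in the node description where the same resource name may be listed. If the resource row is absent, the function prints nothing, so the arithmetic in the usage sketch should only be run after confirming that both values are non-empty.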