NPU Exporter Cannot Obtain the Current Container Information After kubelet Is Restarted

Symptom

After the user restarts kubelet, NPU Exporter does not report container information.

Cause Analysis

After kubelet is restarted, a new dockershim.sock file is created. However, NPU Exporter still uses the old dockershim.sock file that was mounted when its container started, so it cannot obtain the current container data.
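
The effect can be illustrated outside Kubernetes. The following is a minimal sketch, not the NPU Exporter mechanism itself: it assumes that a file-level mount behaves like a handle opened before the restart, and it uses a regular file in a temporary directory as a stand-in for dockershim.sock. The names workdir, sock, and inode_of are hypothetical.

```shell
#!/bin/sh
# Illustration only: a regular file stands in for dockershim.sock.
workdir=$(mktemp -d)
sock="$workdir/demo.sock"   # hypothetical stand-in path

inode_of() {
    # Print the inode number of the file at path $1.
    ls -i "$1" | awk '{print $1}'
}

echo "old" > "$sock"
old_inode=$(inode_of "$sock")

# Keep the original file open. This plays the role of a reference that
# was taken before the restart and is never refreshed.
exec 3< "$sock"

# Simulate the restart: the old file is removed and a new file is
# created at the same path.
rm "$sock"
echo "new" > "$sock"
new_inode=$(inode_of "$sock")

held_content=$(cat <&3)      # what the old handle still sees
path_content=$(cat "$sock")  # what a fresh lookup through the directory sees
exec 3<&-

echo "old inode: $old_inode, new inode: $new_inode"
echo "held handle reads: $held_content; path lookup reads: $path_content"
rm -r "$workdir"
```

The old handle keeps reading the original file, while a lookup through the directory finds the new one. Mounting the directory rather than the single file (Method 3) relies on the same distinction.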

Solution

Use any of the following methods to rectify the fault:

  • Method 1: No manual operation is required.

    After kubelet is restarted, the container automatically exits. Kubernetes then restarts the container, after which NPU Exporter can obtain the container information. While the container is starting (about 10s), all NPU Exporter data is lost.

  • Method 2: After kubelet is restarted, run the following command to manually delete the NPU Exporter pod:
    kubectl delete pod -n npu-exporter <npu-exporter-podname>

    After the pod is deleted, the container is restarted (about 10s), and all NPU Exporter data is lost during this period.

  • Method 3: Mount the directory that contains the dockershim.sock file instead of the file itself.
    1. Run the following command in the directory where the NPU Exporter startup YAML file is stored to open the YAML file:
      vi npu-exporter-v{version}.yaml
    2. Delete the following mount paths from the NPU Exporter startup YAML file:
      ...
              volumeMounts:
                - name: log-npu-exporter
      ...
                - name: sys
                  mountPath: /sys
                  readOnly: true
                - name: docker-shim      # Delete the docker-shim, docker, and cri-dockerd entries.
                  mountPath: /var/run/dockershim.sock
                  readOnly: true
                - name: docker 
                  mountPath: /var/run/docker
                  readOnly: true
                - name: cri-dockerd 
                  mountPath: /var/run/cri-dockerd.sock
                  readOnly: true
                - name: containerd  # Delete this entry if only iSula is used.
                  mountPath: /run/containerd
      ...
            volumes:
              - name: log-npu-exporter
      ...
              - name: sys
                hostPath:
                  path: /sys
              - name: docker-shim # Delete the docker-shim, docker, and cri-dockerd entries.
                hostPath:
                  path: /var/run/dockershim.sock
              - name: docker 
                hostPath:
                  path: /var/run/docker
              - name: cri-dockerd 
                hostPath:
                  path: /var/run/cri-dockerd.sock
              - name: containerd  
                hostPath:
                  path: /run/containerd
      ...
    3. In the NPU Exporter startup YAML file, add a mount for the directory that contains the dockershim.sock file:
      ...
              volumeMounts:
                - name: log-npu-exporter
      ...
                - name: sys
                  mountPath: /sys
                  readOnly: true
                - name: sock
                  mountPath: /var/run         # Use the actual dockershim.sock file directory.
      ...
            volumes:
              - name: log-npu-exporter
      ...
              - name: sys
                hostPath:
                  path: /sys
              - name: sock
                hostPath:
                  path: /var/run # Use the actual dockershim.sock file directory.
              - name: containerd  
                hostPath:
                  path: /run/containerd
      ...
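
After editing, it may help to confirm that the YAML file no longer mounts dockershim.sock as a single file and that it does mount the containing directory. The following is a rough sketch using grep. The function names are hypothetical, and the /var/run path is taken from the example above; adjust the patterns if dockershim.sock is stored in a different directory.

```shell
#!/bin/sh
# Print lines (with line numbers) that still mount dockershim.sock as a
# single file. Returns 0 if any are found, 1 if none.
find_file_level_sock_mounts() {
    # $1: path to the NPU Exporter startup YAML file
    grep -nE '(mountPath|path):[[:space:]]*/var/run/dockershim\.sock' "$1"
}

# Print lines (with line numbers) that mount the /var/run directory
# itself. A trailing comment on the line is allowed.
# Returns 0 if any are found, 1 if none.
find_dir_level_mounts() {
    # $1: path to the NPU Exporter startup YAML file
    grep -nE '(mountPath|path):[[:space:]]*/var/run[[:space:]]*(#.*)?$' "$1"
}

# Example (replace {version} with the actual version in the file name):
#   find_file_level_sock_mounts npu-exporter-v{version}.yaml
#   find_dir_level_mounts npu-exporter-v{version}.yaml
```

If the first function prints nothing and the second prints both the volumeMounts and the volumes entry, the file matches the layout shown in step 3. This only checks the text of the file; it does not verify that the running pod has been redeployed with the new mounts.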