NPU Exporter
NPU Exporter interconnects with Prometheus to report data. This section describes how to check whether NPU Exporter is running properly.
Deployment Using Containers
Perform the following steps on any node to verify the installation status of NPU Exporter:
- Run the following command to check the NPU Exporter pods in the Kubernetes cluster. Ensure that the STATUS of each pod is Running and READY is 1/1. If NPU Exporter is installed on multiple nodes in the cluster, check the status of each pod.
kubectl get pods -n npu-exporter -o wide | grep npu-exporter
Command output:
npu-exporter-4ln8w 1/1 Running 0 36m 192.168.102.109 ubuntu <none> <none>
- View NPU Exporter logs in a Kubernetes cluster.
kubectl logs -n npu-exporter {Name of the NPU Exporter pod}
Command output:
[INFO] 2023/12/08 07:38:56.551173 1 hwlog/api.go:108 npu-exporter.log's logger init success
[INFO] 2023/12/08 07:38:56.551275 1 npu-exporter/main.go:205 listen on: 0.0.0.0
[INFO] 2023/12/08 07:38:56.551369 1 npu-exporter/main.go:325 npu exporter starting and the version is v7.3.0_linux-x86_64
[WARN] 2023/12/08 07:38:56.684424 1 npu-exporter/main.go:339 enable unsafe http server
[WARN] 2023/12/08 07:39:01.686205 98 container/runtime_ops.go:150 failed to get OCI connection: context deadline exceeded
[WARN] 2023/12/08 07:39:01.686311 98 container/runtime_ops.go:152 use backup address to try again
[INFO] 2023/12/08 07:39:01.687444 98 collector/npu_collector.go:418 Starting update cache every 5 seconds
[WARN] 2023/12/08 07:39:01.688039 157 collector/npu_collector.go:463 get info of npu-exporter-network-info failed: no value found, so use initial net info
[INFO] 2023/12/08 07:39:01.744739 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:01.852413 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:05.055247 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
[INFO] 2023/12/08 07:39:06.688352 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:06.750876 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:09.843914 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
[INFO] 2023/12/08 07:39:11.688505 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:11.701081 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:14.859243 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
...
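When NPU Exporter runs on many nodes, the pod check can be scripted instead of reading each line by eye. The following is a minimal shell sketch, not a tool shipped with NPU Exporter: the function name check_npu_exporter_pods is hypothetical, and it assumes the column order of the kubectl get pods output shown above (NAME, READY, STATUS, ...). It reads that output on standard input and returns a non-zero status if no NPU Exporter pod is found or if any pod is not Running with all of its containers ready.

```shell
# Hypothetical helper (not part of NPU Exporter). Reads the output of
#   kubectl get pods -n npu-exporter -o wide
# on stdin. Assumes column 1 = NAME, column 2 = READY (ready/total),
# column 3 = STATUS, as in the sample output above.
check_npu_exporter_pods() {
  awk '
    $1 ~ /^npu-exporter/ {
      found++
      split($2, ready, "/")          # "1/1" -> ready[1]=1, ready[2]=1
      if ($3 != "Running" || ready[1] != ready[2]) {
        printf "NOT READY: %s READY=%s STATUS=%s\n", $1, $2, $3
        bad++
      } else {
        printf "OK: %s READY=%s STATUS=%s\n", $1, $2, $3
      }
    }
    END {
      if (found == 0) { print "no npu-exporter pod found"; exit 1 }
      exit (bad > 0)                 # non-zero if any pod is not ready
    }'
}
```

Usage, after sourcing the function: kubectl get pods -n npu-exporter -o wide | check_npu_exporter_pods. The sketch reads only the first three columns on purpose, because later columns such as RESTARTS can contain spaces in some kubectl versions.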
Deployment Using Binary Files
Perform the following steps on the node where NPU Exporter is installed to verify the installation status:
- Log in to the node where NPU Exporter is deployed and run the following command to check the component status. Ensure that the component status is active (running).
systemctl status npu-exporter
Information similar to the following is displayed.
root@ubuntu:~# systemctl status npu-exporter
● npu-exporter.service - Ascend npu exporter
     Loaded: loaded (/etc/systemd/system/npu-exporter.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-11-17 16:24:41 CST; 3 days ago
   Main PID: 25121 (npu-exporter)
      Tasks: 8 (limit: 7372)
     CGroup: /system.slice/npu-exporter.service
             └─25121 /usr/local/bin/npu-exporter -ip=127.0.0.1 -port=8082 -logFile=/var/log/mindx-dl/npu-exporter/npu-exporter.log
...
- View component logs.
cat /var/log/mindx-dl/npu-exporter/npu-exporter.log
Command output:
[INFO] 2023/12/08 07:38:56.551173 1 hwlog/api.go:108 npu-exporter.log's logger init success
[INFO] 2023/12/08 07:38:56.551275 1 npu-exporter/main.go:205 listen on: 0.0.0.0
[INFO] 2023/12/08 07:38:56.551369 1 npu-exporter/main.go:325 npu exporter starting and the version is v7.3.0_linux-x86_64
[WARN] 2023/12/08 07:38:56.684424 1 npu-exporter/main.go:339 enable unsafe http server
[WARN] 2023/12/08 07:39:01.686205 98 container/runtime_ops.go:150 failed to get OCI connection: context deadline exceeded
[WARN] 2023/12/08 07:39:01.686311 98 container/runtime_ops.go:152 use backup address to try again
[INFO] 2023/12/08 07:39:01.687444 98 collector/npu_collector.go:418 Starting update cache every 5 seconds
[WARN] 2023/12/08 07:39:01.688039 157 collector/npu_collector.go:463 get info of npu-exporter-network-info failed: no value found, so use initial net info
[INFO] 2023/12/08 07:39:01.744739 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:01.852413 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:05.055247 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
[INFO] 2023/12/08 07:39:06.688352 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:06.750876 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:09.843914 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
[INFO] 2023/12/08 07:39:11.688505 157 collector/npu_collector.go:476 update cache,key is npu-exporter-network-info
[INFO] 2023/12/08 07:39:11.701081 158 collector/npu_collector.go:499 update cache,key is npu-exporter-containers-devices
[INFO] 2023/12/08 07:39:14.859243 148 collector/npu_collector.go:442 update cache,key is npu-exporter-npu-list
...
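For binary deployments, the service check and the log check can be combined into a small shell sketch. It is illustrative only: the function names and the NPU_EXPORTER_LOG variable are hypothetical, the unit name npu-exporter and the log path are taken from the systemctl output above, and the sketch assumes that error entries are tagged [ERROR] in the same way as the [INFO] and [WARN] entries shown. Because the sample log above already contains [WARN] entries, the sketch only counts warnings and does not treat them as failures.

```shell
# Hypothetical helpers (not part of NPU Exporter) for a binary deployment.
# Log path taken from the -logFile flag in the systemctl output above;
# set NPU_EXPORTER_LOG if your installation uses a different path.
NPU_EXPORTER_LOG=${NPU_EXPORTER_LOG:-/var/log/mindx-dl/npu-exporter/npu-exporter.log}

# Counts log entries per level tag read on stdin, e.g. "2 [INFO]", "1 [WARN]".
# Assumes each entry starts with a bracketed level tag, as in the sample log.
count_log_levels() {
  grep -o '^\[[A-Z]*]' | sort | uniq -c
}

# Succeeds only if no entry on stdin starts with [ERROR] (assumed tag name).
no_error_entries() {
  ! grep -q '^\[ERROR]'
}

check_npu_exporter_service() {
  # `systemctl is-active --quiet` exits with 0 only if the unit is active.
  if ! systemctl is-active --quiet npu-exporter; then
    echo "npu-exporter.service is not active" >&2
    return 1
  fi
  if [ ! -r "$NPU_EXPORTER_LOG" ]; then
    echo "cannot read $NPU_EXPORTER_LOG" >&2
    return 1
  fi
  count_log_levels < "$NPU_EXPORTER_LOG"
  if ! no_error_entries < "$NPU_EXPORTER_LOG"; then
    echo "[ERROR] entries found in $NPU_EXPORTER_LOG" >&2
    return 1
  fi
  echo "npu-exporter.service is active and the log has no [ERROR] entries"
}
```

The two log helpers read standard input, so they also work on container deployments, for example kubectl logs -n npu-exporter {Name of the NPU Exporter pod} | count_log_levels.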