systemctl status npu-exporter
Example output:
root@ubuntu:~# systemctl status npu-exporter
● npu-exporter.service - Ascend npu exporter
   Loaded: loaded (/etc/systemd/system/npu-exporter.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2022-11-17 16:24:41 CST; 3 days ago
 Main PID: 25121 (npu-exporter)
    Tasks: 8 (limit: 7372)
   CGroup: /system.slice/npu-exporter.service
           └─25121 /usr/local/bin/npu-exporter -ip=127.0.0.1 -port=8082 -logFile=/var/log/mindx-dl/npu-exporter/npu-exporter.log
...
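When this check is scripted rather than read by eye, the field that matters is the Active: line. The following is a minimal Python sketch, assuming the output layout matches the example above; parse_active_state is a hypothetical helper name and is not part of npu-exporter or systemd.

```python
import re
from typing import Optional, Tuple


def parse_active_state(status_text: str) -> Optional[Tuple[str, str]]:
    """Extract (state, sub-state) from the 'Active:' line of `systemctl status` output."""
    match = re.search(r"Active:\s+(\S+)\s+\(([^)]+)\)", status_text)
    if match is None:
        return None  # no 'Active:' line found, e.g. the unit does not exist
    return match.group(1), match.group(2)


# Sample text copied from the example output in this section.
sample_status = """\
● npu-exporter.service - Ascend npu exporter
   Loaded: loaded (/etc/systemd/system/npu-exporter.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2022-11-17 16:24:41 CST; 3 days ago
 Main PID: 25121 (npu-exporter)
"""

print(parse_active_state(sample_status))  # ('active', 'running')
```

A result of ('active', 'running') corresponds to the healthy state shown in the example. For a plain yes/no answer without parsing, systemctl is-active npu-exporter prints the state directly.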
cat /var/log/mindx-dl/npu-exporter/npu-exporter.log
Example output:
root@ubuntu:/usr/local/bin# cat /var/log/mindx-dl/npu-exporter/npu-exporter.log
[INFO] 2022/10/25 17:01:18.610431 1 hwlog@v0.0.10/api.go:96 npu-exporter.log's logger init success
[INFO] 2022/10/25 17:01:18.610628 1 npu-exporter/main.go:275 listen on: 0.0.0.0
[INFO] 2022/10/25 17:01:18.610740 1 npu-exporter/main.go:112 npu exporter starting and the version is v5.0.RC1_linux-aarch64
...
[ERROR] 2022/10/25 17:01:24.191525 34 container/runtime_ops.go:91 failed to get OCI connection
[ERROR] 2022/10/25 17:01:24.191736 34 container/runtime_ops.go:93 try again
[INFO] 2022/10/25 17:01:24.193024 34 collector/npu_collector.go:166 Starting update cache every 5 seconds
[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
[INFO] 2022/10/25 17:01:34.302792 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:34.302983 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
...
If the following messages keep appearing in the log, the component is running normally.
...
[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
...
In that case, the following entries can be ignored if they appear earlier in the log.
[ERROR] 2022/10/25 17:01:24.191525 34 container/runtime_ops.go:91 failed to get OCI connection
[ERROR] 2022/10/25 17:01:24.191736 34 container/runtime_ops.go:93 try again
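The rule described above (recurring "update cache" entries mean the component is working, and the two OCI connection errors are start-up noise) can be sketched as a small log filter. This is an illustration only: check_log, the min_updates threshold of 2, and the regular expressions are assumptions derived from the example log, not part of npu-exporter.

```python
import re

# Periodic entries that indicate the collector is refreshing its cache.
CACHE_UPDATE = re.compile(r"^\[INFO\].*update cache,\s*key is \S+")
# Start-up errors the text says can be ignored (pattern assumed from the example log).
IGNORABLE_ERROR = re.compile(
    r"container/runtime_ops\.go:\d+ (failed to get OCI connection|try again)"
)


def check_log(lines, min_updates=2):
    """Count cache updates and separate ignorable errors from any other [ERROR] lines."""
    cache_updates = 0
    ignored_errors = 0
    other_errors = []
    for raw in lines:
        line = raw.strip()
        if CACHE_UPDATE.match(line):
            cache_updates += 1
        elif line.startswith("[ERROR]"):
            if IGNORABLE_ERROR.search(line):
                ignored_errors += 1
            else:
                other_errors.append(line)
    return {
        # min_updates is an assumed threshold for "keeps appearing".
        "cache_updating": cache_updates >= min_updates,
        "cache_updates": cache_updates,
        "ignored_errors": ignored_errors,
        "other_errors": other_errors,
    }


# Lines copied from the example log in this section.
sample_log = [
    "[INFO] 2022/10/25 17:01:18.610628 1 npu-exporter/main.go:275 listen on: 0.0.0.0",
    "[ERROR] 2022/10/25 17:01:24.191525 34 container/runtime_ops.go:91 failed to get OCI connection",
    "[ERROR] 2022/10/25 17:01:24.191736 34 container/runtime_ops.go:93 try again",
    "[INFO] 2022/10/25 17:01:24.193024 34 collector/npu_collector.go:166 Starting update cache every 5 seconds",
    "[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list",
    "[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices",
    "[INFO] 2022/10/25 17:01:34.302792 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list",
    "[INFO] 2022/10/25 17:01:34.302983 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices",
]

print(check_log(sample_log))
```

Any line collected in other_errors falls outside what this section covers and would need to be looked at separately.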
kubectl get pods -n npu-exporter -o wide
Example output:
root@ubuntu:~# kubectl get pods -n npu-exporter -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
npu-exporter-4ln8w   1/1     Running   0          36m   192.168.102.109   ubuntu   <none>           <none>
...
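The columns to look at in this output are READY and STATUS. As a rough sketch, assuming the whitespace-separated table layout shown in the example, the following Python helper lists Pods that are not fully ready; pods_not_ready is a hypothetical name introduced here for illustration.

```python
def pods_not_ready(table_text):
    """Return names of Pods whose STATUS is not Running or whose READY count is incomplete."""
    rows = [line.split() for line in table_text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Only the first three columns are used; later headers contain spaces and are not parsed.
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    problems = []
    for row in body:
        ready, total = row[ready_i].split("/")
        if row[status_i] != "Running" or ready != total:
            problems.append(row[name_i])
    return problems


# Table copied from the example output in this section.
sample_table = """\
NAME                 READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
npu-exporter-4ln8w   1/1     Running   0          36m   192.168.102.109   ubuntu   <none>           <none>
"""

print(pods_not_ready(sample_table))  # []
```

An empty list matches the healthy state in the example. For anything beyond a quick check, kubectl get pods -o json gives structured output that is more robust to parse than the table.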
kubectl logs -n npu-exporter {npu-exporter Pod name}
Example output:
root@ubuntu:~# kubectl logs -n npu-exporter npu-exporter-dq24k
[INFO] 2022/10/25 17:01:18.610431 1 hwlog@v0.0.10/api.go:96 npu-exporter.log's logger init success
[INFO] 2022/10/25 17:01:18.610628 1 npu-exporter/main.go:275 listen on: 0.0.0.0
[INFO] 2022/10/25 17:01:18.610740 1 npu-exporter/main.go:112 npu exporter starting and the version is v5.0.RC1_linux-aarch64
...
[ERROR] 2022/10/25 17:01:24.191525 34 container/runtime_ops.go:91 failed to get OCI connection
[ERROR] 2022/10/25 17:01:24.191736 34 container/runtime_ops.go:93 try again
[INFO] 2022/10/25 17:01:24.193024 34 collector/npu_collector.go:166 Starting update cache every 5 seconds
[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
[INFO] 2022/10/25 17:01:34.302792 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:34.302983 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
...
If the following messages keep appearing in the log, the component is running normally.
...
[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list
[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices
...
In that case, the following entries can be ignored if they appear earlier in the log.
[ERROR] 2022/10/25 17:01:24.191525 34 container/runtime_ops.go:91 failed to get OCI connection
[ERROR] 2022/10/25 17:01:24.191736 34 container/runtime_ops.go:93 try again
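One way to confirm that the "update cache" entries really do keep appearing is to measure the time between consecutive entries for the same cache key. The example log announces an update every 5 seconds, so the gaps should stay close to that value. The sketch below assumes the timestamp format shown in the example log; update_intervals and its key parameter are hypothetical names used for illustration.

```python
import re
from datetime import datetime

# Captures the timestamp and cache key of an "update cache" entry.
UPDATE_ENTRY = re.compile(
    r"^\[INFO\] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) .*update cache,\s*key is (\S+)"
)


def update_intervals(lines, key="npu-exporter-npu-list"):
    """Return the seconds elapsed between consecutive cache updates for one cache key."""
    times = []
    for line in lines:
        match = UPDATE_ENTRY.match(line.strip())
        if match and match.group(2) == key:
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Lines copied from the example log in this section.
sample_entries = [
    "[INFO] 2022/10/25 17:01:24.193024 34 collector/npu_collector.go:166 Starting update cache every 5 seconds",
    "[INFO] 2022/10/25 17:01:29.315194 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list",
    "[INFO] 2022/10/25 17:01:29.315407 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices",
    "[INFO] 2022/10/25 17:01:34.302792 34 collector/npu_collector.go:178 update cache,key is npu-exporter-npu-list",
    "[INFO] 2022/10/25 17:01:34.302983 34 collector/npu_collector.go:183 update cache,key is npu-exporter-containers-devices",
]

intervals = update_intervals(sample_entries)
print(intervals)  # one gap of roughly 5 seconds for the sample above
```

An empty result, or gaps much longer than the announced period, would suggest that the cache is no longer being refreshed regularly.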