NodeD
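Perform the following steps on any node to verify the installation status of NodeD.
- Run the following command to view the NodeD Pods in the K8s cluster. Each Pod must have a STATUS of Running and a READY value of 1/1. If NodeD is installed on multiple nodes in the cluster, check the Pod on every node.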
kubectl get pods -n mindx-dl -o wide | grep noded
Example output:
noded-bnmwt 1/1 Running 10 40d 192.168.41.28 ubuntu <none> <none>
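- Run the following command to view the NodeD component logs.
kubectl logs -n mindx-dl {NodeD Pod name}
Output similar to the following example indicates that the component is running normally.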
[root@***** clusterD]# kubectl logs -f -n mindx-dl noded-ncdk4
[INFO] 2025/05/25 15:24:19.897280 1 hwlog/api.go:108 noded.log's logger init success
[INFO] 2025/05/25 15:24:19.897392 1 noded/main.go:93 noded starting and the version is v7.1.RC1_linux-x86_64
W0525 15:24:19.897410 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
[INFO] 2025/05/25 15:24:19.994306 1 devmanager/devmanager.go:123 the dcmi version is 24.1.rc3.b060
[INFO] 2025/05/25 15:24:19.994360 1 devmanager/devmanager.go:1071 get chip base info, cardID: 0, deviceID: 0, logicID: 0, physicID: 0
[INFO] 2025/05/25 15:24:19.994386 1 devmanager/devmanager.go:1071 get chip base info, cardID: 1, deviceID: 0, logicID: 1, physicID: 1
[INFO] 2025/05/25 15:24:19.994408 1 devmanager/devmanager.go:1071 get chip base info, cardID: 2, deviceID: 0, logicID: 2, physicID: 2
[INFO] 2025/05/25 15:24:19.994430 1 devmanager/devmanager.go:1071 get chip base info, cardID: 3, deviceID: 0, logicID: 3, physicID: 3
[INFO] 2025/05/25 15:24:19.994449 1 devmanager/devmanager.go:1071 get chip base info, cardID: 4, deviceID: 0, logicID: 4, physicID: 4
[INFO] 2025/05/25 15:24:19.994476 1 devmanager/devmanager.go:1071 get chip base info, cardID: 5, deviceID: 0, logicID: 5, physicID: 5
[INFO] 2025/05/25 15:24:19.994505 1 devmanager/devmanager.go:1071 get chip base info, cardID: 6, deviceID: 0, logicID: 6, physicID: 6
[INFO] 2025/05/25 15:24:19.994528 1 devmanager/devmanager.go:1071 get chip base info, cardID: 7, deviceID: 0, logicID: 7, physicID: 7
[WARN] 2025/05/25 15:24:19.994564 1 executor/dev_manager.go:71 deviceManager get hccsPingMeshState failed, err: dcmi get hccs ping mesh state failed cardID(0) deviceID(0) error code: -99998
[ERROR] 2025/05/25 15:24:19.994588 1 pingmesh/controller.go:68 new device manager failed, err: dcmi get hccs ping mesh state failed cardID(0) deviceID(0) error code: -99998
[INFO] 2025/05/25 15:24:19.999314 1 config/configurator.go:98 update fault config success
[INFO] 2025/05/25 15:24:19.999350 1 config/configurator.go:231 init fault config from config map success
[INFO] 2025/05/25 15:24:39.037815 1 control/controller.go:220 get node SN success, add SN(HS20200764) to node annotation
...
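When NodeD runs on many nodes, the Pod status check described above can be scripted instead of read by eye. The following is a minimal sketch, not part of the product: the function name check_noded_pods is hypothetical, and it assumes that NodeD Pod names begin with "noded-" (as in the examples here) and that READY and STATUS are the second and third columns of the kubectl output.

```shell
#!/bin/sh
# check_noded_pods (hypothetical helper): reads the output of
#   kubectl get pods -n mindx-dl -o wide
# on stdin and checks that every NodeD Pod is Running with READY 1/1.
# Assumption: NodeD Pod names start with "noded-".
# Returns 0 if at least one NodeD Pod was found and all are healthy, else 1.
check_noded_pods() {
    awk '
        $1 ~ /^noded-/ {
            found++
            # Column 2 is READY, column 3 is STATUS.
            if ($2 != "1/1" || $3 != "Running") {
                printf "NOT READY: %s (READY=%s, STATUS=%s)\n", $1, $2, $3
                bad++
            }
        }
        END {
            if (found == 0) { print "no noded Pod found"; exit 1 }
            if (bad > 0) exit 1
            printf "all %d noded Pod(s) are Running and 1/1\n", found
        }
    '
}

# Usage (requires access to the cluster):
#   kubectl get pods -n mindx-dl -o wide | check_noded_pods
```

A non-zero exit status means that at least one NodeD Pod needs attention, or that no NodeD Pod was found in the namespace.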