NodeD
NodeD collects hardware fault and health information for a node and stores it in a Kubernetes ConfigMap so that external components can query and use it.
To query the information, run kubectl describe cm mindx-dl-nodeinfo-<nodename> -n mindx-dl. The following is an example of the command output. For details about the key parameters, see Table 1.
Name: mindx-dl-nodeinfo-<nodename>
Namespace: mindx-dl
Labels: <none>
Annotations: <none>
Data
====
NodeInfo:
----
{"NodeInfo":{"FaultDevList":[{"DeviceType":"CPU","DeviceId":1,"FaultCode":["00000011"],"FaultLevel":"SeparateFault"}],"NodeStatus":"UnHealthy"},"CheckCode":"3a2934c3cb875f2256c770c75a6fdf24594fcf64481ac6cd0d0f74b8fea88855"}
Events: <none>
| Parameter | Description |
|---|---|
| NodeInfo | Node fault information. |
| FaultDevList | List of faulty devices on the node. |
| - DeviceType | Type of the faulty device. |
| - DeviceId | ID of the faulty device. |
| - FaultCode | Fault code, a hexadecimal string consisting of letters and digits. |
| - FaultLevel | Fault handling level. |
| NodeStatus | Node health status, which is determined by the device with the highest fault handling level on the node. |
| CheckCode | Verification code. |
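
The parameters above can be read programmatically once the NodeInfo value has been extracted from the ConfigMap. The following Python sketch is illustrative only: the helper name summarize_node_info and the shape of its return value are hypothetical and not part of NodeD. It assumes the JSON string has already been obtained, for example with kubectl get cm mindx-dl-nodeinfo-<nodename> -n mindx-dl -o jsonpath='{.data.NodeInfo}', and it uses the sample payload from the command output above. It does not validate CheckCode, because this section does not describe how the verification code is computed.

```python
import json

# Sample payload copied from the command output shown above.
SAMPLE = (
    '{"NodeInfo":{"FaultDevList":[{"DeviceType":"CPU","DeviceId":1,'
    '"FaultCode":["00000011"],"FaultLevel":"SeparateFault"}],'
    '"NodeStatus":"UnHealthy"},'
    '"CheckCode":"3a2934c3cb875f2256c770c75a6fdf24594fcf64481ac6cd0d0f74b8fea88855"}'
)


def summarize_node_info(raw: str) -> dict:
    """Hypothetical helper: flatten the NodeInfo JSON into a small summary."""
    payload = json.loads(raw)
    node_info = payload.get("NodeInfo", {})
    faults = [
        {
            "type": dev.get("DeviceType"),
            "id": dev.get("DeviceId"),
            "codes": list(dev.get("FaultCode", [])),
            "level": dev.get("FaultLevel"),
        }
        # Treat a missing or null FaultDevList as "no faulty devices".
        for dev in node_info.get("FaultDevList") or []
    ]
    return {
        "status": node_info.get("NodeStatus"),
        "faults": faults,
        # Passed through unchanged; this sketch does not verify it.
        "check_code": payload.get("CheckCode"),
    }


if __name__ == "__main__":
    summary = summarize_node_info(SAMPLE)
    print("Node status:", summary["status"])
    for fault in summary["faults"]:
        print(
            f'{fault["type"]} {fault["id"]}: '
            f'codes={",".join(fault["codes"])} level={fault["level"]}'
        )
```

Reading the fields with .get keeps the sketch tolerant of a payload that omits FaultDevList, which is assumed here to be possible when no device is faulty.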