(Optional) Configuring Node Hardware Fault Levels
The fault level configuration file NodeDConfiguration.json is embedded in the NodeD image. When NodeD starts, it reads the default configuration from this file for fault handling.
If you want to customize fault levels, you can create a ConfigMap file (mindx-dl-node-fault-config) in the cluster.
- If mindx-dl-node-fault-config already exists in the cluster when NodeD is started, NodeD preferentially uses the content configured in the file for fault handling.
- If mindx-dl-node-fault-config still exists in the cluster after NodeD is reinstalled, it is used instead of the default NodeDConfiguration.json file. To return to the default configuration, delete mindx-dl-node-fault-config so that NodeD reads the default NodeDConfiguration.json file.
- If the format of mindx-dl-node-fault-config is incorrect, NodeD falls back to the NodeDConfiguration.json file built into the image for fault handling.
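The precedence rules above can be summarized in a short sketch. This is not NodeD's own code: the function name select_fault_config and its arguments are hypothetical, and an "incorrect format" is modeled here simply as text that does not parse as a JSON object.

```python
import json
from typing import Optional


def select_fault_config(configmap_text: Optional[str], builtin_text: str) -> dict:
    """Return the fault configuration that would take effect at startup.

    configmap_text: content of mindx-dl-node-fault-config, or None if the
                    ConfigMap does not exist (hypothetical argument).
    builtin_text:   content of the NodeDConfiguration.json file in the image.
    """
    if configmap_text is not None:
        try:
            custom = json.loads(configmap_text)
            if isinstance(custom, dict):
                # The ConfigMap exists and parses: it takes priority.
                return custom
        except json.JSONDecodeError:
            # Malformed ConfigMap: fall through to the built-in file.
            pass
    return json.loads(builtin_text)
```

Passing None for configmap_text corresponds to the case where the ConfigMap has been deleted, so the built-in defaults apply again.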
Procedure
The following uses the fault with code 0100001D as an example to describe how to change the fault handling policy from NotHandleFault to PreSeparateFault.
- Log in to the environment and go to the directory generated after the NodeD package is decompressed.
- Run the following command to create the ConfigMap file (mindx-dl-node-fault-config) required for dynamically configuring fault levels:
kubectl create cm mindx-dl-node-fault-config -n mindx-dl --from-file=./NodeDConfiguration.json
Command output:
configmap/mindx-dl-node-fault-config created
Table 1 Parameter description

| Parameter | Description |
|---|---|
| mindx-dl-node-fault-config | Name of the created ConfigMap file, which cannot be changed. |
| mindx-dl | Namespace name, which cannot be changed. |
| NodeDConfiguration.json | Used to configure the fault code and corresponding fault level. The value must be the same as that configured in the NodeDConfiguration.json file. |
- Run the following command to edit the mindx-dl-node-fault-config file:
kubectl edit cm -n mindx-dl mindx-dl-node-fault-config
- Find the fault code 0100001D in the mindx-dl-node-fault-config file.
"FaultTypeCode": {
    "NotHandleFaultCodes":[
        "0100001D","03000009","03000013","0300000D","03000011"
    ],
    ...
    ],
    ...
If any of the following problems occurs during fault level customization, the modification does not take effect and NodeD uses the last saved configuration:
- The file format or fault code is incorrect. A fault code must be a string of eight characters made up of digits and letters.
- A fault code is configured for multiple fault levels.
- Delete the fault code 0100001D from NotHandleFaultCodes and add it to PreSeparateFaultCodes.
"FaultTypeCode": {
    "NotHandleFaultCodes":[
        "03000009","03000013","0300000D","03000011"
    ],
    "PreSeparateFaultCodes":[
        "28000037","00000011", "0100001D" ...
    ],
    ...
- After the modification, press Esc, enter :wq!, save the configuration, and exit.
- After the mindx-dl-node-fault-config file is updated, check whether the operation is successful.
- Run the following command to query the name of the NodeD pod:
kubectl get pods -A | grep noded
Command output:
mindx-dl   noded-c5f52   1/1   Running   0   2m16s
- Query the NodeD log based on the obtained pod name.
kubectl logs noded-c5f52 -n mindx-dl -f
If the log contains "update fault config success", the fault code has been dynamically configured.
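The edit made in this procedure, together with the two rejection rules stated earlier (eight-character codes made up of digits and letters, and no code in more than one fault level), can be rehearsed offline before running kubectl edit. The following is a minimal sketch, not NodeD's own validation logic: the helper names find_problems and move_fault_code are hypothetical, and it assumes that every key under FaultTypeCode maps to a list of fault codes, as in the excerpts above. Accepting lowercase letters is also an assumption.

```python
import re

# Assumed reading of "eight characters, digits and letters".
CODE_PATTERN = re.compile(r"[0-9A-Za-z]{8}")


def find_problems(fault_type_code: dict) -> list:
    """Return descriptions of rule violations (empty list if none)."""
    problems = []
    levels_by_code = {}
    for level, codes in fault_type_code.items():
        for code in codes:
            if not isinstance(code, str) or not CODE_PATTERN.fullmatch(code):
                problems.append(f"{level}: invalid fault code {code!r}")
                continue
            levels_by_code.setdefault(code, set()).add(level)
    for code, levels in levels_by_code.items():
        if len(levels) > 1:
            problems.append(f"{code}: configured for multiple levels {sorted(levels)}")
    return problems


def move_fault_code(fault_type_code: dict, code: str, src: str, dst: str) -> dict:
    """Return a copy with `code` moved from level `src` to level `dst`."""
    updated = {level: list(codes) for level, codes in fault_type_code.items()}
    updated[src].remove(code)  # raises ValueError if code is not in src
    updated.setdefault(dst, []).append(code)
    return updated


# Example mirroring the procedure: move 0100001D to PreSeparateFaultCodes.
levels = {
    "NotHandleFaultCodes": ["0100001D", "03000009", "03000013"],
    "PreSeparateFaultCodes": ["28000037", "00000011"],
}
updated = move_fault_code(
    levels, "0100001D", "NotHandleFaultCodes", "PreSeparateFaultCodes"
)
print(find_problems(updated))
```

Copying the code into the new level without deleting it from the old one would make find_problems report it as configured for multiple levels, which is one of the cases in which the modification does not take effect.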