(Optional) Configuring the Public Fault Level and Sender

The fault level configuration file publicFaultConfiguration.json is embedded in the ClusterD image. When ClusterD starts, it uses the default settings in this file for fault handling.

To customize the fault level, create the /user1/mindx-dl/clusterd/publicCustomization.json file on the host.

  • If the file already exists when ClusterD is started, ClusterD uses its configuration as the basis for current fault handling.
  • If the file still exists after ClusterD is reinstalled, ClusterD uses the existing publicCustomization.json, and the default publicFaultConfiguration.json does not take effect. To restore the default configuration, delete publicCustomization.json so that ClusterD reads the default publicFaultConfiguration.json file.
  • If the format of the publicCustomization.json file is incorrect, ClusterD falls back to the publicFaultConfiguration.json file built into the image for fault handling.
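
Because a malformed publicCustomization.json is ignored in favor of the built-in defaults, it can help to check the file before placing it on the host. The following is a minimal sketch, not the validation that ClusterD itself performs. It assumes the file has only the keys shown in the samples in this section; check_customization is a hypothetical helper name, and Table 2 remains the authoritative description of the file.

```python
import json

# Keys taken from the sample file in this section (assumption: Table 2 lists no others).
LEVEL_KEYS = (
    "NotHandleFaultCodes",
    "SubHealthFaultCodes",
    "SeparateNPUCodes",
    "PreSeparateNPUCodes",
)


def is_string_list(value):
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_customization(text):
    """Return a list of problems found in the file content (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    levels = config.get("publicFaultCode")
    if not isinstance(levels, dict):
        problems.append("publicFaultCode must be an object")
    else:
        for key in LEVEL_KEYS:
            if not is_string_list(levels.get(key)):
                problems.append(f"publicFaultCode.{key} must be a list of strings")
    if not is_string_list(config.get("publicFaultResource")):
        problems.append("publicFaultResource must be a list of strings")
    return problems


sample = """{
  "publicFaultCode": {
    "NotHandleFaultCodes": [],
    "SubHealthFaultCodes": [],
    "SeparateNPUCodes": ["010001008"],
    "PreSeparateNPUCodes": []
  },
  "publicFaultResource": ["CCAE", "fd-online", "pingmesh", "Netmind", "dpcStorage"]
}"""

print(check_customization(sample))                   # prints []
print(check_customization('{"publicFaultCode": '))   # one "not valid JSON" entry
```

To check a file on the host, read its content and pass it to the function, for example check_customization(open(path, encoding="utf-8").read()).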

Configuring the Fault Level of a Public Fault Code

You can configure the fault level of a public fault code in either of the following scenarios:

  • Modify the fault level of an existing fault code.
  • Add a fault code and its fault level.

    The following uses fault code 010001008 as an example to describe how to configure the fault level of a public fault code.

  1. Log in to the environment and go to the /user1/mindx-dl/clusterd directory.
  2. Run the vi publicCustomization.json command to edit the file. For details about publicCustomization.json, see Table 2.
    • After creating the publicCustomization.json file, ensure that the hwMindX user that runs ClusterD has read permission on the file. For example, if the file is created by the root user, you are advised to set the file permission to 644.
    • Keep the file permission as restrictive as possible. Overly permissive settings may pose security risks.
    {
      "publicFaultCode": {
        "NotHandleFaultCodes":[],
        "SubHealthFaultCodes":[],
        "SeparateNPUCodes":["010001008"],
        "PreSeparateNPUCodes":[]
      },
      "publicFaultResource": [
        "CCAE", "fd-online", "pingmesh", "Netmind", "dpcStorage"
      ]
    }
  3. After the modification, press Esc and enter :wq! to save the configuration and exit.
  4. The change takes effect after a few seconds. Check the log to confirm that the operation is successful.

    If "load fault config from <publicCustomization.json> success" is displayed in the log, the customized fault level configuration has been loaded.
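
As an alternative to editing the file by hand, the steps above can be sketched in a short script. This is only an illustration under stated assumptions: set_fault_level and save_customization are hypothetical helper names, the level keys are the four shown in the sample, and the sketch assumes that a fault code should appear under at most one level (the source does not state this). The 644 permission follows the advice in step 2.

```python
import json
import os

# Level keys as they appear in the sample file in this section.
LEVEL_KEYS = (
    "NotHandleFaultCodes",
    "SubHealthFaultCodes",
    "SeparateNPUCodes",
    "PreSeparateNPUCodes",
)


def set_fault_level(config, code, level):
    """Place a fault code under one level, removing it from the others."""
    if level not in LEVEL_KEYS:
        raise ValueError(f"unknown fault level key: {level}")
    levels = config.setdefault("publicFaultCode", {})
    for key in LEVEL_KEYS:
        # Assumption: a code belongs to at most one level, so drop it elsewhere.
        levels[key] = [c for c in levels.get(key, []) if c != code]
    levels[level].append(code)
    return config


def save_customization(config, path):
    """Write the configuration and make it readable by other users (mode 644)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    os.chmod(path, 0o644)


config = {
    "publicFaultCode": {key: [] for key in LEVEL_KEYS},
    "publicFaultResource": ["CCAE", "fd-online", "pingmesh", "Netmind", "dpcStorage"],
}
set_fault_level(config, "010001008", "SeparateNPUCodes")
print(json.dumps(config["publicFaultCode"]["SeparateNPUCodes"]))  # prints ["010001008"]
```

Calling save_customization(config, "/user1/mindx-dl/clusterd/publicCustomization.json") on the host would then write the result to the location described at the start of this section.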

Configuring the Sender of Public Faults

The following uses a new fault sender named XXX as an example to describe how to configure the sender of public faults.

  1. Log in to the environment and go to the /user1/mindx-dl/clusterd directory.
  2. Run the vi publicCustomization.json command to edit the file. For details about publicCustomization.json, see Table 2.
    {
      "publicFaultCode": {
        "NotHandleFaultCodes":[],
        "SubHealthFaultCodes":[],
        "SeparateNPUCodes":[],
        "PreSeparateNPUCodes":[]
      },
      "publicFaultResource": [
        "CCAE", "fd-online", "pingmesh", "Netmind", "dpcStorage", "XXX"
      ]
    }
  3. After the modification, press Esc and enter :wq! to save the configuration and exit.
  4. The change takes effect after a few seconds. Check the log to confirm that the operation is successful.

    If "load fault config from <publicCustomization.json> success" is displayed in the log, the customized configuration, including the new fault sender, has been loaded.
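
The sender change can be sketched in the same way. The snippet below is a minimal illustration, assuming the file layout shown in the sample; add_sender is a hypothetical helper name, and XXX is the placeholder sender used in this section.

```python
import json


def add_sender(config, sender):
    """Append a sender to publicFaultResource unless it is already listed."""
    senders = config.setdefault("publicFaultResource", [])
    if sender not in senders:
        senders.append(sender)
    return config


config = {
    "publicFaultCode": {
        "NotHandleFaultCodes": [],
        "SubHealthFaultCodes": [],
        "SeparateNPUCodes": [],
        "PreSeparateNPUCodes": [],
    },
    "publicFaultResource": ["CCAE", "fd-online", "pingmesh", "Netmind", "dpcStorage"],
}
add_sender(config, "XXX")
add_sender(config, "XXX")  # a second call leaves the list unchanged
print(json.dumps(config["publicFaultResource"]))
```

Because the function skips senders that are already present, it is safe to run repeatedly against the same file content.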