Cleaning and Diagnosing the Root Cause Node

Procedure

  1. Import the root cause node cleaning and diagnosis APIs from MindCluster Ascend FaultDiag.
    from ascend_fd import parse_root_cluster
    from ascend_fd import diag_root_cluster
  2. Clean the root cause node.
    # Root cause node cleaning result and errors that occur during the cleaning.
    rc_parse_results, rc_parse_err_msg = parse_root_cluster(input_log_list)
  3. Diagnose the cleaned root cause node.
    # Root cause node diagnosis result and errors that occur during the diagnosis.
    results, err_msg_list = diag_root_cluster(rc_parse_results)
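
The three steps above can be sketched as a single helper. This is a minimal sketch, not part of ascend_fd: run_root_cluster_diagnosis is a hypothetical name, and the two APIs are passed in as arguments so the sketch can run without the package installed. It assumes that an empty or None error value means no errors were reported, and it continues to diagnosis even if cleaning reported errors. Neither behavior is confirmed by this document.

```python
from typing import Any, Callable, List, Tuple


def run_root_cluster_diagnosis(
    input_log_list: list,
    parse_fn: Callable[[list], Tuple[Any, Any]],
    diag_fn: Callable[[Any], Tuple[Any, Any]],
) -> Tuple[Any, List[tuple]]:
    """Run cleaning, then diagnosis, and collect any reported errors.

    parse_fn and diag_fn stand in for parse_root_cluster and
    diag_root_cluster (hypothetical wrapper, not an ascend_fd API).
    """
    errors = []

    # Step 2: clean the root cause node logs.
    rc_parse_results, rc_parse_err_msg = parse_fn(input_log_list)
    if rc_parse_err_msg:  # assumption: empty or None means no errors
        errors.append(("parse", rc_parse_err_msg))

    # Step 3: diagnose the cleaned root cause node data.
    results, err_msg_list = diag_fn(rc_parse_results)
    if err_msg_list:  # same assumption as above
        errors.append(("diag", err_msg_list))

    return results, errors
```

With ascend_fd installed, the call would presumably look like run_root_cluster_diagnosis(input_log_list, parse_root_cluster, diag_root_cluster).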

The following example shows the format of input_log_list. It is for reference only. Modify the input for root cause node cleaning as required.

[
  {
    "log_domain": {
      "server": "10.1.1.1",
      "instance_id": "instance_name"
    },
    "log_items": [
      {
        "item_type": "plog",
        "pid": 3199,
        "device_id": 0,
        "rank_id": 0,
        "log_lines": [
          "[ERROR] xxx."
        ]
      }
    ]
  }
]

Table 1 input_log_list parameters

Field         Parameter Type   Required (Yes/No)   Description
log_domain    Dictionary       Yes                 Log domain
server        String           Yes                 Server IP address
instance_id   String           Yes                 Instance name
log_items     List             Yes                 Log item
item_type     String           Yes                 Log type
pid           Integer          Yes                 Process ID
device_id     Integer          No                  Device ID
rank_id       Integer          No                  Communicator's rank ID
log_lines     List             Yes                 Log line to be parsed
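
The field requirements in Table 1 can be checked before the input is passed to cleaning. The following is a minimal sketch of such a check, not a feature of ascend_fd: check_input_log_list and the two constants are hypothetical names, and the nesting of fields (server and instance_id inside log_domain, the remaining fields inside each log_items entry) is taken from the example input above.

```python
# Required fields of a log item per Table 1; device_id and rank_id are optional.
REQUIRED_ITEM_FIELDS = {"item_type": str, "pid": int, "log_lines": list}
OPTIONAL_ITEM_FIELDS = {"device_id": int, "rank_id": int}


def check_input_log_list(input_log_list):
    """Return a list of problems found; an empty list means the input looks OK."""
    if not isinstance(input_log_list, list):
        return ["input_log_list must be a list"]
    problems = []
    for i, entry in enumerate(input_log_list):
        if not isinstance(entry, dict):
            problems.append(f"[{i}] must be a dictionary")
            continue
        # log_domain: dictionary with string fields server and instance_id.
        domain = entry.get("log_domain")
        if not isinstance(domain, dict):
            problems.append(f"[{i}].log_domain must be a dictionary")
        else:
            for key in ("server", "instance_id"):
                if not isinstance(domain.get(key), str):
                    problems.append(f"[{i}].log_domain.{key} must be a string")
        # log_items: list of dictionaries.
        items = entry.get("log_items")
        if not isinstance(items, list):
            problems.append(f"[{i}].log_items must be a list")
            continue
        for j, item in enumerate(items):
            where = f"[{i}].log_items[{j}]"
            if not isinstance(item, dict):
                problems.append(f"{where} must be a dictionary")
                continue
            for key, typ in REQUIRED_ITEM_FIELDS.items():
                if not isinstance(item.get(key), typ):
                    problems.append(f"{where}.{key} must be {typ.__name__}")
            for key, typ in OPTIONAL_ITEM_FIELDS.items():
                # Optional fields are only type-checked when present.
                if key in item and not isinstance(item[key], typ):
                    problems.append(f"{where}.{key} must be {typ.__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report every malformed field in a single pass.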

Table 2 err_msg_list parameters

Field           Parameter Type   Description
Error message   List             Error message generated during interface execution

The following is an example of the results output format:

{
    'analyze_success': True,
    'fault_description': {
        'code': 102,
        'string': 'The Plog of all valid nodes does not contain error log information. The root cause node cannot be located. Check whether the task is normal.'
    },
    'root_cause_device': ['ALL Device'],
    'device_link': [],
    'remote_link': '',
    'first_error_device': '',
    'last_error_device': ''
}

Table 3 results parameters

Field                Parameter Type   Description
analyze_success      Bool             Whether the diagnosis is successful.
                                        • True: diagnosis succeeded
                                        • False: diagnosis failed
fault_description    Dictionary       Fault description
code                 Integer          Fault code
string               String           Fault code description
root_cause_device    List             Root cause device information
device_link          List             Root cause node chain
remote_link          String           Inter-device waiting chain
first_error_device   String           Device where the first fault occurs
last_error_device    String           Device where the latest fault occurs
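
As an illustration of how the fields in Table 3 might be consumed, the following minimal sketch turns a results dictionary into a one-line summary. summarize_results is a hypothetical helper, not an ascend_fd API, and it assumes that code and string are nested inside fault_description, as in the example output above.

```python
def summarize_results(results):
    """Build a one-line summary from a results dictionary (Table 3 fields)."""
    # Missing fields fall back to empty values instead of raising KeyError.
    fault = results.get("fault_description") or {}
    status = "succeeded" if results.get("analyze_success") else "failed"
    devices = ", ".join(str(d) for d in results.get("root_cause_device") or [])
    return (
        f"diagnosis {status}; code {fault.get('code')}: "
        f"{fault.get('string')}; root cause device(s): {devices or 'none'}"
    )
```

The helper reads only analyze_success, fault_description, and root_cause_device. The link and first/last error device fields are left for the caller to inspect when a root cause is actually located.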