Cleaning and Diagnosing the Root Cause Node
Procedure
- Import the root cause node cleaning and diagnosis APIs from MindCluster Ascend FaultDiag.
  ```python
  from ascend_fd import parse_root_cluster
  from ascend_fd import diag_root_cluster
  ```
- Clean the root cause node.
  ```python
  # rc_parse_results: root cause node cleaning result
  # rc_parse_err_msg: errors that occur during the cleaning
  rc_parse_results, rc_parse_err_msg = parse_root_cluster(input_log_list)
  ```
- Diagnose the cleaned root cause node.
  ```python
  # results: root cause node diagnosis result
  # err_msg_list: errors that occur during the diagnosis
  results, err_msg_list = diag_root_cluster(rc_parse_results)
  ```
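The steps above can be chained in a single helper. The sketch below is a minimal example, assuming both interfaces behave as described (each returns a result plus a list of error messages). The wrapper name `run_root_cause_diagnosis` and its `parse_fn`/`diag_fn` parameters are hypothetical; they exist only so the flow can be tried with stub functions when `ascend_fd` is not installed.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_root_cause_diagnosis(
    input_log_list: list,
    parse_fn: Optional[Callable] = None,
    diag_fn: Optional[Callable] = None,
) -> Tuple[Any, List[Any]]:
    """Clean, then diagnose, collecting the error messages of both steps.

    Hypothetical helper: parse_fn/diag_fn default to the real
    ascend_fd interfaces but can be replaced with stubs for testing.
    """
    if parse_fn is None or diag_fn is None:
        # Imported lazily so the wrapper can be exercised without ascend_fd.
        from ascend_fd import parse_root_cluster, diag_root_cluster
        parse_fn = parse_fn or parse_root_cluster
        diag_fn = diag_fn or diag_root_cluster

    # Step 1: clean the root cause node logs.
    rc_parse_results, rc_parse_err_msg = parse_fn(input_log_list)
    # Step 2: diagnose using the cleaning result.
    results, err_msg_list = diag_fn(rc_parse_results)

    # Both interfaces report errors as lists; concatenate them for the caller.
    errors = list(rc_parse_err_msg or []) + list(err_msg_list or [])
    return results, errors
```

Called without stubs, the helper uses the real `ascend_fd` interfaces. Whether diagnosis should still run when cleaning reports errors is not specified here, so this sketch always runs both steps and returns every error for the caller to inspect.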
The format of `input_log_list` is shown below for reference only. Replace the values with the information of the logs you want to clean.
```python
[
    {
        "log_domain": {
            "server": "10.1.1.1",
            "instance_id": "instance_name"
        },
        "log_items": [
            {
                "item_type": "plog",
                "pid": 3199,
                "device_id": 0,
                "rank_id": 0,
                "log_lines": [
                    '[ERROR] xxx.'
                ]
            }
        ]
    }
]
```
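When logs are collected programmatically, small helpers can assemble `input_log_list` in the structure shown above. This is only a sketch: the helper names `make_log_item` and `make_log_domain_entry` are hypothetical, and leaving out `device_id` and `rank_id` when they are unknown is an assumption based on those fields being marked as not required.

```python
from typing import Any, Dict, List, Optional


def make_log_item(item_type: str, pid: int, log_lines: List[str],
                  device_id: Optional[int] = None,
                  rank_id: Optional[int] = None) -> Dict[str, Any]:
    """Build one element of log_items (hypothetical helper).

    Optional fields are omitted rather than set to None (assumption).
    """
    item: Dict[str, Any] = {"item_type": item_type, "pid": pid}
    if device_id is not None:
        item["device_id"] = device_id
    if rank_id is not None:
        item["rank_id"] = rank_id
    item["log_lines"] = list(log_lines)  # copy so callers can reuse their list
    return item


def make_log_domain_entry(server: str, instance_id: str,
                          log_items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the log items of one server/instance in the expected structure."""
    return {
        "log_domain": {"server": server, "instance_id": instance_id},
        "log_items": log_items,
    }


# Rebuilds the reference input shown above.
input_log_list = [
    make_log_domain_entry(
        "10.1.1.1", "instance_name",
        [make_log_item("plog", 3199, ["[ERROR] xxx."], device_id=0, rank_id=0)],
    )
]
```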
| Field | Parameter Type | Required (Yes/No) | Description |
|---|---|---|---|
| log_domain | Dictionary | Yes | Log domain, containing `server` and `instance_id` |
| server | String | Yes | Server IP address |
| instance_id | String | Yes | Instance name |
| log_items | List | Yes | Log items; each element is a dictionary containing the fields below |
| item_type | String | Yes | Log type, for example `plog` |
| pid | Integer | Yes | Process ID |
| device_id | Integer | No | Device ID |
| rank_id | Integer | No | Rank ID in the communicator |
| log_lines | List | Yes | Log lines to be parsed |
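Before calling `parse_root_cluster`, it may help to check an input list against the table above. The following sketch checks only field presence and basic types, and assumes each `log_items` element is a dictionary as in the example. `validate_input_log_list` is a hypothetical helper, not part of MindCluster Ascend FaultDiag.

```python
from typing import Any, Dict, List

# (field, expected type, required), taken from the input table.
DOMAIN_FIELDS = [("server", str, True), ("instance_id", str, True)]
ITEM_FIELDS = [
    ("item_type", str, True),
    ("pid", int, True),
    ("device_id", int, False),
    ("rank_id", int, False),
    ("log_lines", list, True),
]


def _type_ok(value: Any, typ: type) -> bool:
    if typ is int:
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, typ)


def _check(obj: Dict[str, Any], spec, where: str, problems: List[str]) -> None:
    for name, typ, required in spec:
        if name not in obj:
            if required:
                problems.append(f"{where}: missing required field '{name}'")
        elif not _type_ok(obj[name], typ):
            problems.append(f"{where}: '{name}' should be {typ.__name__}")


def validate_input_log_list(input_log_list: Any) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    if not isinstance(input_log_list, list):
        return ["input_log_list should be a list"]
    problems: List[str] = []
    for i, entry in enumerate(input_log_list):
        where = f"entry[{i}]"
        if not isinstance(entry, dict):
            problems.append(f"{where}: should be a dictionary")
            continue
        domain = entry.get("log_domain")
        if not isinstance(domain, dict):
            problems.append(f"{where}: 'log_domain' is missing or not a dictionary")
        else:
            _check(domain, DOMAIN_FIELDS, f"{where}.log_domain", problems)
        items = entry.get("log_items")
        if not isinstance(items, list):
            problems.append(f"{where}: 'log_items' is missing or not a list")
            continue
        for j, item in enumerate(items):
            item_where = f"{where}.log_items[{j}]"
            if not isinstance(item, dict):
                problems.append(f"{item_where}: should be a dictionary")
            else:
                _check(item, ITEM_FIELDS, item_where, problems)
    return problems
```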
Both interfaces also return error messages in the following format:

| Field | Parameter Type | Description |
|---|---|---|
| Error message | List | Error messages generated during interface execution |
The following is an example of the `results` output:

```python
{
    'analyze_success': True,
    'fault_description': {
        'code': 102,
        'string': 'The Plog of all valid nodes does not contain error log information. The root cause node cannot be located. Check whether the task is normal.'
    },
    'root_cause_device': ['ALL Device'],
    'device_link': [],
    'remote_link': '',
    'first_error_device': '',
    'last_error_device': ''
}
```
| Field | Parameter Type | Description |
|---|---|---|
| analyze_success | Bool | Whether the diagnosis succeeded |
| fault_description | Dictionary | Fault description, containing `code` and `string` |
| code | Integer | Fault code |
| string | String | Description of the fault code |
| root_cause_device | List | Root cause device information |
| device_link | List | Root cause node chain |
| remote_link | String | Inter-device waiting chain |
| first_error_device | String | Device where the first fault occurred |
| last_error_device | String | Device where the latest fault occurred |
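The sketch below shows one way to read the output fields described above, assuming `results` is a single dictionary shaped like the example output. `summarize_diag_result` is a hypothetical helper, and the summary wording is illustrative only.

```python
from typing import Any, Dict


def summarize_diag_result(result: Dict[str, Any]) -> str:
    """Turn one diagnosis result dictionary into a one-line summary (hypothetical)."""
    fault = result.get("fault_description") or {}
    code = fault.get("code")
    text = fault.get("string", "")

    if not result.get("analyze_success"):
        return f"Diagnosis failed (code {code}): {text}"

    devices = ", ".join(str(d) for d in result.get("root_cause_device") or [])
    summary = f"Root cause device(s): {devices or 'none reported'}; fault code {code}: {text}"

    # Only mention the first/latest error devices when they are non-empty.
    if result.get("first_error_device"):
        summary += f"; first error on {result['first_error_device']}"
    if result.get("last_error_device"):
        summary += f"; latest error on {result['last_error_device']}"
    return summary
```

For the example output, this yields a summary naming `ALL Device` and fault code 102, without first or latest error devices because those fields are empty strings.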