Single-Server Fault Diagnosis
API Prototype
- Clean all logs, process log cleaning results, diagnose fault events, and output analysis reports on a single server.
ascend-fd single-diag -i Collection_directory -o Output_directory_of_the_single-server_diagnosis_result
- Specify a separate collection directory for each log type for single-server diagnosis.
ascend-fd single-diag --host_log Collection_directory_of_OS_logs_on_the_host --device_log Collection_directory_of_device_logs --train_log Collection_directory_of_user_training_or_inference_logs --process_log Collection_directory_of_CANN_App_logs --env_check Collection_directory_of_NPU_network_port_status_and_resource_information --dl_log Collection_directory_of_MindCluster_component_logs --mindie_log Collection_directory_of_MindIE_component_logs --amct_log Collection_directory_of_AMCT_logs -o Cleaning_result_output_directory
- If -i is used together with detailed log collection directory parameters, the system first reads the directories specified by the detailed parameters and then reads the remaining log types from the directory specified by -i.
- If all eight detailed log collection directory parameters are specified, -i does not take effect even if it is also configured.
- At least one of --input_path, --host_log, --device_log, --train_log, --process_log, --env_check, --dl_log, --mindie_log, and --amct_log must be specified. Otherwise, the cleaning command fails.
- The output directory specified by the cleaning command must have more than 5 GB of free drive space. If the space is insufficient, some cleaning results may be lost, which can make the diagnosis results abnormal or inaccurate.
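The notes above (input precedence, the at-least-one-input rule, and the 5 GB free-space requirement) can be checked before a job is launched. The following is a minimal sketch, not part of ascend-fd: `build_single_diag_args`, `effective_sources`, `has_enough_space`, `DETAILED_OPTIONS`, and `MIN_FREE_BYTES` are hypothetical names, `effective_sources` only models the documented precedence rule rather than reproducing the tool's internal logic, and reading "5 GB" as 5 × 1024³ bytes is an assumption.

```python
import shutil
from typing import Optional

# The eight detailed log collection directory options documented for single-diag.
DETAILED_OPTIONS = (
    "host_log", "device_log", "train_log", "process_log",
    "env_check", "dl_log", "mindie_log", "amct_log",
)

# Assumption: "greater than 5 GB" is read as more than 5 * 1024**3 bytes free.
MIN_FREE_BYTES = 5 * 1024**3


def build_single_diag_args(output_path: str, input_path: Optional[str] = None,
                           **detailed: Optional[str]) -> list[str]:
    """Hypothetical helper: build an argv list for `ascend-fd single-diag`."""
    detailed = {k: v for k, v in detailed.items() if v is not None}
    unknown = sorted(set(detailed) - set(DETAILED_OPTIONS))
    if unknown:
        raise ValueError(f"unknown log options: {unknown}")
    # Documented rule: at least one input directory must be given.
    if input_path is None and not detailed:
        raise ValueError("specify -i or at least one detailed log directory")
    args = ["ascend-fd", "single-diag"]
    if input_path is not None:
        args += ["-i", input_path]
    for name in DETAILED_OPTIONS:  # emit options in a fixed order
        if name in detailed:
            args += [f"--{name}", detailed[name]]
    args += ["-o", output_path]
    return args


def effective_sources(input_path: Optional[str] = None,
                      **detailed: Optional[str]) -> dict[str, tuple[str, str]]:
    """Hypothetical model of the documented precedence rule.

    Maps each log type to the (option, directory) it is read from: a detailed
    option wins, and log types without one fall back to the -i directory.
    If all eight detailed options are set, -i is never used.
    """
    sources = {}
    for name in DETAILED_OPTIONS:
        if detailed.get(name) is not None:
            sources[name] = (f"--{name}", detailed[name])
        elif input_path is not None:
            sources[name] = ("-i", input_path)
    return sources


def has_enough_space(output_path: str) -> bool:
    """Check free space on the file system holding an existing output_path."""
    return shutil.disk_usage(output_path).free > MIN_FREE_BYTES
```

On a server where `ascend-fd` is installed, the list returned by `build_single_diag_args` can be passed to `subprocess.run(args, check=True)`; passing a list rather than a single string keeps directory names that contain spaces intact without shell quoting.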
Description
This API starts a single-server diagnosis task. After a training or inference job fails, it diagnoses the original logs of a single server, such as run logs and NPU environment check files.
Parameters
| Parameter | Abbreviation | Required (Yes/No) | Value Type | Description |
|---|---|---|---|---|
| --host_log | None | No | String | Collection directory of OS logs on the host. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --device_log | None | No | String | Collection directory of device logs. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --train_log | None | No | String | Collection directory of user training or inference logs. |
| --process_log | None | No | String | Collection directory of CANN App logs. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --env_check | None | No | String | Collection directory of NPU network ports, status information, and resource information. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --dl_log | None | No | String | Collection directory of Ascend Device Plugin, NodeD, Ascend Docker Runtime, NPU Exporter, and Volcano logs. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --mindie_log | None | No | String | Collection directory of logs generated by MindIE Server, MindIE LLM, MindIE SD, MindIE RT, MindIE Torch, MindIE MS, MindIE Benchmark, and MindIE Client. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --amct_log | None | No | String | Collection directory of AMCT logs. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --input_path | -i | No | String | Path for storing preprocessed data. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --output_path | -o | Yes | String | Output path of cleaned data. The value can contain only digits, uppercase letters, lowercase letters, tildes (~), hyphens (-), plus signs (+), underscores (_), periods (.), slashes (/), and spaces. |
| --help | -h | No | - | Displays the meanings and usage instructions of the level-2 command and its parameters. |
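Most path parameters in the table share one character restriction, so a value can be screened before the command is run. This is a minimal sketch under stated assumptions: "letters" is taken to mean ASCII A-Z and a-z, "spaces" to mean the ASCII space character only, and empty values are rejected. `is_valid_path_value` and `check_single_diag_paths` are hypothetical helpers, not ascend-fd APIs. The --train_log row does not state this restriction, so the sketch does not assume it applies there.

```python
import re

# Assumption: ASCII letters/digits plus ~ + _ . / space and hyphen, as listed in the table.
# The hyphen is placed last in the class so it is taken literally.
_ALLOWED_PATH = re.compile(r"[0-9A-Za-z~+_./ -]+")


def is_valid_path_value(value: str) -> bool:
    """Return True if value is non-empty and uses only the documented characters."""
    return _ALLOWED_PATH.fullmatch(value) is not None


def check_single_diag_paths(**options: str) -> dict[str, str]:
    """Hypothetical helper: return {option: value} for every value that fails the check."""
    return {name: value for name, value in options.items()
            if not is_valid_path_value(value)}
```

Failing fast on a value such as `/tmp/logs;rm` gives a clearer error than letting the job start and reject the argument later.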
Return Value
The single-diag job starts. Please wait. Job id: [****], run log file is [****].
Diagnosis content
The single-diag job is complete.
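When single-diag is driven from a script, the job ID and run log file can be picked out of the start message shown above. This is a sketch under assumptions: it assumes the message appears on one line of the captured output in the form `Job id: <id>, run log file is <path>.`, with or without square brackets around each value, and that completion is signaled by the literal line `The single-diag job is complete.`. `JobInfo`, `parse_start_message`, and `is_complete` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class JobInfo(NamedTuple):
    job_id: str
    run_log: str


# Assumed message shape; optional [ ] around each value and an optional trailing period.
_START_RE = re.compile(
    r"Job id:\s*\[?(?P<job_id>[^\],\s]+)\]?,\s*"
    r"run log file is\s*\[?(?P<run_log>[^\]]+?)\]?\.?$"
)


def parse_start_message(output: str) -> Optional[JobInfo]:
    """Return the job ID and run log path from the first matching line, else None."""
    for line in output.splitlines():
        match = _START_RE.search(line.strip())
        if match:
            return JobInfo(match["job_id"], match["run_log"])
    return None


def is_complete(output: str) -> bool:
    """Return True if the completion message appears in the captured output."""
    return "The single-diag job is complete." in output
```

The run log path is matched lazily up to an optional closing bracket and period, so paths that contain spaces or dots are kept whole.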