Functions and Restrictions of the asys Tool

Functions

The asys fault collection tool collects fault information in one click, making system fault maintenance and testing more efficient. It supports the following functions:

  • Service re-run + fault information collection: Re-run a service and collect fault information in a single operation, which speeds up fault reproduction and information collection.
  • Fault information collection: Collect fault information from the onsite process to provide input for faster fault locating.
  • Display of software, hardware, and device status information: Collect the installation package version information, device temperature, and power.
  • Health check: Check the health status of all devices or specified devices. If a device is unhealthy, an error message is displayed.
  • Comprehensive detection: Includes the stress test, HBM hardware detection, and CPU detection.
  • File parsing: Parse trace, core dump, and Stackcore files.
  • Environment configuration: Obtain or restore the specified configuration.

Table 1 Information that can be collected by the asys tool

Category

Description

Software information

Software package version, environment variables, software dependency, and system information.

Log information

The information includes:

  • CANN software stack logs on the host side.
  • Message logs on the host.
  • Device firmware logs: device-* logs (root permission required).
  • Device system logs: message logs and device-os logs (root permission required).
  • Black box and Stackcore files (root permission required).
  • Task display logs.
  • Runfile execution logs (available only when the runfile installation user is the same as the program execution user)

Dump information

The information includes:

  • Dumped GE graphs.
  • Dumped TF Adapter graphs.
  • Dump file generated when an AI Core error occurs.

.o and .json files for operator compilation

-

Operator compilation process file

Operator compilation process information is collected only during service re-run. It covers compilation success or failure, reused memory, online compilation, and binary compilation results.

Whether the asys tool can collect this information depends on whether the NPU_COLLECT_PATH environment variable (which sets the path for saving fault information) is set. If it is set, the system creates the /extra-info/ops/ subdirectory in the directory specified by the variable, creates op_compile_stats.log in that subdirectory, and writes the operator compilation process information to the log file, so the asys tool can collect it. If the variable is not set, the system does not generate the log file, and the asys tool has nothing to collect.

Custom operator configuration information (*.json file)

Whether the asys tool can collect the custom operator configuration information depends on whether the following environment variables are set:

  • If the ASCEND_OPP_PATH environment variable (used to set the installation path of the operator library) is set, the asys tool collects the custom operator configuration information (that is, the config/*.json file) in the ${ASCEND_OPP_PATH}/vendors directory based on the load_priority field in the ${ASCEND_OPP_PATH}/vendors/config.ini file. Otherwise, the asys tool does not collect the information.
  • If the ASCEND_CUSTOM_OPP_PATH environment variable (used to set the installation path of the custom operator package) is set, the asys tool collects the custom operator configuration information (that is, the config/*.json file) in the ${ASCEND_CUSTOM_OPP_PATH} directory. Otherwise, the asys tool does not collect the information.

Commands executed in user cases

-

Binary information of the debugging version

Information in the ${ASCEND_OPP_PATH}/debug_kernel directory. The ASCEND_OPP_PATH environment variable (used to set the installation directory of the operator library) must be configured in advance. If the variable is not configured or is configured incorrectly, the binary information of the debugging version is not collected.

For details about how to set environment variables, see Environment Variables.
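
The environment variable conditions in Table 1 can be checked before running the tool. The following Python sketch is not part of asys. It only mirrors the rules described above, and all function names are hypothetical. It assumes that vendors/config.ini contains a line of the form load_priority=vendor_a,vendor_b, and that each operator package keeps its JSON files in a config/ directory somewhere below its root. Neither layout detail is confirmed by this document.

```python
import os
from pathlib import Path


def expected_compile_stats_log(env):
    """Return the op_compile_stats.log path implied by NPU_COLLECT_PATH, or None."""
    root = env.get("NPU_COLLECT_PATH")
    if not root:
        return None  # variable not set: no log file is generated
    return Path(root) / "extra-info" / "ops" / "op_compile_stats.log"


def read_load_priority(config_ini):
    """Read the load_priority field from vendors/config.ini.

    Assumption: the field is a comma-separated list on a single line.
    """
    if not config_ini.is_file():
        return []
    for line in config_ini.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "load_priority":
            return [v.strip() for v in value.split(",") if v.strip()]
    return []


def custom_op_config_files(env):
    """List candidate custom operator configuration files (config/*.json)."""
    found = []
    opp = env.get("ASCEND_OPP_PATH")
    if opp:
        vendors = Path(opp) / "vendors"
        for vendor in read_load_priority(vendors / "config.ini"):
            # Assumption: config/ may sit at any depth below the vendor directory.
            found.extend(sorted((vendors / vendor).glob("**/config/*.json")))
    custom = env.get("ASCEND_CUSTOM_OPP_PATH")
    if custom:
        found.extend(sorted(Path(custom).glob("**/config/*.json")))
    return found


def debug_kernel_dir(env):
    """Return ${ASCEND_OPP_PATH}/debug_kernel if it exists, else None."""
    opp = env.get("ASCEND_OPP_PATH")
    if not opp:
        return None
    path = Path(opp) / "debug_kernel"
    return path if path.is_dir() else None


def preflight(env=os.environ):
    """Summarize which environment-dependent items look collectable."""
    return {
        "op_compile_stats_log": expected_compile_stats_log(env),
        "custom_op_configs": custom_op_config_files(env),
        "debug_kernel_dir": debug_kernel_dir(env),
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value if value else 'not collected'}")
```

An entry reported as "not collected" suggests that the corresponding environment variable is unset or points to a directory without the expected content. The log path is only the expected location. The file itself is written during service re-run.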

Restrictions

  1. If the same user runs more than one process on a machine at the same time, the collected data may overlap.
  2. A non-root user can collect only limited data. For the items that require root permission, see Table 1 in Functions.
  3. The one-click tool cannot be used to collect fault information in cluster, container, VM, or cloud scenarios.
  4. The asys tool collects a large amount of maintenance and test information and therefore consumes memory. Do not run multiple processes in parallel. Otherwise, the asys tool may report an error during execution, or the environment may encounter exceptions.
  5. This tool does not support the RC mode.
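
As an illustration of restriction 2, the following Python sketch lists the Table 1 items that a given user should not expect to collect. It is not part of asys, and the function names are hypothetical. It simply restates the permission notes from Table 1, and os.geteuid() is available only on Unix-like systems.

```python
import os

# Items that Table 1 marks as requiring root permission.
ROOT_ONLY_ITEMS = [
    "Device firmware logs (device-* logs)",
    "Device system logs (message logs and device-os logs)",
    "Black box and Stackcore files",
]


def items_needing_root(euid):
    """Return the root-only items that a user with this effective UID cannot collect."""
    return [] if euid == 0 else list(ROOT_ONLY_ITEMS)


def runfile_logs_available(install_user, run_user):
    """Runfile execution logs are available only when both users are the same."""
    return install_user == run_user


if __name__ == "__main__":
    missing = items_needing_root(os.geteuid())
    if missing:
        print("Not collectable without root permission:")
        for item in missing:
            print(f"  {item}")
    else:
        print("Running as root: root-only items can be collected.")
```

Whether asys skips these items silently or reports an error for a non-root user is not stated here, so the sketch only reports what to expect.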