Functions and Restrictions of the asys Tool
Functions
To make system fault maintenance and testing more efficient, the asys fault collection tool collects fault information in one click. It supports the following functions:
- Service re-run + fault information collection: Re-run services and collect fault information in a single pass, improving the efficiency of fault reproduction and information collection.
- Fault information collection: Collect fault information about onsite processes to support faster fault locating.
- Display of software, hardware, and device status information: Collect the installation package version information, device temperature, and power.
- Health check: Check the health status of all devices or specified devices. If a device is unhealthy, an error message is displayed.
- Comprehensive detection: Includes the stress test, HBM hardware detection, and CPU detection.
- Trace/Core dump/Stackcore file parsing
- Environment configuration: Obtain or restore the specified configuration.
| Category | Description |
|---|---|
| Software information | Software package version, environment variables, software dependency, and system information. |
| Log information | The information includes: |
| Dump information | The information includes: |
| .o and .json files for operator compilation | - |
| Operator compilation process file | Only the operator compilation process information is collected during service re-run. The information includes compilation success or failure, reused memory, online compilation, and binary compilation results. Whether the asys tool can collect the operator compilation process information depends on whether the NPU_COLLECT_PATH environment variable (used to set the path for saving fault information) is specified. If it is set, the system creates the /extra-info/ops/ subdirectory in the directory specified by the environment variable, creates op_compile_stats.log in the subdirectory, and writes the operator compilation process information to the log file. In this case, the asys tool can collect the operator compilation process information. If this environment variable is not set, the system does not generate the corresponding log file. Therefore, the asys tool does not collect the file. |
| Custom operator configuration information (*.json file) | Whether the asys tool can collect the custom operator configuration information depends on whether the following environment variables are set: |
| Commands executed in user cases | - |
| Binary information of the debugging version | Information in the ${ASCEND_OPP_PATH}/debug_kernel directory. You need to configure the ASCEND_OPP_PATH environment variable (used to set the installation directory of the operator library) in advance. If the ASCEND_OPP_PATH environment variable is not configured or incorrectly configured, the binary information of the debugging version is not collected by default. |
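Because two of the categories above depend on environment variables, it can help to check them before a collection run. The following is a minimal sketch, not part of the asys tool: the function names `compile_stats_log`, `debug_kernel_dir`, and `preflight` are hypothetical, and the paths are derived only from the descriptions of NPU_COLLECT_PATH and ASCEND_OPP_PATH in the table.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def compile_stats_log(env: Mapping[str, str]) -> Optional[Path]:
    """Expected location of op_compile_stats.log, or None if NPU_COLLECT_PATH is unset.

    Assumption: the log sits at <NPU_COLLECT_PATH>/extra-info/ops/op_compile_stats.log,
    as described in the table. The file only exists after a service run has created it.
    """
    base = env.get("NPU_COLLECT_PATH")
    if not base:
        return None
    return Path(base) / "extra-info" / "ops" / "op_compile_stats.log"


def debug_kernel_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return ${ASCEND_OPP_PATH}/debug_kernel if it exists as a directory, else None."""
    base = env.get("ASCEND_OPP_PATH")
    if not base:
        return None
    path = Path(base) / "debug_kernel"
    return path if path.is_dir() else None


def preflight(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize which optional categories look collectable (hypothetical helper)."""
    log = compile_stats_log(env)
    return {
        "compile_stats_log": str(log) if log else None,
        "compile_stats_log_exists": bool(log and log.is_file()),
        "debug_kernel_dir": str(debug_kernel_dir(env) or "") or None,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

A `None` entry in the result suggests that the corresponding variable is unset or points to the wrong directory, in which case the matching category would not be collected.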
Restrictions
- If the same user runs more than one process on a machine at the same time, the collected data may overlap.
- Only limited data can be collected by a non-root user. For details about the limitations, see the privilege requirements in Functions.
- The one-click tool cannot be used to collect fault information in cluster, container, VM, and cloud scenarios.
- The asys tool collects a large amount of maintenance and test information and therefore consumes a significant amount of memory. You are not advised to run multiple processes in parallel. Otherwise, an error may occur during the execution of the asys tool or the environment may encounter exceptions.
- This tool does not support the RC mode.
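One way to respect the restrictions on parallel runs and non-root users is to launch collection through a small wrapper. The sketch below is an illustration under stated assumptions, not a feature of asys: `single_instance`, `run_guarded`, and the lock file location are hypothetical, the lock relies on `fcntl.flock` (Linux/Unix only), and the command to run is supplied by the caller.

```python
import fcntl
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager

# Hypothetical per-user lock file; asys itself does not define this path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), f"asys-wrapper-{os.getuid()}.lock")


@contextmanager
def single_instance(lock_path: str = LOCK_PATH):
    """Hold an exclusive, non-blocking lock so only one guarded run is active at a time."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError("another collection run is already in progress")
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def run_guarded(cmd, lock_path: str = LOCK_PATH) -> int:
    """Run the caller-supplied command under the lock and return its exit code."""
    if os.geteuid() != 0:
        # Non-root users can collect only limited data (see Restrictions).
        print("warning: not running as root; collected data will be limited",
              file=sys.stderr)
    with single_instance(lock_path):
        return subprocess.run(cmd).returncode
```

Pass the full collection command as a list of arguments to `run_guarded`. A second call made while the first is still running raises `RuntimeError` instead of starting a parallel collection.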
