Collect Information About Process Interruption

The information can be collected manually or automatically.
  • Manual collection: Collect the host and device log files. This method gathers only the minimum set of information.
  • Automatic collection: On the host server, use the asys tool to collect all fault-related information, including installation version information, device health status, dump files, operator compilation information, and full log files.

    Note: The asys tool supports a limited set of scenarios. It cannot be used for one-click fault information collection in cluster, container, virtual machine, or cloud scenarios.

Manual Collection Method

To collect application logs on the host and system logs on the device, perform the following steps:
  1. Plan a directory for storing log files on the host server, for example, ${HOME}/err_log_info/.
  2. By default, application log files on the host server are stored in ${HOME}/ascend/log. Copy the log files to the err_log_info directory:
    cp -r ${HOME}/ascend/log ${HOME}/err_log_info/
  3. Use the msnpureport tool to export system logs on the device (including slog logs, syslog logs, black box, etc.) to the host.
    # Create a directory for storing device logs in ${HOME}/err_log_info
    cd ${HOME}/err_log_info
    mkdir report
    
    # Run the msnpureport command in the report directory
    cd report
    Driver_installation_directory/driver/tools/msnpureport -f

For details about log levels, log paths, and log files, see Log Reference.
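
The manual steps above can be combined into a single shell function. The following is a minimal sketch, not an official script: the function name collect_manual_logs and its three positional arguments are illustrative assumptions, and the driver installation directory must be supplied by the user because it depends on the site. Only the paths and the msnpureport -f invocation come from the steps above.

```shell
#!/bin/sh
# Sketch of the manual collection steps as one function.
# collect_manual_logs and its arguments are illustrative names,
# not part of any Ascend tooling.
collect_manual_logs() {
    err_dir="${1:-${HOME}/err_log_info}"   # step 1: planned storage directory
    log_src="${2:-${HOME}/ascend/log}"     # step 2: default host application log path
    driver_dir="${3:-}"                    # driver installation directory (site-specific)

    # Create the storage directory and the report subdirectory in one go.
    mkdir -p "${err_dir}/report" || return 1

    # Step 2: copy the host application logs so the originals stay in place.
    if [ -d "${log_src}" ]; then
        cp -r "${log_src}" "${err_dir}/" || return 1
    else
        echo "host log directory not found: ${log_src}" >&2
    fi

    # Step 3: run msnpureport from inside the report directory.
    tool="${driver_dir}/driver/tools/msnpureport"
    if [ -n "${driver_dir}" ] && [ -x "${tool}" ]; then
        (cd "${err_dir}/report" && "${tool}" -f)
    else
        echo "msnpureport not found; pass the driver installation directory as the third argument" >&2
    fi
}
```

To use the sketch, source the file and call the function with the actual driver installation directory as the third argument, for example: collect_manual_logs "${HOME}/err_log_info" "${HOME}/ascend/log" Driver_installation_directory. If the tool is not found, the function still copies the host logs and prints a message to standard error.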

In addition, technical support may need onsite service information and user operation logs to locate problems. Onsite service information indicates whether the onsite service is a single-operator, model inference, or model training service. For a training service, it also includes details such as the scale of the training cluster. User operation logs record user operations on the host server. With this information, technical support can understand the basic onsite situation and check whether the process was interrupted manually.

Automatic Collection Method

For details about the restrictions on using the asys tool, see Functions and Restrictions of the asys Tool. Before using the asys tool, install and configure it. For details, see the prerequisites in asys Tool Usage Guide (EP Mode).

The following is an example of the asys tool command. Run the asys collect command to collect fault information.

asys collect [--output="path"]

The --output parameter specifies the directory for saving the collected information. For details about the parameters and restrictions, see Fault Information Collection.
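
The asys collect command can be wrapped in a small helper that first checks whether the tool is available. The following is a hedged sketch: the function name run_asys_collect and the default output path are illustrative assumptions, and creating the output directory beforehand is a precaution, not a documented requirement. Only the asys collect --output form is taken from the example above.

```shell
#!/bin/sh
# Sketch of a wrapper around "asys collect".
# run_asys_collect and the default output path are illustrative assumptions.
run_asys_collect() {
    out_dir="${1:-${HOME}/err_log_info/asys_output}"

    # Stop early if asys has not been installed and configured.
    if ! command -v asys >/dev/null 2>&1; then
        echo "asys not found in PATH; install and configure it first" >&2
        return 127
    fi

    # Assumption: creating the directory in advance is harmless for asys.
    mkdir -p "${out_dir}" || return 1

    asys collect --output="${out_dir}"
}
```

Calling run_asys_collect with no argument uses the assumed default path; passing a directory as the first argument overrides it. The function returns 127 when asys is not available and otherwise returns the exit status of asys collect.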