Viewing Trace Logs

Overview

With the trace mechanism, maintenance and test information from the software stack is recorded in memory while a program runs. When an error occurs or the process ends, the information is flushed to a file. This avoids frequently generating and writing log files during execution, which would degrade performance. Currently, only the standard Ascend EP supports this function.
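The buffer-then-flush idea can be illustrated with a minimal Python sketch. It is not the trace implementation of the software stack; the class name TraceBuffer, the capacity value, and the helper flush_on_exit are hypothetical and chosen only for illustration.

```python
import atexit
from collections import deque


class TraceBuffer:
    """Hypothetical in-memory trace buffer that is written to disk only on demand."""

    def __init__(self, path, capacity=1024):
        self.path = path
        # Bounded buffer: once full, the oldest entries are dropped.
        self._events = deque(maxlen=capacity)

    def record(self, message):
        # Recording touches memory only; no file I/O happens here.
        self._events.append(message)

    def flush(self):
        # Append everything buffered so far to the file, then empty the buffer.
        if not self._events:
            return
        with open(self.path, "a", encoding="utf-8") as handle:
            for message in self._events:
                handle.write(message + "\n")
        self._events.clear()


def flush_on_exit(buffer):
    """Register the buffer so that it is flushed when the interpreter exits normally."""
    atexit.register(buffer.flush)
```

An error handler would call flush() explicitly, while flush_on_exit covers the normal process exit case.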

Log Description

The default root directory for storing trace logs is $HOME/ascend/atrace/. You can use the environment variable ASCEND_WORK_PATH to specify a different storage path, for example, export ASCEND_WORK_PATH=/home/test. For details, see Environment Variables.

Under the default root directory, trace logs are stored in $HOME/ascend/atrace/trace_{Process group PID}_{PID of the process that loads the trace dynamic library for the first time}_{Timestamp of loading the trace dynamic library for the first time}/{event_name}_event_{Current process PID}_{Timestamp of generating the directory}/. event_name indicates the event type. Its value can be schedule (service process exception, such as an operator execution error), stackcore (process crash or exception signal received), or exit (normal exit, during destruction).
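The following Python sketch shows how the fields encoded in such a path could be extracted. It assumes that the PID and timestamp fields contain no underscores, which the naming rule does not state explicitly. The function name parse_trace_path and the returned key names are hypothetical.

```python
import re
from typing import Optional

# Assumption: every PID and timestamp field is free of "_" and "/".
_SESSION_RE = re.compile(
    r"^trace_(?P<pgid>[^_/]+)_(?P<loader_pid>[^_/]+)_(?P<load_time>[^_/]+)$"
)
_EVENT_RE = re.compile(
    r"^(?P<event_name>schedule|stackcore|exit)_event_(?P<pid>[^_/]+)_(?P<created>[^_/]+)$"
)


def parse_trace_path(path: str) -> Optional[dict]:
    """Return the fields encoded in the last two path components, or None."""
    parts = [part for part in path.split("/") if part]
    if len(parts) < 2:
        return None
    session = _SESSION_RE.match(parts[-2])
    event = _EVENT_RE.match(parts[-1])
    if session is None or event is None:
        return None
    # Merge the fields of the trace_* directory and the *_event_* directory.
    return {**session.groupdict(), **event.groupdict()}
```

For a made-up path such as /home/test/ascend/atrace/trace_100_101_20240101120000/stackcore_event_101_20240101120500, the function reports the event name stackcore and the current process PID 101. All field values are returned as strings because the timestamp format is not specified here.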

Table 1 Description of trace log files

| Log File | Description |
| --- | --- |
| schedule_tracer_ts_{device_id}.txt | When an AI Core error or a notify wait timeout occurs, the task scheduler returns maintenance and test information, including register, hardware buffer, and bitmap data, to the host. |
| stackcore_tracer_{signal}_{tid}_{program_name}_{time}.txt | Lightweight core file recorded when the host service process crashes. It contains the stack frame addresses and base addresses and must be parsed by using the asys tool. For details, see Troubleshooting. |
| schedule_tracer_{object_name}.txt | Track information reported during running by modules such as Runtime and HCCL. It records the execution flow of the process. |
| schedule_tracer_{object_name}.bin | Track information reported during running by modules such as AI CPU. It records the execution flow of the process and is stored in binary format. To parse the information, use the asys tool. For details, see Troubleshooting. |
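As a rough aid for sorting the files in an event directory, the naming patterns in Table 1 can be matched with a few regular expressions. The sketch below is illustrative only: the function name classify_trace_file and the category labels are hypothetical, and it assumes that device_id is numeric, which Table 1 does not state.

```python
import re

# Each rule: (label, file name pattern, whether the asys tool is needed to parse it).
# Order matters: the task scheduler pattern must be tested before the generic .txt one.
_RULES = [
    ("task scheduler dump", re.compile(r"^schedule_tracer_ts_\d+\.txt$"), False),
    ("stackcore", re.compile(r"^stackcore_tracer_.+\.txt$"), True),
    ("module trace (text)", re.compile(r"^schedule_tracer_.+\.txt$"), False),
    ("module trace (binary)", re.compile(r"^schedule_tracer_.+\.bin$"), True),
]


def classify_trace_file(name):
    """Return (label, needs_asys) for a trace log file name, or None if unrecognized."""
    for label, pattern, needs_asys in _RULES:
        if pattern.match(name):
            return label, needs_asys
    return None
```

One limitation of this approach: an object_name that itself looks like ts_ followed by digits would be reported as a task scheduler dump, because the file name alone cannot distinguish the two cases.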

You can also use the environment variable ASCEND_LOG_DEVICE_FLUSH_TIMEOUT to configure the delay for sending logs from the device to the host. For details, see Environment Variables.

The preceding directories are shared by all applications in a container or on a physical machine, so the number of log files grows as the number of application processes increases. As a best practice, clear the directories regularly to ensure that services run properly. You can also use the logrotate utility provided by the system to rotate logs.
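Regular cleanup could be scripted along the following lines. This is a minimal sketch, not a supported tool: the function name remove_old_trace_dirs and the 7-day default are hypothetical, and it assumes that the modification time of a trace_* directory is a good enough indicator of its age. Because the directories are shared, run it in dry-run mode first and check the result before deleting anything.

```python
import shutil
import time
from pathlib import Path


def remove_old_trace_dirs(root, max_age_days=7.0, dry_run=True):
    """Delete trace_* directories under root that are older than max_age_days.

    Returns the list of directories that were (or, in dry-run mode, would be) removed.
    """
    cutoff = time.time() - max_age_days * 86400
    selected = []
    root = Path(root)
    if not root.is_dir():
        return selected
    for entry in sorted(root.glob("trace_*")):
        # Only top-level trace_* directories are considered; files are ignored.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            if not dry_run:
                shutil.rmtree(entry)
            selected.append(entry)
    return selected
```

For example, remove_old_trace_dirs(Path.home() / "ascend" / "atrace") only lists the candidates under the default root directory. Pass dry_run=False to delete them.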