ASCEND_WORK_PATH

Description

To flush the files generated during component build and runtime to a single directory, set this environment variable to specify the storage path for files that are exclusive to a single server.

  • The path name can contain letters, digits, underscores (_), hyphens (-), and periods (.).
  • If the specified path exists and is valid, ensure that the execution user has the read, write, and execute permissions on the path. If the specified path does not exist, the software automatically creates the path.
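The two rules above can be checked before a job is launched. The following is a minimal sketch rather than a CANN tool: the function name check_work_path is hypothetical, and it assumes that / acts only as a separator while every other character must come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CANN): pre-check a candidate ASCEND_WORK_PATH.
# Assumption: "/" is only a path separator; all other characters must be
# letters, digits, underscores, hyphens, or periods.
check_work_path() {
    local path="$1"
    case "$path" in
        ""|*[!A-Za-z0-9_./-]*)
            echo "invalid characters in path: $path" >&2
            return 1
            ;;
    esac
    # An existing path must be a directory that the current user can
    # read, write, and enter.
    if [ -e "$path" ]; then
        if [ ! -d "$path" ] || [ ! -r "$path" ] || [ ! -w "$path" ] || [ ! -x "$path" ]; then
            echo "insufficient permissions on: $path" >&2
            return 1
        fi
    fi
    # A missing path is accepted because the software creates it automatically.
    return 0
}
```

A launch script could call check_work_path "$ASCEND_WORK_PATH" and stop early when the function returns a non-zero status.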

The following table lists the files in ${ASCEND_WORK_PATH}.

Table 1 Paths of flushed single-server exclusive files

Flushed File | File Content | Priority Specification

${ASCEND_WORK_PATH}/aoe_data/

Information related to AOE tuning, including the tuning task name, tuning duration, model execution time/operator execution time before and after tuning, and knowledge base hit information.

ASCEND_WORK_PATH > Default AOE tuning working directory (the value of WORK_PATH in ${install_path}/latest/tools/aoe/conf/aoe.ini)

${ASCEND_WORK_PATH}/log/

Log file.

ASCEND_PROCESS_LOG_PATH > ASCEND_WORK_PATH > Default path for storing logs ($HOME/ascend/log)

${ASCEND_WORK_PATH}/atrace/

Trace log file.

ASCEND_WORK_PATH > Default path for storing trace logs ($HOME/ascend/atrace)

${ASCEND_WORK_PATH}/profiling_data/

Profile data collected by the Profiling tool.

Offline inference scenario:

  • If you run the msprof command to collect profile data, the priority is as follows:

    Parameter --output > ASCEND_WORK_PATH > Default path (path for storing the inference model file)

  • If the acl.json configuration file is used to collect profile data, the priority is as follows:

    Parameter output > ASCEND_WORK_PATH > Default path (path for the executable file of the application project)

TensorFlow training/online inference scenario:

The following settings take priority over ASCEND_WORK_PATH: the output parameter in the PROFILING_OPTIONS environment variable, the output parameter in profiling_options of the training script, and the output_path parameter configured when the training script calls the Profiler class.

In the training or online inference scenario, either output/output_path or ASCEND_WORK_PATH must be configured. Otherwise, an error will be reported when profile data is collected.

PyTorch training/online inference scenario:

Profile data path specified by the on_trace_ready=tensorboard_trace_handler function > ASCEND_WORK_PATH > Default path

If tensorboard_trace_handler is configured without a path, the path set by ASCEND_WORK_PATH is used, and the profile data flushed to disk is parsed automatically. If on_trace_ready=torch_npu.profiler.tensorboard_trace_handler is not used in the code, the profile data flushed to the path set by ASCEND_WORK_PATH is the raw, unparsed data.

${ASCEND_WORK_PATH}/kernel_meta/

Debugging-related process files generated during operator build, including the operator binary files (.o), operator description files (.json), and CCE files.

The following scenarios provide parameters or APIs for setting the storage path of the debugging process files generated during operator compilation. Their priority is higher than that of the ASCEND_WORK_PATH environment variable. The details are as follows:
  • Offline model compilation using ATC:

    Parameter --debug_dir > ASCEND_WORK_PATH > Default path (current execution path, ./)

  • Model build or compilation using AscendCL APIs:
    • Parameter DEBUG_DIR in the graph construction API aclgrphBuildInitialize > ASCEND_WORK_PATH > Default path (current execution path, ./)
    • Parameter DEBUG_DIR in the graph construction API aclgrphBuildModel > ASCEND_WORK_PATH > Default path (current execution path, ./)
    • Parameter ACL_DEBUG_DIR in the application build API aclCompileOpt > ASCEND_WORK_PATH > Default path (current execution path, ./)
  • TensorFlow network training or online inference:

    TF Adapter configuration parameter debug_dir > ASCEND_WORK_PATH > Default path (current execution path, ./)

    For details about the TF Adapter configuration parameter debug_dir, see:

${ASCEND_WORK_PATH}/${pid}_${device_id}/

Graph dump files flushed to disk after graph dumping is enabled (that is, after the environment variable DUMP_GE_GRAPH is configured).

DUMP_GRAPH_PATH > ASCEND_WORK_PATH > Default path (current execution path, ./)

${ASCEND_WORK_PATH}/extra-info/data-dump/

Exception dump file and exception operator compilation information

Environment variable NPU_COLLECT_PATH > ASCEND_WORK_PATH > Default path (current execution path)

${ASCEND_WORK_PATH}/tmp_weight_${pid}_${session_id}

When external weights are enabled, the weight files of the Const and Constant nodes are stored in this directory.

The method of enabling external weights varies according to the scenario. For details, see the user guide or development guide of the corresponding scenario.

ASCEND_WORK_PATH > Default path (current execution path, ./)

${ASCEND_WORK_PATH}/FE/${pid}/fusion_result.json

Fusion patterns except for those disabled in the fusion_switch.cfg file.

ASCEND_WORK_PATH > Default path (current execution path, ./)

${ASCEND_WORK_PATH}/check_result.json

Model precheck result file.

When ATC is used to build an offline model, if --mode in the ATC command is set to 0 and the model fails to be parsed, or --mode is set to 3 for model precheck only, you can use this environment variable to specify the path for storing the precheck result file.

Parameter --check_report > ASCEND_WORK_PATH > Default path (current execution path, ./)
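The priority chains in Table 1 share one pattern: a more specific setting wins, then ASCEND_WORK_PATH, then the default path. The sketch below illustrates that pattern for the log directory only. The function name resolve_log_dir is hypothetical, and the sketch assumes that ASCEND_PROCESS_LOG_PATH is used as the log directory as-is, whereas ASCEND_WORK_PATH receives the log/ subdirectory shown in the table.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the documented priority for the log directory,
#   ASCEND_PROCESS_LOG_PATH > ASCEND_WORK_PATH > $HOME/ascend/log
# resolve_log_dir is a hypothetical name, not a CANN command.
resolve_log_dir() {
    if [ -n "${ASCEND_PROCESS_LOG_PATH:-}" ]; then
        # Assumption: this variable names the log directory directly.
        printf '%s\n' "$ASCEND_PROCESS_LOG_PATH"
    elif [ -n "${ASCEND_WORK_PATH:-}" ]; then
        # Strip one trailing slash, then append the log/ subdirectory.
        printf '%s\n' "${ASCEND_WORK_PATH%/}/log"
    else
        printf '%s\n' "$HOME/ascend/log"
    fi
}
```

The same three-step fallback applies to the other rows once the variable or parameter names are swapped in, although the actual resolution is performed inside each component.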

Example

export ASCEND_WORK_PATH=/repo/task001/172.16.1.12_01_03

The path must exist, and the execution user must have the read, write, and execute permissions on the path. The last field of the path must uniquely identify the current host.

It is recommended that the unique ID consist of machine ID, VM ID, and Docker ID, for example, machineID_vmID_dockerID.
  • machineID: IP address of the current host.
  • vmID: VM ID.
  • dockerID: Docker container ID.

On a physical machine, the IP address alone is sufficient as the unique ID.
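The recommended naming scheme can be assembled in a launch script. This is a minimal sketch under stated assumptions: make_host_id is a hypothetical helper, the ID values are the ones from the example path above, and how you obtain the real IP address, VM ID, and container ID depends on your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join machineID, vmID, and dockerID with underscores.
# Empty fields are skipped, so a physical machine yields just the IP address.
make_host_id() {
    local id="" part
    for part in "$@"; do
        [ -n "$part" ] || continue
        id="${id:+${id}_}${part}"
    done
    printf '%s\n' "$id"
}

# Example values taken from the path shown above; /repo/task001 is the shared root.
export ASCEND_WORK_PATH="/repo/task001/$(make_host_id 172.16.1.12 01 03)"
echo "$ASCEND_WORK_PATH"   # prints /repo/task001/172.16.1.12_01_03
```

Calling make_host_id with only an IP address returns that address unchanged, which matches the physical-machine case.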

Restrictions

In the multi-server scenario, the AI processor model, firmware, driver, and CANN software version on each server must be the same.

Applicability

Atlas training products

Atlas inference products

Atlas A2 training products / Atlas A2 inference products

Atlas A3 training products / Atlas A3 inference products

Atlas 200I/500 A2 inference products