Environment Variable List

This document describes the environment variables that can be used when developers build AI applications and services based on CANN.

  • Environment variables can be set and read through shell commands, APIs, and configuration mechanisms, such as the export command, the putenv/getenv/setenv/unsetenv/clearenv C functions, and Python's os.environ and os.getenv. Set environment variables before starting application processes; otherwise, concurrent access to environment variables may cause conflicts and program exceptions. See the sketch after this list.
  • This document does not describe the environment variables of Ascend Extension for PyTorch. For details about those environment variables, see Ascend Extension for PyTorch Environment Variable Reference.
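
The following is a minimal Python sketch of the recommended pattern: fix the environment before the application process starts. All variable values, paths, and the train.py script name are illustrative placeholders, not defaults.

    import os
    import subprocess

    # Option 1: set the variables in the parent process, then spawn the
    # application, so the child inherits a stable environment from the start.
    env = os.environ.copy()
    env["ASCEND_CACHE_PATH"] = "/data/ascend/cache"  # illustrative path
    subprocess.run(["python", "train.py"], env=env, check=True)

    # Option 2: set the variables at the very top of the entry script,
    # before any framework or CANN library is imported or initialized.
    os.environ["ASCEND_GLOBAL_LOG_LEVEL"] = "3"  # illustrative value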

The following sections list the environment variables by category.

Installation Packages

This category covers the basic environment variables related to program build and execution that the set_env.sh script configures by default after the CANN software is installed, plus the installation-package environment variables that must be configured manually for subsequent program build and execution.

Files Flushed to Disks

ASCEND_CACHE_PATH

Sets the storage path of shared files when you want build and run files flushed to a unified directory.

ASCEND_WORK_PATH

Sets the storage path of files shared exclusively on one server when you want build and run files flushed to a unified directory.
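
A hedged sketch of pointing both variables at unified directories; the paths are placeholders:

    import os

    os.environ["ASCEND_CACHE_PATH"] = "/data/ascend/shared_cache"  # shared files
    os.environ["ASCEND_WORK_PATH"] = "/data/ascend/server_work"    # files exclusive to this server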

Graph Build

DUMP_GE_GRAPH

Sets the graph dump mode.

DUMP_GRAPH_LEVEL

Sets which graphs to dump.

DUMP_GRAPH_PATH

Sets the path for storing dump graph files. The path can be absolute or relative to the script execution directory.
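
The three graph dump variables are typically set together. A minimal sketch; the mode and level values are illustrative, not recommendations:

    import os

    os.environ["DUMP_GE_GRAPH"] = "2"              # dump mode (illustrative value)
    os.environ["DUMP_GRAPH_LEVEL"] = "2"           # which graphs to dump (illustrative value)
    os.environ["DUMP_GRAPH_PATH"] = "./ge_graphs"  # relative to the script execution directory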

OP_NO_REUSE_MEM

By default, memory reuse is enabled during graph build on the Ascend platform. In fault locating scenarios, if developers suspect that memory reuse errors corrupt the computation result, they can use this environment variable to allocate memory for an operator separately.

ASCEND_ENGINE_PATH

Set this environment variable to convert a single-operator .json file into an offline model using only TBE operators during the conversion: if a TBE operator cannot be found, an error is reported instead of falling back to an AI CPU operator.

MAX_COMPILE_CORE_NUMBER

Specifies the number of CPU cores available for graph build.

MULTI_THREAD_COMPILE

Enables or disables the single-thread build during model conversion.

ENABLE_NETWORK_ANALYSIS_DEBUG

In the TensorFlow training scenario, if computational graph build fails, the training process is terminated by default and the remaining graphs are not delivered to the device. You can set this environment variable to make TF Adapter continuously deliver computational graphs to the device without terminating the training process when graph build fails.

Operator Build

TE_PARALLEL_COMPILER

Enables parallel operator build, which is especially useful for large networks.

ASCEND_MAX_OP_CACHE_SIZE

When the operator build cache function is enabled, limits the disk space of the build cache folder for the Ascend AI Processor.

ASCEND_REMAIN_CACHE_SIZE_RATIO

When the operator build cache function is enabled, specifies the percentage of build cache space retained once the build cache of a given processor reaches ASCEND_MAX_OP_CACHE_SIZE. The default value is 50 (%).
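
A worked sketch of how the two cache variables interact, with illustrative values (the size unit is assumed to be MB here):

    import os

    os.environ["ASCEND_MAX_OP_CACHE_SIZE"] = "500"       # cache cap (illustrative; unit assumed MB)
    os.environ["ASCEND_REMAIN_CACHE_SIZE_RATIO"] = "50"  # retain 50% when the cap is reached

    # With these values, once the cache reaches the cap it is trimmed back to
    # roughly 500 * 50 / 100 = 250, i.e. half of the configured maximum.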

IGNORE_INFER_ERROR

Specifies whether to skip operator prototype deliverable verification when an operator is inserted into a graph. The deliverables include the implementation of the adaptation functions for inserting the operator into the graph, such as shape deduction.

Resource Configuration

ASCEND_DEVICE_ID

Specifies the logical ID of the Ascend AI Processor used by the current process.

ASCEND_RT_VISIBLE_DEVICES

Specifies the devices that are visible to the current process. One or more device IDs can be specified at a time. By using this environment variable, you can adjust the devices without modifying the application.
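
A minimal sketch; the device IDs are placeholders, and the remapping of visible devices to logical IDs starting from 0 is an assumption of this example:

    import os

    # Expose only two physical devices to this process.
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "2,3"  # illustrative IDs

    # Select among the visible devices by logical ID (assumed to start at 0).
    os.environ["ASCEND_DEVICE_ID"] = "0"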

Operator Execution

ACLNN_CACHE_LIMIT

Sets the number of operator information entries cached on the host for a single-operator API. The cached operator information includes the workspace size, operator executor, and tiling information.

Graph Execution

ENABLE_DYNAMIC_SHAPE_MULTI_STREAM

During graph execution, enabling multi-stream concurrency can improve network performance in certain scenarios. The function is currently disabled by default; set this environment variable to enable it in dynamic shape scenarios.

MAX_RUNTIME_CORE_NUMBER

In training and online inference scenarios, this environment variable can be used to enable multi-thread task scheduling of the graph executor (host) for the network in dynamic shape graph mode.

TF Adapter

JOB_ID

In the TensorFlow training and online inference scenarios, this environment variable is used to specify the task ID, which is user-defined.

ENABLE_FORCE_V2_CONTROL

In the TensorFlow 1.15 training scenario, if the input has a dynamic shape, upgrade the V1 control flow operators to V2 to support dynamic shapes. Only TensorFlow V2 control flow operators (such as If, Case, While, For, and PartitionedCall) support dynamic shapes; the V1 operators (such as Switch, Merge, Enter, LoopCond, NextIteration, Exit, and ControlTrigger) behind the tf.case, tf.cond, and tf.while_loop APIs do not. Also upgrade V1 control flow operators to V2 if the network has many branch structures; otherwise, the data flow may exceed the limit.

NPU_DEBUG

Enables or disables debug logging of the TF Adapter in the TensorFlow 2.6.5 training and online inference scenarios.

NPU_DUMP_GRAPH

Enables or disables graph dump of the TF Adapter in the TensorFlow 2.6.5 training and online inference scenarios.

NPU_ENABLE_PERF

Prints graph time consumption of the TF Adapter in the TensorFlow 2.6.5 training and online inference scenarios.

NPU_LOOP_SIZE

Sets the number of iterations per loop offloaded to the NPU in the TensorFlow 2.6.5 training and online inference scenarios.

STEP_NOW

In the TensorFlow 1.15 training scenario, if the training acceleration function is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, you can use this environment variable to set the number of execution steps on the NPU.

TOTAL_STEP

In the TensorFlow 1.15 training scenario, if the training acceleration function is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, you can use this environment variable to set the total number of training steps on the NPU.

LOSS_NOW

In the TensorFlow 1.15 training scenario, if the training acceleration function is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, you can use this environment variable to set the loss value of the current iteration on the NPU.

TARGET_LOSS

In the TensorFlow 1.15 training scenario, if the training acceleration function is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, you can use this environment variable to set the target training loss value on the NPU.
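
The four acceleration-related variables above are typically set together. A hedged sketch with placeholder values:

    import os

    # TensorFlow 1.15 training-acceleration scenario; all values illustrative.
    os.environ["STEP_NOW"] = "1000"      # number of execution steps on the NPU
    os.environ["TOTAL_STEP"] = "100000"  # total number of training steps
    os.environ["LOSS_NOW"] = "1.5"       # loss value of the current iteration
    os.environ["TARGET_LOSS"] = "0.8"    # target training loss value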

RANK_TABLE_FILE

In the TensorFlow distributed training or inference scenario, this environment variable is used to specify the path and name of the ranktable resource configuration file of the Ascend AI Processor that participates in collective communication.

RANK_ID

Sets the rank ID of the current process in the collective communication process group in the TensorFlow distributed training or inference scenario.

RANK_SIZE

Sets the cluster rank size corresponding to the current training process, that is, the number of devices in the cluster in the TensorFlow distributed training or inference scenario.
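
For illustration, a per-process sketch for an assumed 8-device job; the file path and IDs are placeholders:

    import os

    os.environ["RANK_TABLE_FILE"] = "/home/user/rank_table_8p.json"  # illustrative path
    os.environ["RANK_SIZE"] = "8"  # number of devices in the cluster
    os.environ["RANK_ID"] = "0"    # this process's rank; each process sets its own value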

CM_CHIEF_IP

In the TensorFlow distributed training scenario, you can forgo the ranktable file and instead use these environment variables to generate resource information automatically and initialize the collective communication component.

This environment variable is used to configure the listening host IP address of the master node.

CM_CHIEF_PORT

Configures the listening port of the master node.

CM_CHIEF_DEVICE

Configures the logical ID of the device on the master node that collects statistics about the server cluster.

CM_WORKER_SIZE

Configures the number of devices in the service communicator.

CM_WORKER_IP

Configures the NIC IP address used for information exchange between the current device and the master node.
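
A hedged sketch of a ranktable-free setup; every address, port, and ID below is a placeholder:

    import os

    os.environ["CM_CHIEF_IP"] = "192.168.1.10"   # master node's listening IP
    os.environ["CM_CHIEF_PORT"] = "29500"        # master node's listening port
    os.environ["CM_CHIEF_DEVICE"] = "0"          # logical device ID on the master node
    os.environ["CM_WORKER_SIZE"] = "16"          # number of devices in the communicator
    os.environ["CM_WORKER_IP"] = "192.168.1.11"  # NIC IP this device uses to reach the master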

Collective Communication

HCCL_IF_IP

Configures the IP address of the initial root NIC of HCCL.

HCCL_IF_BASE_PORT

Specifies the start port number of the host NIC in single-operator mode when the host NIC is used for HCCL initialization or collective communication. When the variable is configured, the system occupies 16 ports starting from the specified port by default.
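
A small arithmetic sketch of the occupied port range, with an illustrative base port:

    # 16 consecutive ports are occupied by default, starting at HCCL_IF_BASE_PORT.
    base_port = 60000  # illustrative value of HCCL_IF_BASE_PORT
    occupied_ports = range(base_port, base_port + 16)  # 60000 through 60015
    print(list(occupied_ports))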

HCCL_SOCKET_IFNAME

Sets the name of the initial host root NIC of HCCL. HCCL can obtain the host IP address from this name to create a communicator.

HCCL_SOCKET_FAMILY

Sets the IP protocol used by the communication NIC.

HCCL_CONNECT_TIMEOUT

Configures the timeout wait period of socket connection establishment between different devices in the distributed training or inference scenario.

HCCL_EXEC_TIMEOUT

During distributed training or inference, tasks executed by different device processes may be inconsistent, for example, only specific processes save the checkpoint data. This environment variable controls the synchronization wait time during task execution between devices. Within this configured time, each device process waits for other devices to perform communication synchronization.

HCCL_ALGO

Configures the communication algorithm between servers. The algorithm can be configured globally or by operator.

HCCL_BUFFSIZE

Sets the size of the buffer for sharing data between two NPUs. The unit is MB. The value must be an integer greater than or equal to 1. The default value is 200.

HCCL_INTRA_PCIE_ENABLE

Specifies whether to use the PCIe path for multi-processor communication on a server.

HCCL_INTRA_ROCE_ENABLE

Specifies whether to use the RoCE path for multi-processor communication on a server.

HCCL_WHITELIST_DISABLE

Enables or disables the HCCL communication whitelist.

HCCL_WHITELIST_FILE

Configures the path of the HCCL communication whitelist configuration file after the communication whitelist verification function is enabled using HCCL_WHITELIST_DISABLE. Only IP addresses in the communication whitelist can be used for collective communication.

HCCL_ENTRY_LOG_ENABLE

Controls whether to print the runtime logs of the communication operator in real time.

HCCL_RDMA_TC

Sets the traffic class (TC) of the RDMA NIC.

HCCL_RDMA_SL

Sets the service level (SL) of the RDMA NIC. The value must be the same as the PFC priority set for the NIC. Otherwise, performance may deteriorate.

HCCL_RDMA_TIMEOUT

The minimum retransmission timeout of the RDMA NIC is calculated as 4.096 μs × 2^timeout, where timeout is the value of this environment variable. The actual retransmission timeout also depends on the user's network conditions.
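
A worked instance of the formula, with an illustrative timeout value:

    # Minimum retransmission timeout: 4.096 microseconds * 2**timeout.
    timeout = 14  # illustrative value of HCCL_RDMA_TIMEOUT
    min_rto_us = 4.096 * 2**timeout  # 67108.864 microseconds
    print(f"minimum retransmission timeout ~= {min_rto_us / 1000:.1f} ms")  # ~67.1 ms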

HCCL_RDMA_RETRY_CNT

Configures the number of retransmission times of the RDMA NIC. The value must be an integer ranging from 1 to 7. The default value is 7.

AOE Tuning

TUNE_BANK_PATH

Sets the path of the custom repository generated after Auto Tune.

REPEAT_TUNE

Initiates tuning again. This environment variable takes effect only when subgraph tuning or operator tuning is enabled.

AOE_MODE

Sets the AOE optimization mode in online inference and training scenarios.

AMCT Model Compression

AMCT_LOG_FILE_LEVEL

Sets the message level of the AMCT log file (amct_pytorch.log for the PyTorch framework, amct_caffe.log for Caffe, and amct_onnx.log for ONNX), as well as the message level of the log file generated for the corresponding quantizable layers when the fake-quantized model is produced.

AMCT_LOG_LEVEL

Sets the level of information displayed on the screen. This environment variable applies only to quantization of PyTorch, Caffe, and ONNX models.

DUMP_AMCT_RECORD

Specifies whether to generate quantization factors for weights and data. This environment variable applies only to model compression of the MindSpore framework.

AMCT_LOG_DUMP

Controls the dumping of information such as logs during post-training quantization. This environment variable applies only to quantization performed by calling the AscendCL API aclgrphCalibration.

Profiling

PROFILING_MODE

Enables or disables the profiling function.

PROFILING_OPTIONS

Sets profiling configuration options in training or online inference scenarios.

PROF_CONFIG_PATH

Specifies the path of the profiler_config.json configuration file for the dynamic_profile collection function of the Ascend PyTorch Profiler API in the PyTorch training scenario.

Log

ASCEND_PROCESS_LOG_PATH

Sets the log flush path.

ASCEND_SLOG_PRINT_TO_STDOUT

Enables or disables printing logs to the screen. When enabled, logs are printed on the screen instead of being saved to the log file.

ASCEND_GLOBAL_LOG_LEVEL

Sets the level of application logs and module logs. This setting applies to debug logs only.

ASCEND_MODULE_LOG_LEVEL

Sets the log level for individual modules of application logs. This setting applies to debug logs only.

ASCEND_GLOBAL_EVENT_ENABLE

Enables or disables event logging for applications.

ASCEND_LOG_DEVICE_FLUSH_TIMEOUT

Specifies the timeout for flushing app logs from the device to the host.

ASCEND_HOST_LOG_FILE_NUM

Sets the number of log files of each process stored in the application log directories (plog and device-id) in the Ascend EP scenario.

ASCEND_COREDUMP_SIGNAL

Sets the core dump signal for trace processing.

ASCEND_LOG_SYNC_SAVE

Specifies the processing mode for log congestion.

Fault Information Collection

NPU_COLLECT_PATH

Sets a path for storing fault information, including dump graphs, abnormal data of AI Core operators, and operator compilation information. The path can be an absolute path or a relative path (the path relative to the location of the executable program or command), on which users must have the read, write, and execute permissions. If the path does not exist, the system automatically creates the directory in the path.

Environment Variables That Will Be Deprecated in Later Versions

GE_USE_STATIC_MEMORY

Configures the memory allocation mode used during network running.

ENABLE_ACLNN

Determines, during graph build, whether to call the host execution function registered by the operator, which implements the host execution logic and kernel delivery during graph execution.