Environment Variable List
This document describes the environment variables that can be used when developers build AI applications and services based on CANN.
- Environment variables can be set and read through commands, APIs, and configuration, including the export command, the putenv/getenv/setenv/unsetenv/clearenv functions, os.environ, and os.getenv. Set environment variables before starting application processes; otherwise, environment variable access conflicts may occur and cause program exceptions.
- This document does not describe the environment variables of Ascend Extension for PyTorch. For details about the environment variables of Ascend Extension for PyTorch, see Ascend Extension for PyTorch Environment Variable Reference.
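As a minimal sketch of the recommendation above, the following Python snippet applies variables to a copy of the current environment and passes that copy to a child process when it starts, instead of changing the environment while the application is running. DUMP_GE_GRAPH comes from the Graph Building table; the value "2" and the helper name `launch_with_env` are illustrative assumptions, not documented settings.

```python
import os
import subprocess
import sys


def launch_with_env(cmd, overrides):
    """Start `cmd` with the current environment plus `overrides`.

    The overrides are applied to a copy of os.environ before the child
    process starts, so no variable is modified while the child is running
    and the parent's own environment is left unchanged.
    """
    env = os.environ.copy()
    env.update({name: str(value) for name, value in overrides.items()})
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical child program that only reads the variable back.
    child = [sys.executable, "-c", "import os; print(os.getenv('DUMP_GE_GRAPH'))"]
    result = launch_with_env(child, {"DUMP_GE_GRAPH": "2"})  # value is illustrative only
    print(result.stdout.strip())
```

The same pattern applies to any application launcher: build the complete environment first, then start the process.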
Installation
| Environment Variable | Description |
|---|---|
| | If you want the files generated during component build and runtime to be flushed to a unified directory, you can set this environment variable to specify the storage path of shared files. |
| | If you want the files generated during component build and runtime to be flushed to a unified directory, you can set this environment variable to specify the storage path of files shared exclusively on a server. |
| | Installation path of the custom operator package. If the custom operator package generated during build needs to be installed in a specified path, set this environment variable to specify the path. |
Graph Building
| Environment Variable | Description |
|---|---|
| DUMP_GE_GRAPH | Sets the graph dump mode. |
| DUMP_GRAPH_LEVEL | Sets which graphs to dump. |
| DUMP_GRAPH_FORMAT | Sets the type of the dump files to be generated. |
| | Sets the path for storing dump graph files. The path can be an absolute path or a path relative to the directory in which the script is executed. |
| | By default, memory reuse is enabled during graph build on the Ascend platform. In fault locating scenarios, if you suspect that an abnormal computation result is caused by a memory reuse error, you can use this environment variable to allocate memory to an operator separately. |
| | Set this environment variable if you want to convert a single-operator .json file into an offline model using only TBE operators. If a TBE operator cannot be found, an error is reported and AI CPU operators are not searched. |
| | Specifies the number of CPU cores available for graph build. |
| | Enables or disables single-threaded build during model conversion. |
| | In the TensorFlow training scenario, if computational graph build fails, the training process is terminated by default and the remaining graphs are not delivered to the device. You can set this environment variable so that TF Adapter continues delivering computational graphs to the device without terminating the training process when graph build fails. |
Operator Building
| Environment Variable | Description |
|---|---|
| | Parallel build is especially useful for large networks. You can set this environment variable to enable parallel build. |
| ASCEND_MAX_OP_CACHE_SIZE | When the operator build cache function is enabled, you can set this environment variable to limit the disk space of the cache folder of the Ascend AI Processor. |
| | When the operator build cache function is enabled, specifies the percentage of build cache space that is retained when the build cache space of a specified processor reaches ASCEND_MAX_OP_CACHE_SIZE. The default value is 50 (%). |
| | Specifies whether to skip verification of operator prototype deliverables when an operator is inserted into a graph. The deliverables include the implementations of the adaptation functions needed to insert the operator into the graph, such as shape deduction. |
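The retention ratio is simple arithmetic on the cache limit. The sketch below assumes that cleanup is triggered when a processor's cache reaches ASCEND_MAX_OP_CACHE_SIZE and that the ratio is a percentage between 0 and 100; the helper name, the accepted range, and the 500 MB example value are assumptions for illustration, since this document does not state the unit or valid range of either variable.

```python
def cache_space_after_cleanup(max_cache_size, remain_ratio_percent=50):
    """Return how much build cache space is kept after cleanup is triggered.

    Assumptions: cleanup starts when a processor's cache reaches
    `max_cache_size` (the ASCEND_MAX_OP_CACHE_SIZE limit), and
    `remain_ratio_percent` percent of that space is retained (default 50).
    The result uses the same unit as `max_cache_size`.
    """
    if max_cache_size <= 0:
        raise ValueError("max_cache_size must be positive")
    if not 0 <= remain_ratio_percent <= 100:  # assumed valid range
        raise ValueError("remain_ratio_percent must be between 0 and 100")
    return max_cache_size * remain_ratio_percent / 100


# With an illustrative 500 MB limit and the default ratio,
# 250 MB of cache is kept after cleanup.
print(cache_space_after_cleanup(500))
```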
Resource Configuration
| Environment Variable | Description |
|---|---|
| | Specifies the logical ID of the Ascend AI Processor used by the current process. |
| | Specifies the devices that are visible to the current process. One or more device IDs can be specified at a time. With this environment variable, you can change the devices in use without modifying the application. |
| | Controls whether operators are allowed to transfer data without passing through the L2 cache. |
| | Specifies the path for storing the heterogeneous resource description file. |
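The visible-devices entry lets a process see a subset of devices without code changes. The sketch below assumes that the value is a comma-separated list of physical device IDs and that the process addresses the listed devices by logical IDs 0, 1, 2, and so on in the listed order. Neither the separator nor this mapping is stated in this document, so treat both as assumptions and confirm them in the variable's detailed reference. The helper name is hypothetical.

```python
def parse_visible_devices(value):
    """Parse a device list such as "0,2,5" into {logical_id: physical_id}.

    Assumptions: IDs are comma-separated, and logical ID i refers to the
    i-th listed physical device. Duplicate or non-numeric IDs are rejected.
    """
    physical_ids = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas and whitespace
        if not item.isdigit():
            raise ValueError(f"invalid device ID: {item!r}")
        device_id = int(item)
        if device_id in physical_ids:
            raise ValueError(f"duplicate device ID: {device_id}")
        physical_ids.append(device_id)
    return dict(enumerate(physical_ids))


print(parse_visible_devices("0,2,5"))  # {0: 0, 1: 2, 2: 5}
```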
Operator Execution
| Environment Variable | Description |
|---|---|
| | Sets the number of operator information entries cached on the host for an aclnn API. The cached operator information includes the workspace size, operator executor, and tiling information. |
Graph Execution
| Environment Variable | Description |
|---|---|
| | During graph execution, enabling multi-stream concurrency can improve network performance in certain scenarios. Multi-stream concurrency is currently disabled by default; to enable it in dynamic shape scenarios, set this environment variable. |
| | In training and online inference scenarios, this environment variable can be used to enable multi-threaded task scheduling of the graph executor (host) for networks in dynamic shape graph mode. |
TFAdapter
| Environment Variable | Description |
|---|---|
| | Sets a custom task ID in TensorFlow training and online inference scenarios. |
| | In the TensorFlow 1.15 training scenario, if the input has a dynamic shape, upgrades V1 control flow operators to V2 control flow operators to support dynamic shapes. Only TensorFlow V2 control flow operators (such as If, Case, While, For, and PartitionedCall) support dynamic shapes; the TensorFlow V1 control flow operators (such as Switch, Merge, Enter, LoopCond, NextIteration, Exit, and ControlTrigger) that correspond to the tf.case, tf.cond, and tf.while_loop APIs do not. If the network has many branch structures, upgrade the V1 control flow operators to V2; otherwise, the data flow may exceed the limit. |
| | Enables or disables automatic replacement of the FP32 data type with the HF32 data type for TensorFlow 1.15 networks. In the current version, this environment variable takes effect only for Conv and Matmul operators. |
| | Enables or disables debug logging of TF Adapter in TensorFlow 2.6.5 training and online inference scenarios. |
| | Enables or disables graph dump of TF Adapter in TensorFlow 2.6.5 training and online inference scenarios. |
| | Prints the graph time consumption of TF Adapter in TensorFlow 2.6.5 training and online inference scenarios. |
| | Sets the number of iterations per loop offloaded to the NPU in TensorFlow 2.6.5 training and online inference scenarios. |
| | In the TensorFlow 1.15 training scenario, if training acceleration is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, sets the number of execution steps on the NPU. |
| | In the TensorFlow 1.15 training scenario, if training acceleration is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, sets the total number of training steps on the NPU. |
| | In the TensorFlow 1.15 training scenario, if training acceleration is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, sets the loss value of the current iteration on the NPU. |
| | In the TensorFlow 1.15 training scenario, if training acceleration is enabled through the experimental_accelerate_train_mode or accelerate_train_mode parameter, sets the target training loss value on the NPU. |
| | In TensorFlow distributed training or inference scenarios, specifies the path and name of the rank table resource configuration file of the Ascend AI Processors that participate in collective communication. |
| | Sets the rank ID of the current process in the collective communication process group in TensorFlow distributed training or inference scenarios. |
| | Sets the number of devices corresponding to the current training process in TensorFlow distributed training or inference scenarios. |
| | In TensorFlow distributed training scenarios, you can skip the rank table file and instead use environment variables to automatically generate resource information and initialize the collective communication component. This environment variable configures the listening host IP address of the master node. |
| | Configures the listening port of the master node. |
| | Configures the logical ID of the device on the master node that collects statistics on the server cluster. |
| | Configures the number of devices in the service communicator. |
| | Configures the NIC IP address used for information exchange between the current device and the master node. |
Collective Communication
| Environment Variable | Description |
|---|---|
| **Function** | |
| | Configures the timeout for establishing socket connections between devices in distributed training or inference scenarios. Collective communication initialization progresses at different rates on different devices; this environment variable uses a timeout interval to synchronize socket establishment across devices. |
| | During distributed training or inference, tasks executed by different device processes may differ; for example, only specific processes save checkpoint data. This environment variable controls the synchronization wait time during task execution between devices: within the configured time, each device process waits for the other devices to perform communication synchronization. |
| | Configures the communication algorithms between servers and between supernodes. The algorithms can be configured globally or per operator. |
| | Sets the size of the shared data buffer used by the communicator, in MB. The value must be an integer greater than or equal to 1. The default value is 200. |
| | Specifies whether to use the PCIe link for communication within a server. |
| | Specifies whether to use the RoCE link for communication within a server or supernode. |
| | Sets the type of the communication link between supernodes in supernode mode. |
| | Sets the location for expanding the orchestration of the communication algorithm. |
| | When the deterministic computing or order-preserving function is enabled for a reduction operator, the operator generates the same output when executed multiple times with the same hardware and input. |
| | For the SuperPoD networking of the |
| **Performance** | |
| | In multi-server communication scenarios where the host OS uses non-4 KB memory pages and communication operator delivery is host-bound, submits RDMA tasks in PCIe Direct mode to improve communication operator delivery performance. |
| | Sets the number of queue pairs (QPs) used for data transmission during RDMA communication between two ranks. By default, one QP is created. |
| | By default, one QP is created for data transfer during RDMA communication between two ranks. If you want to use multiple QPs for RDMA communication between two ranks and specify the source port numbers used for multi-QP communication, use this environment variable. |
| | Sets the minimum amount of data handled by each QP when multiple QPs are used for RDMA communication between ranks. |
| **Network** | |
| | Configures the IP address used by the host for communication during HCCL initialization when the communicator is created based on root node information. The host uses this IP address to communicate with the root node to create the communicator. |
| | Specifies the start port number of the host NIC when the communicator is created based on root node information. After this is configured, the system by default uses 32 ports starting from this port to collect cluster information. |
| | Configures the communication port used by HCCL on the host when the communicator is created based on root node information. |
| | Configures the communication port used by HCCL on the NPU when the communicator is created based on root node information. |
| | Sets the name of the NIC used by the host during HCCL initialization. HCCL obtains the host IP address from the NIC name and communicates with the root node to create the communicator. |
| | Sets the IP protocol used by the communication NIC. |
| | Sets the traffic class of the RDMA NIC. |
| | Sets the service level (SL) of the RDMA NIC. The value must match the PFC priority configured for the NIC; otherwise, performance may deteriorate. |
| | Sets the exponent used to compute the minimum retransmission timeout of the RDMA NIC: 4.096 μs × 2^timeout, where timeout is the value of this environment variable. The actual retransmission timeout also depends on the user's network conditions. |
| | Configures the number of retransmissions of the RDMA NIC. The value must be an integer from 1 to 7. The default value is 7. |
| **Debugging** | |
| | Sets whether to cache detailed information about some tasks during collective communication, so that detailed logs can be printed for fault locating if a task fails. |
| | Controls whether to print the runtime logs of communication operators in real time. |
| | Configures whether run logs (that is, logs in $HOME/ascend/log/run) contain detailed running information about specific HCCL submodules. Currently, the following four configuration items are supported: ALG or alg (algorithm orchestration module), TASK or task (task orchestration module), RESOURCE or resource (resource management module, including resource allocation and release operations), and AIV_OPS_EXC or aiv_ops_exc (AIV operator log printing module, including communication memory operations and resource synchronization operations during operator execution). |
| | HCCL provides multiple fault detection functions, including link setup fault detection time configuration, a cluster heartbeat monitoring switch, and a process suspension detection switch. When these functions are enabled, fault information can be quickly located and displayed when a service exception occurs, helping you rectify faults in a timely manner. |
| **Reliability** | |
| HCCL_OP_RETRY_ENABLE | Enables or disables the retry feature of the HCCL operator. HCCL operator retry is based on the communicator. If an |
| | Configures the wait period before the first retry, the maximum number of retries, and the interval between two retries after the HCCL operator retry feature is enabled through HCCL_OP_RETRY_ENABLE. |
| **Security** | |
| HCCL_WHITELIST_DISABLE | Enables or disables the HCCL communication trustlist. |
| | Configures the path of the HCCL communication trustlist configuration file after trustlist verification is enabled through HCCL_WHITELIST_DISABLE. Only IP addresses in the trustlist can be used for collective communication. |
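Several Collective Communication entries state numeric rules that can be checked before launching a job: the RDMA minimum retransmission timeout (4.096 μs × 2^timeout), the RDMA retransmission count (an integer from 1 to 7, default 7), the communicator buffer size (an integer of at least 1 MB, default 200), and the 32 consecutive host ports used from the configured start port. The Python sketch below encodes only those stated rules. The function names are hypothetical, the non-negative timeout check is an assumption, and the 65535 upper bound is simply the limit of TCP/UDP port numbers, not a documented HCCL constraint.

```python
def rdma_min_retransmit_timeout_us(timeout):
    """Minimum RDMA retransmission timeout in microseconds: 4.096 us * 2**timeout."""
    if not isinstance(timeout, int) or timeout < 0:  # assumed: non-negative integer
        raise ValueError("timeout must be a non-negative integer")
    return 4.096 * 2 ** timeout


def rdma_retry_count(value=None):
    """Validate the RDMA retransmission count: an integer from 1 to 7, default 7."""
    if value is None:
        return 7
    count = int(value)
    if not 1 <= count <= 7:
        raise ValueError("retry count must be in [1, 7]")
    return count


def comm_buffer_size_mb(value=None):
    """Validate the communicator buffer size in MB: an integer >= 1, default 200."""
    if value is None:
        return 200
    size = int(value)
    if size < 1:
        raise ValueError("buffer size must be at least 1 MB")
    return size


def host_port_range(start_port, count=32):
    """Ports used to collect cluster information: `count` ports from `start_port`."""
    last = start_port + count - 1
    if start_port < 1 or last > 65535:
        raise ValueError("port range must stay within 1..65535")
    return range(start_port, last + 1)


if __name__ == "__main__":
    # timeout = 20 gives 4.096 us * 2**20, roughly 4.29 seconds.
    print(rdma_min_retransmit_timeout_us(20) / 1e6)
    print(rdma_retry_count(), comm_buffer_size_mb("512"))
    ports = host_port_range(60000)
    print(ports[0], ports[-1])
```

Because the timeout grows exponentially, each increment of the variable doubles the minimum wait, which is why small changes to it have a large effect on failure detection time.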
AOE Tuning
| Environment Variable | Description |
|---|---|
| | Sets the path of the custom repository generated after Auto Tune. |
| | Initiates tuning again. This environment variable takes effect only when subgraph tuning or operator tuning is enabled. |
| | Sets the AOE tuning mode in online inference and training scenarios. |
AMCT Model Compression
| Environment Variable | Description |
|---|---|
| | Sets the level of messages in the amct_pytorch.log file for the PyTorch framework, amct_caffe.log for the Caffe framework, and amct_onnx.log for the ONNX framework, as well as the level of messages in the log file generated by the corresponding quantization layer when the model for accuracy simulation is generated. |
| | Sets the level of information displayed on the screen. This environment variable applies only to quantization of PyTorch, Caffe, and ONNX network models. |
| | Specifies whether to generate quantization factors for weights and data. This environment variable applies only to model compression in the MindSpore framework. |
| | Controls information such as log flushing during post-training quantization. This environment variable applies only to quantization performed by calling aclgrphCalibration. |
Profile Data Collection
| Environment Variable | Description |
|---|---|
| | Enables or disables the profiling function. |
| | Sets profiling configuration options in training or online inference scenarios. |
Logs
| Environment Variable | Description |
|---|---|
| | Sets the log flush path. |
| | Enables or disables printing logs to the screen. When enabled, logs are printed and displayed directly instead of being saved to log files. |
| | Sets the level of application logs and module logs. Only debug logs are supported. |
| | Sets the log level of each module for application logs. Only debug logs are supported. |
| | Enables or disables event logging for applications. |
| | Specifies the timeout for flushing application logs from the device to the host. |
| | Sets the number of log files of each process stored in the application log directories (plog and device-id) in the |
| | Sets the core dump semaphore for trace processing. |
| | Specifies the processing mode for log congestion. |
Fault Information Collection
| Environment Variable | Description |
|---|---|
| | Sets the path for storing fault information, including dump graphs, abnormal data of AI Core operators, and operator compilation information. The path can be absolute or relative (relative to the location of the executable program or command), and users must have read, write, and execute permissions on it. If the path does not exist, the system creates the directory automatically. |
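The path rules for fault information (absolute or relative to the executable's location, created if missing, read/write/execute permissions required) can be mirrored in a pre-flight check. This Python sketch is an illustration under assumptions: the helper name is hypothetical, and it interprets "the location of the executable program" as the directory that contains the program file passed in.

```python
import os
import tempfile


def prepare_fault_info_dir(path, executable):
    """Resolve, create, and permission-check a fault-information directory.

    A relative `path` is resolved against the directory containing
    `executable` (assumed interpretation of "relative to the location of
    the executable program"). The directory is created if it does not
    exist, then checked for read, write, and execute permissions.
    """
    if not os.path.isabs(path):
        base_dir = os.path.dirname(os.path.abspath(executable))
        path = os.path.join(base_dir, path)
    path = os.path.normpath(path)
    os.makedirs(path, exist_ok=True)  # mirrors the automatic directory creation
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        raise PermissionError(f"read, write, and execute permissions are required on {path}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exe = os.path.join(tmp, "bin", "app")  # hypothetical program location
        print(prepare_fault_info_dir("fault_info", exe))
```

In a real launcher, `executable` would typically be the path of the program being started, for example `sys.argv[0]` for a Python entry script.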
Environment Variables That Will Be Deprecated in Later Versions
| Environment Variable | Description |
|---|---|
| | Configures the memory allocation mode used during network execution. |
| | During graph build, this environment variable determines whether the host execution function registered by an operator is called during graph execution to implement the host execution logic and kernel delivery. |