Environment Variable Configuration
Table 1 describes the environment variables of Rec SDK Torch.
| Environment Variable | Meaning | Mandatory/Optional | Description |
|---|---|---|---|
| INPUT_DIST_THREADS | Number of concurrent threads in the thread pool used by Rec SDK Torch to execute bucketing tasks. | Optional | The value is an integer ranging from 1 to 12. The default value is 6. |
| POST_INPUT_THREADS | Number of concurrent threads in the thread pool used by Rec SDK Torch to execute hash deduplication tasks. | Optional | The value is an integer ranging from 1 to 12. The default value is 6. |
| MASTER_ADDR | IP address of the master node in distributed training. | Optional | IPv4 address. 127.0.0.1 is recommended. |
| MASTER_PORT | Listening port number in distributed training. | Optional | The value is an integer ranging from 0 to 65520. |
| LOCAL_RANK | NPU ID of the current process on the local host. | Optional | The value is an integer ranging from 0 to world_size - 1. |
| WORLD_SIZE | Number of devices involved in training. | Optional | The value is an integer ranging from 1 to 8. |
| ASCEND_VISIBLE_DEVICES | Ascend AI Processor devices visible to the program. Use this variable to restrict the program to a subset of the devices. | Mandatory | Specifies the NPU devices used for training. (Run the `ls /dev/ \| grep davinci*` command to query the NPU devices of the host.) Devices are specified by device serial number. You can specify a single NPU device, a range of NPU devices, or a combination of both. Example: |
| ASCEND_OPP_PATH | Root directory of the operator library. | Mandatory | Set this variable when running the CANN environment variable configuration script. You are advised not to change the value. The default value is /usr/local/Ascend/cann/opp. |
| GLOO_SOCKET_IFNAME | NIC used for gloo communication. | Optional | Run the ifconfig or ip a command to view the NIC names of the server. The recommended value is lo. |
| ENABLE_FAST_HASHMAP | Whether to enable the quick hash table. | Optional | The value is a character string. The value true, yes, or 1 indicates that the function is enabled. Other values indicate that the function is disabled. The default value is false. |
| EMB_MEMORY_POOL_SIZE | Size of the embedding memory pool of the quick hash table. | Optional | The value is an integer in the range [1, 200000]. The default value is 102400. |
| FAST_HASHMAP_RESERVE_BUCKET_NUM | Number of reserved buckets in the quick hash table. | Optional | The value is an integer in the range [128, 4294967291]. The default value is 2097152. |
| EMB_MEMORY_POOL_THREAD_NUM | Number of processing threads in the embedding memory pool of the quick hash table. | Optional | The value is an integer in the range [1, 1024]. The default value is 4. |
| EMBCACHE_SIZE_ON_DEVICE_MEM | Size of the embedding cache in on-chip memory, in bytes. | Optional | The value is an integer in the range [1, available device memory]. The default value is 17179869184 (16 GB). |
| DO_EC_LOCAL_UNIQUE | Whether to enable EC local unique for multi-level cache. | Optional | The value is a character string. The value true, yes, or 1 indicates that the function is enabled. Other values indicate that the function is disabled. The default value is false. |
| LOCAL_UNIQUE_PARALLEL_BATCH_NUM | Number of batches for parallel processing of local unique in EmbCacheTrainPipelineSparseDist. | Optional | The value is an integer in the range [1, 24]. The default value is 2. |
| ENABLE_PARALLEL_GLOBAL_UNIQUE | Whether to enable parallel global unique processing. | Optional | The value is a character string. The value 1 indicates that the function is enabled. Other values indicate that the function is disabled. The default value is 0 (disabled). |
| GLOG_stderrthreshold | Log level of the multi-level cache C++ module. | Optional | The value is an integer. The default value is 0. Value range: |
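
The ranges and defaults in Table 1 can be checked before a training job is launched. The following Python sketch is illustrative only: the helper names `read_int_var` and `read_bool_var` are hypothetical and are not part of Rec SDK Torch, and raising an error on an out-of-range value is an assumption (the table does not say how the SDK itself reacts). The comparison for the on/off switches is assumed to be case-sensitive.

```python
import os

# Integer variables from Table 1: name -> (minimum, maximum, default).
INT_VARS = {
    "INPUT_DIST_THREADS": (1, 12, 6),
    "POST_INPUT_THREADS": (1, 12, 6),
    "EMB_MEMORY_POOL_SIZE": (1, 200000, 102400),
    "FAST_HASHMAP_RESERVE_BUCKET_NUM": (128, 4294967291, 2097152),
    "EMB_MEMORY_POOL_THREAD_NUM": (1, 1024, 4),
    "LOCAL_UNIQUE_PARALLEL_BATCH_NUM": (1, 24, 2),
}

# Switches that Table 1 documents as enabled by "true", "yes", or "1".
BOOL_VARS = ("ENABLE_FAST_HASHMAP", "DO_EC_LOCAL_UNIQUE")
ENABLED_VALUES = ("true", "yes", "1")


def read_int_var(name, env=None):
    """Return the integer value of `name`, or its documented default if unset.

    Hypothetical helper: raises ValueError when the value is not an integer
    or falls outside the range listed in Table 1.
    """
    env = os.environ if env is None else env
    low, high, default = INT_VARS[name]
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def read_bool_var(name, env=None):
    """Return True only for the documented enabling values (default: false)."""
    env = os.environ if env is None else env
    if name not in BOOL_VARS:
        raise KeyError(f"{name} is not a documented on/off switch")
    return env.get(name, "false") in ENABLED_VALUES


if __name__ == "__main__":
    # Print the effective settings for the current shell environment.
    for var in INT_VARS:
        print(var, "=", read_int_var(var))
    for var in BOOL_VARS:
        print(var, "=", read_bool_var(var))
```

Both helpers accept an optional mapping in place of `os.environ`, so a planned configuration can be checked without modifying the current process environment.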