Configuration Options
Basic Options
Option |
Description |
|---|---|
graph_run_mode |
Graph run mode.
Example: npu.global_options().graph_run_mode=1 |
deterministic |
Whether to enable deterministic computing. If enabled, an operator produces the same output every time it is executed on the same hardware with the same input. The values are as follows:
Deterministic computing is disabled by default because it slows down operator execution and affects performance. When it is disabled, the results of repeated executions may differ; this is generally caused by asynchronous multi-threaded execution inside operator implementations, which changes the accumulation order of floating-point numbers. However, if a model produces different results across runs, or its precision needs to be tuned, you can enable deterministic computing to assist model debugging and tuning. Note that a fully reproducible result also requires setting a fixed random seed in the training script, so that the random numbers generated by the program are reproducible as well. Example: npu.global_options().deterministic=1 |
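For a fully reproducible run, the device-side deterministic option must be paired with fixed seeds in the training script. The sketch below uses only the Python standard library to illustrate the seeding side; `seeded_draws` is a hypothetical helper, and a real script would also seed NumPy and the framework's own generators.

```python
import random

def seeded_draws(seed, n=5):
    """Draw n pseudo-random numbers from a freshly seeded generator."""
    rng = random.Random(seed)  # fixed seed -> reproducible stream
    return [rng.random() for _ in range(n)]

# With the same seed, two runs produce identical values, which is the
# script-side prerequisite for bitwise-identical results once
# deterministic computing is enabled on the device side.
run_a = seeded_draws(42)
run_b = seeded_draws(42)
assert run_a == run_b
```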
Memory Management
Dynamic Shape
Option |
Description |
|---|---|
ac_parallel_enable |
Indicates whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic shape graph. When this option is enabled, the system automatically identifies the AI CPU operators in the graph that can execute concurrently with AI Core operators and dispatches operators of different engines to different streams, so that multiple engines run in parallel. This improves resource utilization and dynamic shape execution performance.
Example: npu.global_options().ac_parallel_enable="1" |
compile_dynamic_mode |
Indicates whether to generalize all input shapes in the graph.
Example: npu.global_options().compile_dynamic_mode=True |
Debugging
Accuracy Tuning
Option |
Description |
|---|---|
precision_mode |
A string for the operator precision mode.
Example: npu.global_options().precision_mode="allow_mix_precision" NOTE:
|
precision_mode_v2 |
A string for the operator precision mode.
Default value:
Example: npu.global_options().precision_mode_v2="origin" NOTE:
|
modify_mixlist |
When mixed precision is enabled, you can use this option to specify the path and file name of the blocklist, trustlist, and graylist file, that is, to specify the operators that are allowed to reduce precision and those that are not. Mixed precision is enabled by configuring precision_mode_v2 or precision_mode in the script. The blocklist, trustlist, and graylist are stored in a JSON file. A configuration example is as follows:
npu.global_options().modify_mixlist="/home/test/ops_info.json"
Specify the operator type (or several types) in ops_info.json as follows:
{
"black-list": { // Blocklist
"to-remove": [ // Move an operator from the blocklist to the graylist.
"Xlog1py"
],
"to-add": [ // Move an operator from the trustlist or graylist to the blocklist.
"Matmul",
"Cast"
]
},
"white-list": { // Trustlist
"to-remove": [ // Move an operator from the trustlist to the graylist.
"Conv2D"
],
"to-add": [ // Move an operator from the blocklist or graylist to the trustlist.
"Bias"
]
}
}
Note: The operators in the preceding example configuration file are for reference only. Configure the file based on the actual hardware environment and the built-in tuning policy of each operator. You can query the built-in tuning policy of each operator in mixed precision mode in CANN software installation directory/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json. For example:
"Conv2D":{
    "precision_reduce":{
        "flag":"true"
    }
},
|
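As a sanity check before pointing modify_mixlist at a file, the hypothetical helper below validates that a mixlist JSON uses only the list and action keys shown in the template above. Note that the // comments in the template are annotations and must be removed for strict JSON.

```python
import json

# Hypothetical helper: validate the structure of a modify_mixlist file.
# The key names mirror the JSON template above
# ("black-list"/"white-list" with "to-add"/"to-remove").
def check_mixlist(text):
    cfg = json.loads(text)
    for list_name, moves in cfg.items():
        if list_name not in {"black-list", "white-list"}:
            raise ValueError(f"unknown list: {list_name}")
        for action, ops in moves.items():
            if action not in {"to-add", "to-remove"}:
                raise ValueError(f"unknown action: {action}")
            if not all(isinstance(op, str) for op in ops):
                raise ValueError(f"operator names must be strings in {list_name}/{action}")
    return cfg

sample = '{"black-list": {"to-add": ["Matmul", "Cast"]}}'
assert check_mixlist(sample)["black-list"]["to-add"] == ["Matmul", "Cast"]
```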
customize_dtypes |
If precision_mode sets the global precision mode of a network, precision problems may occur on particular operators. In this case, you can use customize_dtypes to configure the precision mode of those operators, while other operators are still compiled with the mode specified by precision_mode. Note that if precision_mode is set to must_keep_origin_dtype, customize_dtypes does not take effect.
Set this option to the path of the configuration file, including the file name, for example, /home/test/customize_dtypes.cfg.
Example: npu.global_options().customize_dtypes = "/home/test/customize_dtypes.cfg"
List the names or types of the operators whose precision needs customization in the configuration file, one operator per line. Operator types must be defined based on Ascend IR. If both the name and the type of an operator are configured, the name-based entry applies during building. The structure of the configuration file is as follows:
# By operator name
Opname1::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Opname2::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
# By operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
OpType::TypeName2:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Example:
# By operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# By operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8
NOTE:
|
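To make the entry format above concrete, the following hypothetical parser splits a single customize_dtypes line into its operator name or type and its dtype lists. It is a sketch for understanding the syntax, not part of the NPU API.

```python
# Hypothetical parser for one customize_dtypes entry, based on the line
# format shown above ("Opname::..." or "OpType::TypeName:...").
def parse_dtype_line(line):
    key, _, rest = line.partition("::")
    by_type = key == "OpType"
    if by_type:
        # By-type entries carry the type name before the dtype spec.
        name, _, rest = rest.partition(":")
    else:
        name = key
    in_part, _, out_part = rest.partition("OutputDtype:")
    in_dtypes = in_part.replace("InputDtype:", "").rstrip(",").split(",")
    out_dtypes = out_part.split(",")
    return {"by_type": by_type, "name": name,
            "inputs": in_dtypes, "outputs": out_dtypes}

entry = parse_dtype_line("OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8")
assert entry["name"] == "Relu" and entry["inputs"] == ["float16", "int8"]
```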
Accuracy Comparison
Option |
Description |
|---|---|
fusion_switch_file |
Directory of the fusion switch configuration file, including the file name. The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). The built-in graph fusion and UB fusion patterns are enabled by default. You can disable selected fusion patterns in the configuration file. Example:
npu.global_options().fusion_switch_file="/home/test/fusion_switch.cfg"
The following is a template of the fusion_switch.cfg fusion pattern configuration file, where on indicates that a fusion pattern is enabled and off indicates that it is disabled:
{
"Switch":{
"GraphFusion":{
"RequantFusionPass":"on",
"ConvToFullyConnectionFusionPass":"off",
"SoftmaxFusionPass":"on",
"NotRequantFusionPass":"on",
"ConvConcatFusionPass":"on",
"MatMulBiasAddFusionPass":"on",
"PoolingFusionPass":"on",
"ZConcatv2dFusionPass":"on",
"ZConcatExt2FusionPass":"on",
"TfMergeSubFusionPass":"on"
},
"UBFusion":{
"TbePool2dQuantFusionPass":"on"
}
}
}
To disable all fusion patterns at once, refer to the following configuration file example:
{
"Switch":{
"GraphFusion":{
"ALL":"off"
},
"UBFusion":{
"ALL":"off"
}
}
}
Notes:
|
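A small sketch of how such a configuration file could be generated programmatically; `fusion_switch` is a hypothetical helper, and the pass names are taken from the template above.

```python
import json

# Sketch: build a fusion_switch.cfg structure that disables selected
# graph fusion and UB fusion passes, leaving the rest at their defaults.
def fusion_switch(graph_off=(), ub_off=()):
    return {
        "Switch": {
            "GraphFusion": {name: "off" for name in graph_off},
            "UBFusion": {name: "off" for name in ub_off},
        }
    }

cfg = fusion_switch(graph_off=["ConvToFullyConnectionFusionPass"])
assert cfg["Switch"]["GraphFusion"]["ConvToFullyConnectionFusionPass"] == "off"
# json.dumps(cfg) could then be written to the file passed to
# fusion_switch_file.
```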
dump_config.enable_dump |
Data dump enable.
Example:
npu.global_options().dump_config.enable_dump=True |
dump_config.dump_path |
Dump path. Required when enable_dump or enable_dump_debug is set to True. Create the specified path in advance in the environment (either container or host) where training is performed. The running user configured during installation must have the read and write permissions on this path. The path can be an absolute path or a path relative to the path where the training script is executed.
Example:
npu.global_options().dump_config.dump_path = "/home/HwHiAiUser/output" |
dump_config.dump_step |
Iterations to dump. Separate multiple iterations using vertical bars (|), for example, 0|5|10. You can also use hyphens (-) to specify the iteration range, for example, 0|3-5|10. If this option is not set, dump data of all iterations is collected. Example:
npu.global_options().dump_config.dump_step="0|5" |
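To make the dump_step syntax concrete, the hypothetical helper below expands a specification such as "0|3-5|10" into the list of iterations it selects.

```python
# Hypothetical helper: expand a dump_step string into concrete
# iteration numbers. "|" separates entries; "-" denotes an inclusive
# range, per the syntax described above.
def expand_dump_step(spec):
    steps = []
    for part in spec.split("|"):
        if "-" in part:
            lo, hi = part.split("-")
            steps.extend(range(int(lo), int(hi) + 1))
        else:
            steps.append(int(part))
    return steps

assert expand_dump_step("0|3-5|10") == [0, 3, 4, 5, 10]
```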
dump_config.dump_mode |
Data dump mode. The values are as follows:
NOTE:
If this option is set to all, the input data of some operators, such as the collective communication operators HcomAllGather and HcomAllReduce, is modified during execution. Therefore, the system dumps the operator input before execution and the operator output after execution. As a result, the input and output data of the same operator are written to disk separately, and multiple dump files are generated. After parsing the dump files, you can determine whether the data is an input or an output based on the file content. Example:
npu.global_options().dump_config.dump_mode="all" |
dump_config.dump_data |
Type of operator content to dump.
In large-scale training scenarios, dumping a large amount of data takes a long time. You can dump the statistics of all operators, identify the operators that may be abnormal based on the statistics, and then dump the input or output data of these abnormal operators. Example: npu.global_options().dump_config.dump_data = "stats" |
dump_config.dump_layer |
Name of the operator to dump. Multiple operator names are separated by spaces. If this option is not set, all operators are dumped by default. If the input of the specified operator involves the data operator, the data operator information is also dumped. Example: npu.global_options().dump_config.dump_layer = "nodename1 nodename2 nodename3" |
dump_config.enable_dump_debug |
Overflow/underflow data collection enable.
NOTE:
Example:
npu.global_options().dump_config.enable_dump_debug=True |
dump_config.dump_debug_mode |
Overflow/Underflow detection mode. The values are as follows:
Example:
npu.global_options().dump_config.dump_debug_mode="aicore_overflow" |
quant_dumpable |
If the TensorFlow network is quantized by the AMCT tool, this option can be used to control whether to collect the dump data before quantization. The default value is 0.
Example: npu.global_options().quant_dumpable="1" NOTE:
This option applies only to online inference scenarios. When data dump is enabled, you can set this option to 1 to ensure that the dump data before quantization can be collected. |
Performance Tuning
Option |
Description |
|---|---|
hcom_parallel |
Whether to run AllReduce gradient aggregation in parallel with forward and backward computation.
Example: npu.global_options().hcom_parallel=True For a small network (for example, ResNet-18), you are advised to set this option to False. |
enable_small_channel |
Small channel optimization enable. If enabled, performance improves at convolutional layers whose channel size is less than or equal to 4.
Example: npu.global_options().enable_small_channel=1 |
op_precision_mode |
High-precision or high-performance mode of an operator. You can pass a custom mode configuration file, op_precision.ini, to set different modes for different operators. Modes can be set by operator type (low priority) or by node name (high priority). Example configuration file:
[ByOpType]
optype1=high_precision
optype2=high_performance
optype4=support_out_of_bound_index
[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename4=support_out_of_bound_index
You can view the precision and performance modes supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file of the CANN component directory. This option is mutually exclusive with op_select_implmode and optypelist_for_implmode; if all of them are specified, op_precision_mode takes precedence. Generally, you do not need to set this option. Use it only if the high-performance or high-precision mode alone does not deliver the required network performance or accuracy and you need to adjust the mode of specific operators through the .ini configuration file. Example: npu.global_options().op_precision_mode="/home/test/op_precision.ini" |
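Since op_precision.ini uses standard INI sections, the precedence rule (node name over operator type) can be sketched with Python's configparser; `mode_for` is a hypothetical helper, and the file content is illustrative.

```python
import configparser

# Illustrative op_precision.ini content, mirroring the two sections
# described above.
INI_TEXT = """
[ByOpType]
optype1 = high_precision
optype2 = high_performance

[ByNodeName]
nodename1 = high_precision
"""

# Hypothetical lookup: node-name entries take precedence over op-type
# entries, matching the "by node name (high priority)" rule.
def mode_for(cfg, op_type, node_name):
    if cfg.has_option("ByNodeName", node_name):
        return cfg.get("ByNodeName", node_name)
    return cfg.get("ByOpType", op_type, fallback=None)

cfg = configparser.ConfigParser()
cfg.read_string(INI_TEXT)
assert mode_for(cfg, "optype2", "nodename1") == "high_precision"
assert mode_for(cfg, "optype2", "other") == "high_performance"
```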
stream_max_parallel_num |
This option applies only to neural machine translation (NMT) networks. It sets the degree of parallelism of the AI CPU and AI Core engines for parallel execution of AI CPU and AI Core operators. The value range is [1, 13]; the default is 1. In the example below, DNN_VM_AICPU is the name of the AI CPU engine, with 10 concurrent tasks, and AIcoreEngine is the name of the AI Core engine, with 1 concurrent task. Example: npu.global_options().stream_max_parallel_num="DNN_VM_AICPU:10,AIcoreEngine:1" |
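To illustrate the value format and the documented [1, 13] range, the hypothetical helper below parses a stream_max_parallel_num string into a per-engine dict.

```python
# Hypothetical helper: turn "Engine:count,Engine:count" into a dict and
# enforce the documented [1, 13] range; illustrative only.
def parse_parallel_num(spec):
    result = {}
    for pair in spec.split(","):
        engine, num = pair.split(":")
        value = int(num)
        if not 1 <= value <= 13:
            raise ValueError(f"{engine}: {value} outside [1, 13]")
        result[engine] = value
    return result

assert parse_parallel_num("DNN_VM_AICPU:10,AIcoreEngine:1") == {
    "DNN_VM_AICPU": 10,
    "AIcoreEngine": 1,
}
```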
is_tailing_optimization |
This option applies only to Bidirectional Encoder Representations from Transformers (BERT) networks. It enables communication tailing optimization in distributed training scenarios to improve performance: by changing computation dependencies, computation operations that do not depend on the last AR (gradient aggregation fragment) are scheduled to run in parallel with the last AR, which shortens the communication tail.
Example: npu.global_options().is_tailing_optimization=True |
enable_scope_fusion_passes |
Fusion pattern (or fusion patterns separated by commas) to take effect at build time. Scope fusion patterns (either built-in or custom) are classified into the following two types:
Example: npu.global_options().enable_scope_fusion_passes="ScopeLayerNormPass,ScopeClipBoxesPass" |
Profiling
AOE
Operator Building
Option |
Description |
|---|---|
op_compiler_cache_mode |
Disk cache mode for operator building. The default value is enable.
Notes:
Example:
npu.global_options().op_compiler_cache_mode="enable" |
op_compiler_cache_dir |
Disk cache directory for operator compilation. The directory name can contain letters, digits, underscores (_), hyphens (-), and periods (.). If the specified directory exists and is valid, a kernel_cache subdirectory is automatically created in it. If the specified directory does not exist but is valid, the system automatically creates the directory together with the kernel_cache subdirectory. The storage priority of the operator compilation cache files is as follows:
op_compiler_cache_dir > ${ASCEND_CACHE_PATH}/kernel_cache_host ID > Default path ($HOME/atc_data)
For details about ASCEND_CACHE_PATH, see Environment Variables. Example:
npu.global_options().op_compiler_cache_dir="/home/test/kernel_cache" |
Exception Remedy
Experiment Options
The experiment options are extended options for debugging and may change in later versions. Therefore, do not use them in commercial products.
Option |
Description |
|---|---|
jit_compile |
Online compilation enable for model compilation.
Default value: auto Example: npu.global_options().jit_compile = "auto" NOTICE:
This option is used only for networks of large recommendation models. |
Options That Will Be Deprecated in Later Versions
Option |
Description |
|---|---|
op_select_implmode |
Operator implementation mode select. The operators built in the Ascend AI Processor can be implemented in either high-precision or high-performance mode. The value can be set to either of the following:
The default value is None, indicating that the configuration is disabled. Example:
npu.global_options().op_select_implmode="high_precision" |
optypelist_for_implmode |
List of operator types (separated by commas) that use the mode specified by op_select_implmode. Currently, the Pooling, SoftmaxV2, LRN, and ROIAlign operators are supported. Use this option together with op_select_implmode, for example:
npu.global_options().op_select_implmode="high_precision"
npu.global_options().optypelist_for_implmode="Pooling,SoftmaxV2"
The default value is None, indicating that the configuration is disabled. |
variable_format_optimize |
Variable format optimization enable.
If enabled, variables are reformatted during network variable initialization to better suit the Ascend AI Processor, improving training efficiency. Enable or disable this function as needed. The default value is None, indicating that the configuration is disabled. Example: npu.global_options().variable_format_optimize=True |
op_debug_level |
Operator debug enable.
The default value is None, indicating that the configuration is disabled. Example: npu.global_options().op_debug_level=0 |
graph_memory_max_size |
Sizes of the network static memory and the maximum dynamic memory (used in earlier versions). In the current version, this option does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |
variable_memory_max_size |
Size of the variable memory (used in earlier versions). In the current version, this option does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |