Session Configuration Options
Basic Options
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
graph_run_mode |
Graph run mode.
Configuration example: custom_op.parameter_map["graph_run_mode"].i = 1 |
Training/Online inference |
|
session_device_id |
Logical ID of a device. This option lets a single training script run different models on multiple devices: create a separate session for each graph and pass a different session_device_id value to each. Example: config_0 = tf.ConfigProto()
custom_op = config_0.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["session_device_id"].i = 0
config_0.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config_0.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
with tf.Session(config=config_0) as sess_0:
sess_0.run(...)
config_1 = tf.ConfigProto()
custom_op = config_1.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["session_device_id"].i = 1
config_1.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config_1.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
with tf.Session(config=config_1) as sess_1:
sess_1.run(...)
config_7 = tf.ConfigProto()
custom_op = config_7.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["session_device_id"].i = 7
config_7.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config_7.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
with tf.Session(config=config_7) as sess_7:
sess_7.run(...) |
Training/Online inference |
|
deterministic |
Whether to enable deterministic computing. If enabled, an operator produces the same output every time it is executed with the same hardware and input.
Deterministic computing is disabled by default because it slows down operator execution and affects performance. When it is disabled, the results of repeated executions may differ; this is generally caused by asynchronous multi-threaded execution inside operator implementations, which changes the accumulation order of floating-point numbers. However, if a model produces different results across runs, or its precision needs to be tuned, you can enable deterministic computing to assist debugging and tuning. Note that for a fully reproducible result you must also set a fixed random seed in the training script so that the random numbers generated in the program are deterministic. Example: custom_op.parameter_map["deterministic"].i = 1 |
Training/Online inference |
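To illustrate the note on random seeds above, the following sketch (a hypothetical script skeleton, assuming the TF1-style `tf.compat.v1` API used elsewhere in this document) combines deterministic computing with fixed seeds so that repeated runs can produce identical results:

```python
import numpy as np
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

# Fix all random sources first; deterministic computing alone does not
# make random initializers or dataset shuffling reproducible.
np.random.seed(42)
tf.compat.v1.set_random_seed(42)

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["deterministic"].i = 1  # enable deterministic computing
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.compat.v1.Session(config=config) as sess:
    sess.run(...)  # placeholder for the actual training step
```
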
Memory Management
Dynamic Shape
When dynamic dimension size profiles are used, input_shape, dynamic_dims, and dynamic_node_type must be configured together.
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
input_shape |
Input shape. Configuration example: custom_op.parameter_map["input_shape"].s = tf.compat.as_bytes("data:1,1,40,-1;label:1,-1;mask:-1,-1")
In the preceding example, the network model has three inputs: data (1, 1, 40, -1), label (1, -1), and mask (-1, -1). Separate each input's name from its shape with a colon (:), and separate inputs with semicolons (;). -1 indicates a dynamic dimension, whose size profiles are configured using dynamic_dims.
|
Online inference |
|
dynamic_dims |
Dynamic dimension size profiles. Separate profiles with semicolons (;) and the dimension sizes within a profile with commas (,). The dimension sizes map, in order, to the -1 placeholders in input_shape, so the number of sizes in each profile must equal the number of -1 placeholders. Set at least two profiles. The configuration must be consistent with input_shape; otherwise, an error is reported and the system exits. Example: custom_op.parameter_map["dynamic_dims"].s = tf.compat.as_bytes("20,20,1,1;40,40,2,2;80,60,4,4")
Based on the input_shape information in the preceding example, the supported input shape profiles are as follows:
|
Online inference |
|
dynamic_node_type |
Type of the dynamic input node.
Only one type of dynamic input is allowed per graph: dataset or placeholder.
Example:
custom_op.parameter_map["dynamic_node_type"].i = 0 |
Online inference |
|
ac_parallel_enable |
Indicates whether to allow AI CPU and AI Core operators to run in parallel in a dynamic shape graph.
When this option is enabled, the system automatically identifies the AI CPU operators in a dynamic shape graph that can run concurrently with AI Core operators. Operators of different engines are dispatched to different streams for parallel execution across engines, improving resource utilization and dynamic shape execution performance.
Configuration example: custom_op.parameter_map["ac_parallel_enable"].s = tf.compat.as_bytes("1") |
Training/Online inference |
|
compile_dynamic_mode |
Indicates whether to generalize all input shapes in the graph.
Configuration example: custom_op.parameter_map["compile_dynamic_mode"].b = True Note: This option cannot be used together with input_shape, dynamic_dims, or dynamic_node_type. |
Training/Online inference |
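Since the table states that input_shape, dynamic_dims, and dynamic_node_type must be used together, the following sketch combines the table's own example values into one configuration (the dynamic_node_type value 0 is taken directly from the example above):

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"

# Three inputs; -1 marks each dynamic dimension (four placeholders in total).
custom_op.parameter_map["input_shape"].s = tf.compat.as_bytes(
    "data:1,1,40,-1;label:1,-1;mask:-1,-1")
# Each profile supplies one value per -1 above, in order (4 values per profile).
custom_op.parameter_map["dynamic_dims"].s = tf.compat.as_bytes(
    "20,20,1,1;40,40,2,2;80,60,4,4")
# Dynamic input node type, as in the table's example.
custom_op.parameter_map["dynamic_node_type"].i = 0
```
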
Mixed Computing
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
mix_compile_mode |
Mixed computing
In full offload mode, all compute operators are offloaded to the device. As a supplement to the full offload mode, mixed computing allows certain operators to be executed online within the frontend framework, improving the Ascend AI Processor's adaptability to TensorFlow. Example: custom_op.parameter_map["mix_compile_mode"].b = True |
Training/Online inference |
|
in_out_pair_flag |
Whether to offload the operators specified by in_out_pair to the Ascend AI Processor in mixed computing scenarios.
Example: custom_op.parameter_map['in_out_pair_flag'].b = False |
Online inference |
|
in_out_pair |
Names of the input-layer and output-layer operators that are (or are not, depending on in_out_pair_flag) offloaded in mixed computing scenarios. Note that only one [in_nodes, out_nodes] range can be configured. Example:
# Enable mixed computing.
custom_op.parameter_map["mix_compile_mode"].b = True
# Offload operators within the [in_nodes, out_nodes] range to the Ascend AI Processor and execute the other operators in the frontend framework.
in_nodes.append('import/conv2d_1/convolution')
out_nodes.append('import/conv2d_59/BiasAdd')
out_nodes.append('import/conv2d_67/BiasAdd')
out_nodes.append('import/conv2d_75/BiasAdd')
all_graph_iop.append([in_nodes, out_nodes])
custom_op.parameter_map['in_out_pair'].s = tf.compat.as_bytes(str(all_graph_iop))
# Alternatively, keep operators within the [in_nodes, out_nodes] range in the frontend framework and offload the other operators to the Ascend AI Processor.
in_nodes.append('import/conv2d_1/convolution')
out_nodes.append('import/conv2d_59/BiasAdd')
out_nodes.append('import/conv2d_67/BiasAdd')
out_nodes.append('import/conv2d_75/BiasAdd')
all_graph_iop.append([in_nodes, out_nodes])
custom_op.parameter_map['in_out_pair_flag'].b = False
custom_op.parameter_map['in_out_pair'].s = tf.compat.as_bytes(str(all_graph_iop)) |
Online inference |
Debugging
Accuracy Tuning
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
precision_mode |
A string for the operator precision mode.
In the online inference scenario, the default value is "force_fp16". Example: custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
|
Training/Online inference |
|
precision_mode_v2 |
A string for the operator precision mode.
Training scenario:
In the online inference scenario, the default value of this option is fp16. Example: custom_op.parameter_map["precision_mode_v2"].s = tf.compat.as_bytes("origin")
|
Training/Online inference |
|
modify_mixlist |
When mixed precision is enabled, you can use this option to specify the path and file name of the blocklist, trustlist, and graylist, and specify the operators that allow precision reduction and those that do not allow precision reduction. You can enable the mixed precision by configuring precision_mode_v2 or precision_mode in the script.
The blocklist, trustlist, and graylist storage files are in JSON format. A configuration example is as follows:
custom_op.parameter_map["modify_mixlist"].s = tf.compat.as_bytes("/home/test/ops_info.json")
You can specify the operator types in ops_info.json as follows. Separate operators with commas (,). {
"black-list": { // Blocklist
"to-remove": [ // Move an operator from the blocklist to the graylist.
"Xlog1py"
],
"to-add": [ // Move an operator from the trustlist or graylist to the blocklist.
"Matmul",
"Cast"
]
},
"white-list": { // Trustlist
"to-remove": [ // Move an operator from the trustlist to the graylist.
"Conv2D"
],
"to-add": [ // Move an operator from the blocklist or graylist to the trustlist.
"Bias"
]
}
}
Note: The operators in the preceding example configuration file are for reference only. Configure them based on the actual hardware environment and the operators' built-in tuning policies. You can query the built-in tuning policy of each operator in mixed precision mode in <CANN software installation directory>/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json. For example:
"Conv2D":{
    "precision_reduce":{
        "flag":"true"
    }
},
 |
Training/Online inference |
|
customize_dtypes |
If precision_mode sets the global precision mode of a network, precision problems may occur on particular operators. In this case, you can use customize_dtypes to configure the precision mode of those operators while the remaining operators are still compiled with the mode specified by precision_mode. Note that if precision_mode is set to must_keep_origin_dtype, customize_dtypes does not take effect. Set this option to the path of the configuration file, including the file name, for example, /home/test/customize_dtypes.cfg. Configuration example: custom_op.parameter_map["customize_dtypes"].s = tf.compat.as_bytes("/home/test/customize_dtypes.cfg")
List the names or types of the operators whose precision needs customization in the configuration file, one operator per line. Operator types must be defined based on Ascend IR. If both the name and the type of an operator are configured, the name takes precedence during compilation. The structure of the configuration file is as follows:
# By operator name
Opname1::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Opname2::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
# By operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
OpType::TypeName2:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Example:
# By operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# By operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8
|
Training/Online inference |
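Putting the options above together, the following sketch enables mixed precision and attaches a custom operator blocklist/trustlist file (the file path is the hypothetical one from the modify_mixlist example; adjust it to your environment):

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"

# Enable automatic mixed precision (value taken from the precision_mode example above).
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
# Adjust which operators may (or may not) have their precision reduced.
custom_op.parameter_map["modify_mixlist"].s = tf.compat.as_bytes("/home/test/ops_info.json")
```
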
Accuracy comparison
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
enable_dump |
Data dump enable.
Example:
custom_op.parameter_map["enable_dump"].b = True |
Training/Online inference |
|
dump_mode |
Dump mode. The values are as follows:
NOTE:
If this option is set to all, the input data of some operators, such as the collective communication operators HcomAllGather and HcomAllReduce, is modified during execution. Therefore, the system dumps the operator input before execution and the operator output after execution. The dumped input and output data of the same operator are written to disk separately, producing multiple dump files; after parsing the dump files, you can determine whether the data is an input or an output from the file content. Example: custom_op.parameter_map["dump_mode"].s = tf.compat.as_bytes("all") |
Training/Online inference |
|
enable_dump_debug |
Overflow/underflow data collection enable.
Example: custom_op.parameter_map["enable_dump_debug"].b = True |
Training |
|
dump_debug_mode |
Overflow/Underflow detection mode. The values are as follows:
Example: custom_op.parameter_map["dump_debug_mode"].s = tf.compat.as_bytes("all") |
Training |
|
dump_path |
Dump path. This option is required when enable_dump or enable_dump_debug is set to True. Create the specified path in advance in the environment (either container or host) where training is performed. The running user configured during installation must have the read and write permissions on this path. The path can be an absolute path or a path relative to the path where the training script is executed.
Example: custom_op.parameter_map["dump_path"].s = tf.compat.as_bytes("/home/HwHiAiUser/output") |
Training/Online inference |
|
dump_step |
Iterations to dump. Separate multiple iterations using vertical bars (|), for example, 0|5|10. You can also use hyphens (-) to specify the iteration range, for example, 0|3-5|10. If this option is not set, dump data of all iterations is collected. Example: custom_op.parameter_map["dump_step"].s = tf.compat.as_bytes("0|5|10") |
Training |
|
dump_data |
Type of operator content to dump.
In large-scale training scenarios, dumping a large amount of data takes a long time. You can dump the statistics of all operators, identify the operators that may be abnormal based on the statistics, and then dump the input or output data of these abnormal operators. Example: custom_op.parameter_map["dump_data"].s = tf.compat.as_bytes("stats") |
Training/Online inference |
|
dump_layer |
Name of the operator to dump. Multiple operator names are separated by spaces. If this option is not set, all operators are dumped by default. If the input of the specified operator involves the data operator, the data operator information is also dumped. Example: custom_op.parameter_map["dump_layer"].s = tf.compat.as_bytes("nodename1 nodename2 nodename3") |
Training/Online inference |
|
quant_dumpable |
If the TensorFlow network is quantized by the AMCT tool, this option can be used to control whether to collect the dump data before quantization. The default value is 0.
Example: custom_op.parameter_map["quant_dumpable"].s = tf.compat.as_bytes("1")
NOTE:
This option applies only to online inference scenarios. When data dump is enabled, you can set this option to 1 to ensure that the dump data before quantization can be collected. |
Online inference |
|
fusion_switch_file |
Directory of the fusion switch configuration file, including the file name. The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). The built-in graph fusion and UB fusion patterns are enabled by default. You can disable selected fusion patterns in the configuration file. The following is a template of the fusion_switch.cfg configuration file. on indicates that a fusion pattern is enabled, and off indicates that a fusion pattern is disabled. {
"Switch":{
"GraphFusion":{
"RequantFusionPass":"on",
"ConvToFullyConnectionFusionPass":"off",
"SoftmaxFusionPass":"on",
"NotRequantFusionPass":"on",
"SplitConvConcatFusionPass":"on",
"ConvConcatFusionPass":"on",
"MatMulBiasAddFusionPass":"on",
"PoolingFusionPass":"on",
"ZConcatv2dFusionPass":"on",
"ZConcatExt2FusionPass":"on",
"TfMergeSubFusionPass":"on"
},
"UBFusion":{
"TbePool2dQuantFusionPass":"on"
}
}
}
To disable all fusion patterns at a time, refer to this configuration file example. {
"Switch":{
"GraphFusion":{
"ALL":"off"
},
"UBFusion":{
"ALL":"off"
}
}
}
Example: custom_op.parameter_map["fusion_switch_file"].s = tf.compat.as_bytes("/home/test/fusion_switch.cfg") |
Training/Online inference |
|
buffer_optimize |
Enables buffer optimization. This is an advanced switch.
Example: custom_op.parameter_map["buffer_optimize"].s = tf.compat.as_bytes("l2_optimize") |
Online inference |
|
use_off_line |
Enable training on the Ascend AI Processor.
Example: custom_op.parameter_map["use_off_line"].b = True |
Training/Online inference |
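The dump options above are typically configured together. The following sketch combines the table's own example values into one data-dump configuration (the output path is the hypothetical one from the dump_path example and must already exist with read/write permissions):

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"

custom_op.parameter_map["enable_dump"].b = True
# Pre-created directory; the configured running user needs read/write permissions.
custom_op.parameter_map["dump_path"].s = tf.compat.as_bytes("/home/HwHiAiUser/output")
# Dump iterations 0, 3 through 5, and 10.
custom_op.parameter_map["dump_step"].s = tf.compat.as_bytes("0|3-5|10")
# Dump both operator inputs and outputs.
custom_op.parameter_map["dump_mode"].s = tf.compat.as_bytes("all")
```
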
Performance Tuning
- Basic configuration
Option
Description
Application Scenarios
iterations_per_loop
Number of iterations per training loop performed on the device side per sess.run() call, set by using set_iteration_per_loop in sess.run mode.
The value must be the same as the iterations_per_loop value passed to set_iteration_per_loop; this is checked for function verification.
Example:
custom_op.parameter_map["iterations_per_loop"].i = 10
Training
- Advanced setting
Option
Description
Application Scenarios
hcom_parallel
Enables parallel execution of AllReduce gradient updates with forward and backward computation during distributed training.
- True (default): enabled.
- False: disabled.
For a small network (for example, ResNet-18), you are advised to set this option to False.
Example:
custom_op.parameter_map["hcom_parallel"].b = True
Training
enable_small_channel
Small channel optimization enable. If enabled, performance benefits are yielded at convolutional layers whose channel size is less than or equal to 4.
- 0: disabled. This function is disabled by default in the training scenario (graph_run_mode is 1). You are advised not to enable this function in the training scenario.
- 1: enabled. This is the default option that cannot be modified for the online inference scenario (graph_run_mode is 0).
NOTE:
When this function is enabled, performance benefits can be obtained on the GoogLeNet, ResNet-50, ResNet-101, and ResNet-152 networks. For other networks, performance may deteriorate.
Example:
custom_op.parameter_map["enable_small_channel"].i = 1
Online inference/Training
op_precision_mode
High-precision or high-performance mode of an operator. You can pass a custom mode configuration file op_precision.ini to set different modes for operators.
You can set this option by operator type (lower priority) or by node name (higher priority). Example:
[ByOpType]
optype1=high_precision
optype2=high_performance
optype4=support_out_of_bound_index
[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename4=support_out_of_bound_index
- high_precision
- high_performance
- support_out_of_bound_index: performs out-of-bounds checking on the indices of the gather, scatter, and segment operators. The checking degrades operator execution performance.
- keep_fp16: The FP16 data type is used for internal processing of operators. In this scenario, the FP16 data type is not automatically converted to the FP32 data type. If the performance of FP32 computation does not meet the expectation and high precision is not required, you can select the keep_fp16 mode. This low-precision mode sacrifices the precision for improving the performance, which is not recommended.
- super_performance: indicates ultra-high performance. Compared with high performance, the algorithm calculation formula is optimized.
You can view the precision or performance mode supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file in the file storage path with the CANN software installed.
This option is mutually exclusive with op_select_implmode and optypelist_for_implmode. If they are all specified, op_precision_mode takes precedence.
Generally, you do not need to set this option. It is used if you need to adjust the precision of a specific operator using the configuration .ini file in the case that you fail to obtain optimal network performance or accuracy in the high-performance or high-precision mode.
Example:
custom_op.parameter_map["op_precision_mode"].s = tf.compat.as_bytes("/home/test/op_precision.ini")
Training/Online inference
enable_scope_fusion_passes
Scope fusion patterns to take effect at compilation. Pass the names of registered fusion patterns; separate multiple names with commas (,).
Scope fusion patterns (either built-in or custom) are classified into the following two types:
- General scope fusion patterns: applicable to all networks. They are enabled by default and cannot be disabled manually.
- Non-general scope fusion patterns: applicable to specific networks. They are disabled by default; use enable_scope_fusion_passes to enable selected patterns.
Example:
custom_op.parameter_map["enable_scope_fusion_passes"].s = tf.compat.as_bytes("ScopeLayerNormPass,ScopeClipBoxesPass")
Training/Online inference
stream_max_parallel_num
This option applies only to neural machine translation (NMT) networks.
It specifies the parallelism degree of the AI CPU/AI Core engines so that AI CPU and AI Core operators can execute in parallel. Defaults to 1. The value cannot exceed the maximum number of AI Cores.
In the example below, DNN_VM_AICPU is the name of the AI CPU engine, with 10 concurrent tasks, and AIcoreEngine is the name of the AI Core engine, with 1 concurrent task.
Example:
custom_op.parameter_map["stream_max_parallel_num"].s = tf.compat.as_bytes("DNN_VM_AICPU:10,AIcoreEngine:1")
Training/Online inference
is_tailing_optimization
This option applies only to Bidirectional Encoder Representations from Transformers (BERT) networks.
Communication tailing optimization enable, used in distributed training scenarios to improve performance. By changing a computation dependency, computations that do not depend on the last AR (gradient aggregation fragment) are scheduled to run in parallel with the last AR, optimizing the communication tail. Value:
- True: enabled.
- False (default): disabled.
This option must work with NPUOptimizer and the value must be the same as that of is_tailing_optimization in NPUOptimizer.
Example:
custom_op.parameter_map["is_tailing_optimization"].b = True
Training
variable_placement
If the network weights are large, network execution may fail due to insufficient device memory. In this case, you can deploy the variables on the host to reduce device memory usage.
- Device: variables are deployed on the device.
- Host: variables are deployed on the host.
Default value: Device
Constraints:
- If this option is set to Host, mixed computing must be enabled (mix_compile_mode = True).
- If the training script uses TensorFlow V1 control flow APIs such as tf.case, tf.cond, and tf.while_loop, setting variable_placement to Host may cause network execution to fail. To avoid this problem, add the following APIs to the training script to convert TensorFlow V1 control flow operators to V2 and enable resource variables:
tf.enable_control_flow_v2()
tf.enable_resource_variables()
Example:
custom_op.parameter_map["variable_placement"].s = tf.compat.as_bytes("Device")
Training/Online inference
frozen_variable
To save the weights as a checkpoint, you can use this option to convert variables to constants, reducing data copies between the host and device and improving inference performance.
- True: conversion enabled.
- False: conversion disabled.
Default value: False
Example:
custom_op.parameter_map["frozen_variable"].b = True
Online inference
graph_max_parallel_model_num
In the online inference scenario, you can set this option to specify the maximum number of threads for parallel graph execution. If the value of this option is greater than 1, the corresponding number of threads are started for parallel graph execution, improving the overall graph execution efficiency.
The value must be an integer in the range of [1, INT32_MAX]. The default value is 1. INT32_MAX is the maximum value of the INT32 type, which is 2147483647.
Example:
custom_op.parameter_map["graph_max_parallel_model_num"].i = 4
Online inference
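The iterations_per_loop entry above can be sketched end to end. The import path of set_iteration_per_loop below is an assumption based on the TF1 NPU adapter (npu_bridge); adjust it to your installation. Note that the option value and the set_iteration_per_loop argument must match:

```python
import tensorflow as tf
# Assumed import path for the TF1 NPU adapter utility; verify against your installation.
from npu_bridge.estimator.npu import util

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["iterations_per_loop"].i = 10  # must match the value below

with tf.compat.v1.Session(config=config) as sess:
    train_op = ...  # placeholder for your training op
    # Wrap the train op so that each sess.run() executes 10 iterations on the device.
    train_op = util.set_iteration_per_loop(sess, train_op, 10)
    sess.run(train_op)
```
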
Profiling
AOE
Operator Compilation
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
op_compiler_cache_mode |
Disk cache mode for operator compilation. enable is the default value.
Example: custom_op.parameter_map["op_compiler_cache_mode"].s = tf.compat.as_bytes("enable") |
Training/Online inference |
|
op_compiler_cache_dir |
Disk cache directory for operator compilation. The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). If the specified directory exists and is valid, the kernel_cache subdirectory is created in it automatically; if the directory does not exist but the path is valid, the system creates both the directory and the kernel_cache subdirectory. The storage priority of operator compilation cache files is: op_compiler_cache_dir -> ${ASCEND_CACHE_PATH}/kernel_cache_<host ID> -> the default path ($HOME/atc_data). For details about ASCEND_CACHE_PATH, see Installation and Configuration > Flush File Configuration in Environment Variables. Example: custom_op.parameter_map["op_compiler_cache_dir"].s = tf.compat.as_bytes("/home/test/kernel_cache") |
Training/Online inference |
Data Augmentation
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
local_rank_id |
Rank ID of the current process, used in data parallel processing. The main process deduplicates the data and distributes the deduplicated data to the devices of other processes for forward and backward propagation.
In this mode, multiple devices on a host share one main process for data preprocessing, leaving other processes to receive preprocessed data from the main process. To identify the main process, call the collective communication API get_local_rank_id() to get the rank ID of the current process on its server. Example: custom_op.parameter_map["local_rank_id"].i = 0 |
Training/Online inference |
|
local_device_list |
Devices that the main process sends data to, used in conjunction with local_rank_id. custom_op.parameter_map["local_device_list"].s = tf.compat.as_bytes("0,1") |
Training/Online inference |
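Since local_rank_id and local_device_list work together, the following sketch configures both. The import path of get_local_rank_id below is an assumption based on the HCCL Python collective communication API mentioned in the table; verify it against your installation:

```python
import tensorflow as tf
# Assumed HCCL collective communication API import path; verify in your environment.
from hccl.manage.api import get_local_rank_id

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"

# Rank ID of the current process on its server; the main process (rank 0)
# preprocesses data and distributes it to the devices in local_device_list.
custom_op.parameter_map["local_rank_id"].i = get_local_rank_id()
custom_op.parameter_map["local_device_list"].s = tf.compat.as_bytes("0,1")
```
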
Exception Remedy
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
hccl_timeout |
Timeout interval (s) of collective communication. Defaults to 1836. You can set the timeout interval if the default value does not meet your requirement (for example, when a communication failure occurs).
Example: custom_op.parameter_map["hccl_timeout"].i = 1800 |
Training |
|
op_wait_timeout |
Operator wait timeout interval (s). Defaults to 120. You can set the timeout interval if the default value does not meet your requirement. Configuration example: custom_op.parameter_map["op_wait_timeout"].i = 120 |
Training |
|
op_execute_timeout |
Operator execution timeout interval (s). Example: custom_op.parameter_map["op_execute_timeout"].i = 90 |
Training |
|
stream_sync_timeout |
Timeout interval for stream synchronization during graph execution, in ms. If synchronization exceeds the configured value, a synchronization failure is reported. The default value is -1, indicating that no timeout is applied and no synchronization failure is reported. Note: In cluster training scenarios, this value (the stream synchronization timeout) must be greater than the collective communication timeout, that is, the value of hccl_timeout or of the environment variable HCCL_EXEC_TIMEOUT. Example: custom_op.parameter_map["stream_sync_timeout"].i = 60000 |
Training |
|
event_sync_timeout |
Timeout interval for event synchronization during graph execution, in ms. If synchronization exceeds the configured value, a synchronization failure is reported. The default value is -1, indicating that no timeout is applied and no synchronization failure is reported. Configuration example: custom_op.parameter_map["event_sync_timeout"].i = 60000 |
Training |
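The constraint between the timeout options above can be made concrete. The following sketch sets the collective communication timeout (in seconds) and a stream synchronization timeout (in milliseconds) that exceeds it, as the stream_sync_timeout note requires; the values are illustrative:

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"

# Collective communication timeout, in seconds.
custom_op.parameter_map["hccl_timeout"].i = 1800
# Stream synchronization timeout, in milliseconds. In cluster training it must
# exceed the collective communication timeout (1800 s = 1,800,000 ms).
custom_op.parameter_map["stream_sync_timeout"].i = 1900000
```
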
Experiment Options
The experiment options are extended options for debugging and may change in later versions. Do not use them in commercial products.
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
jit_compile |
Determines whether to compile the operator online or use the compiled operator binary file.
Default value: auto
NOTICE:
This option is used only for networks of large recommendation models. Example: custom_op.parameter_map["jit_compile"].s = tf.compat.as_bytes("auto") |
Training/Online inference |
|
experimental_accelerate_train_mode |
If training takes more than one hour, you can use this option to trigger training acceleration and improve training performance. Based on the configured acceleration type, trigger mode, and proportion of low-precision training, the software compiles and runs that proportion of the training with reduced precision; the remaining training is compiled and run at the original precision.
The value is a string of three fields separated by vertical bars (|), for example, fast|step|0.9.
|
Training |
Options That Will Be Deprecated in Later Versions
The following options will be deprecated in later versions. You are advised not to use them anymore.
|
Option |
Description |
Application Scenarios |
|---|---|---|
|
op_debug_level |
Function debugging: whether to enable operator debugging.
This option is left empty by default, indicating that the configuration is disabled. Example: custom_op.parameter_map["op_debug_level"].i = 0 |
Training/Online inference |
|
enable_data_pre_proc |
Performance tuning. Enables offloading of the GetNext operator to the Ascend AI Processor. GetNext offload is a prerequisite for iteration offload.
Example:
custom_op.parameter_map["enable_data_pre_proc"].b = True |
Training |
|
variable_format_optimize |
Performance tuning. Variable format optimization enable.
If enabled, variables are reformatted during network variable initialization into formats better suited to the Ascend AI Processor (for example, from NCHW to NC1HWC0) to improve training efficiency. Enable or disable this function as needed. This option is left empty by default, indicating that the configuration is disabled. Example: custom_op.parameter_map["variable_format_optimize"].b = True |
Training |
|
op_select_implmode |
Performance tuning. Operator implementation mode select. Certain operators compiled in the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model compile time. Arguments:
This option is left empty by default, indicating that the configuration is disabled. Example: custom_op.parameter_map["op_select_implmode"].s = tf.compat.as_bytes("high_precision") |
Training/Online inference |
|
optypelist_for_implmode |
Performance tuning. List of operator types (separated by commas) that use the mode specified by the op_select_implmode option. Currently, Pooling, SoftmaxV2, LRN, and ROIAlign operators are supported. Use this option in conjunction with op_select_implmode, for example: Set op_select_implmode to high_precision. Set optypelist_for_implmode to Pooling. This option is left empty by default, indicating that the configuration is disabled. Example: custom_op.parameter_map["optypelist_for_implmode"].s = tf.compat.as_bytes("Pooling,SoftmaxV2") |
Training/Online inference |
|
dynamic_input |
Whether the graph has dynamic inputs.
Example: custom_op.parameter_map["dynamic_input"].b = True |
Training/Online inference |
|
dynamic_graph_execute_mode |
Execution mode of a dynamic input. That is, this option takes effect when dynamic_input is set to True. Possible values are: dynamic_execute: dynamic graph compilation. In this mode, the shape range configured in dynamic_inputs_shape_range is used for compilation. Example: custom_op.parameter_map["dynamic_graph_execute_mode"].s = tf.compat.as_bytes("dynamic_execute") |
Training/Online inference |
|
dynamic_inputs_shape_range |
Shape range of each dynamic input. If a graph has two dataset inputs and one placeholder input, a configuration example is as follows: custom_op.parameter_map["dynamic_inputs_shape_range"].s = tf.compat.as_bytes("getnext:[128,3~5,2~128,-1],[64,3~5,2~128,-1];data:[128,3~5,2~128,-1]")
|
Training/Online inference |
|
graph_memory_max_size |
Sizes of the network static memory and the maximum dynamic memory (used in earlier versions). In the current version, this option does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |
Training/Online inference |
|
variable_memory_max_size |
Size of the variable memory (used in earlier versions). In the current version, this option does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |
Training/Online inference |
