NPURunConfig Parameters
Basic Options
|
Option |
Description |
|---|---|
|
graph_run_mode |
Graph run mode. Values are as follows: 0 (online inference) and 1 (training).
Example: config = NPURunConfig(graph_run_mode=1) |
|
session_device_id |
Logical ID of a device. Setting this parameter allows you to run different models on multiple devices by executing a single training script. Generally, you create one session per graph and pass the corresponding session_device_id value to each session. This parameter takes precedence over the environment variable ASCEND_DEVICE_ID. Example:
config0 = NPURunConfig(..., session_device_id=0, ...)
estimator0 = NPUEstimator(..., config=config0, ...)
...
config1 = NPURunConfig(..., session_device_id=1, ...)
estimator1 = NPUEstimator(..., config=config1, ...)
...
config7 = NPURunConfig(..., session_device_id=7, ...)
estimator7 = NPUEstimator(..., config=config7, ...)
... |
|
distribute |
ParameterServerStrategy object for distributed training in the PS-Worker architecture. Example: config = NPURunConfig(distribute=strategy) |
|
deterministic |
Whether to enable deterministic computing. If enabled, an operator produces the same output each time it is executed with the same hardware and input. The values are as follows: 0 (disabled, the default) and 1 (enabled).
Deterministic computing is disabled by default because it slows down operator execution and affects performance. When it is disabled, the results of multiple executions may differ. This is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the accumulation order of floating-point numbers. However, if the results of a model differ between runs or the precision needs to be tuned, you can enable deterministic computing to assist model debugging and tuning. Note that for fully reproducible results, you also need to set a fixed random seed in the training script so that the random numbers generated in the program are reproducible as well. Example: config = NPURunConfig(deterministic=1) |
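The effect of accumulation order mentioned for the deterministic option can be reproduced in plain Python, without any Ascend software. The following is only an illustrative sketch: it sums the same three floating-point numbers in two different orders, which is what differently scheduled multi-threaded reductions may end up doing, and then shows that a fixed seed makes pseudo-random numbers repeatable. The function names are hypothetical and are not part of NPURunConfig.

```python
import random


def sum_left_to_right(values):
    """Accumulate in the given order, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Accumulate in reverse order, as a differently scheduled reduction might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_left_to_right(values)
backward = sum_right_to_left(values)

# Floating-point addition is not associative, so the two totals differ slightly.
print(forward, backward, forward == backward)

# A fixed seed makes the pseudo-random numbers repeatable between runs.
random.seed(1234)
first = [random.random() for _ in range(3)]
random.seed(1234)
second = [random.random() for _ in range(3)]
print(first == second)
```

The difference between the two sums is tiny, but in a long training run such differences can accumulate, which is why both a deterministic operator order and a fixed seed are needed for repeatable results.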
Memory Management
|
Option |
Description |
|---|---|
|
memory_config |
System memory usage mode. Before creating NPURunConfig, you can instantiate a MemoryConfig class to configure functions. For details about the constructor of the MemoryConfig class, see MemoryConfig Constructor. |
|
external_weight |
When multiple models are loaded in a session and their weights can be reused, you are advised to use this option to externalize the weights of the Const/Constant nodes on the network. This implements weight reuse among the models and reduces the memory used by the weights.
Note: This parameter is usually not required. If the model loading environment has limited memory, you can externalize the weights.
Example:
config = NPURunConfig(external_weight=True) |
|
input_fusion_size |
Threshold for fusing and copying multiple discrete pieces of user input data during H2D transmission. The unit is byte. The minimum value is 0 byte, the maximum value is 33554432 bytes (32 MB), and the default value is 131072 bytes (128 KB). If:
Assume there are 10 user inputs, including two 100 KB inputs, two 50 KB inputs, and the other inputs greater than 100 KB:
Note: This parameter takes effect only for static shape graphs. Example: config = NPURunConfig(input_fusion_size=25600) |
|
input_batch_cpy |
Whether to enable the batch memory copy function when input data is transferred from the host to the device.
NOTE:
Example: config = NPURunConfig(input_batch_cpy=True) |
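The value range stated for input_fusion_size can be checked before the configuration is created. The sketch below is a hypothetical helper (check_input_fusion_size and the constant names are not part of the NPURunConfig API); it only encodes the minimum, maximum, and default values given in the table above, and the NPURunConfig call is left as a comment because it requires the Ascend software stack.

```python
MIN_INPUT_FUSION_SIZE = 0                 # bytes
MAX_INPUT_FUSION_SIZE = 32 * 1024 * 1024  # 33554432 bytes (32 MB)
DEFAULT_INPUT_FUSION_SIZE = 128 * 1024    # 131072 bytes (128 KB)


def check_input_fusion_size(size_bytes=DEFAULT_INPUT_FUSION_SIZE):
    """Return size_bytes if it lies in the documented range, else raise ValueError.

    Hypothetical helper: it is not part of the NPURunConfig API.
    """
    if isinstance(size_bytes, bool) or not isinstance(size_bytes, int):
        raise ValueError("input_fusion_size must be an integer number of bytes")
    if not MIN_INPUT_FUSION_SIZE <= size_bytes <= MAX_INPUT_FUSION_SIZE:
        raise ValueError(
            f"input_fusion_size must be between {MIN_INPUT_FUSION_SIZE} "
            f"and {MAX_INPUT_FUSION_SIZE} bytes, got {size_bytes}"
        )
    return size_bytes


size = check_input_fusion_size(25600)
# The checked value would then be passed on, for example:
# config = NPURunConfig(input_fusion_size=size)
print(size)
```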
Dynamic Shape
| Option | Description |
|---|---|
| ac_parallel_enable | Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic shape graph. In a dynamic shape graph, when this option is enabled, the system automatically identifies AI CPU operators that can be concurrently executed with the AI Core operators in the graph. Operators of different engines are distributed to different flows to implement parallel execution among multiple engines, improving resource utilization and dynamic shape execution performance. Example: config = NPURunConfig(ac_parallel_enable="1") |
| compile_dynamic_mode | Whether to generalize all input shapes in the graph. Example: config = NPURunConfig(compile_dynamic_mode=True) |
| all_tensor_not_empty | Whether to remove control nodes for empty tensor checks in the execution graph. In dynamic shape graph scenarios, control nodes are typically inserted to check whether a node is empty to prevent empty tensor nodes from being sent to the device. If you are certain that the graph does not contain empty tensors, you can enable this option to remove these control nodes and improve graph execution performance. Example: config = NPURunConfig(all_tensor_not_empty=True) |
Mixed Computing
| Option | Description |
|---|---|
| mix_compile_mode | Whether to enable mixed computing. In full offload mode, all compute operators are offloaded to the device. As a supplement to the full offload mode, mixed computing allows certain operators to be executed online within the frontend framework, improving the Ascend AI Processor's adaptability to TensorFlow. Example: config = NPURunConfig(mix_compile_mode=True) |
Debugging
Accuracy Tuning
|
Option |
Description |
|---|---|
|
precision_mode_v2 |
Operator precision mode, which must be of the string type.
In training scenarios:
In online inference scenarios, the default value is fp16. Example: config = NPURunConfig(precision_mode_v2="origin")
NOTE:
|
|
precision_mode |
Operator precision mode, which must be of the string type.
Example: config = NPURunConfig(precision_mode="allow_mix_precision")
NOTE:
|
|
modify_mixlist |
When mixed precision is enabled, you can use this parameter to specify the path and file name of the blocklist, trustlist, and graylist, and specify the operators that allow precision reduction and those that do not allow precision reduction. You can enable the mixed precision by configuring precision_mode_v2 (recommended) or precision_mode in the script.
The blocklist, trustlist, and graylist storage files are in JSON format. A configuration example is as follows:
config = NPURunConfig(modify_mixlist="/home/test/ops_info.json")
You can specify the operator types in ops_info.json as shown below. Separate operators with commas (,).
{
  "black-list": {                  // Blocklist
    "to-remove": [                 // Move an operator from the blocklist to the graylist.
      "Xlog1py"
    ],
    "to-add": [                    // Move an operator from the trustlist or graylist to the blocklist.
      "MatMul",
      "Cast"
    ]
  },
  "white-list": {                  // Trustlist
    "to-remove": [                 // Move an operator from the trustlist to the graylist.
      "Conv2D"
    ],
    "to-add": [                    // Move an operator from the blocklist or graylist to the trustlist.
      "Bias"
    ]
  }
}
Note: The operators in the preceding example configuration file are for reference only. The configuration should be based on the actual hardware environment and the built-in tuning policies of the operators. You can query the built-in tuning policy of each operator in mixed precision mode in CANN software installation directory/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info-<opType>.json. Example:
"Conv2D":{
  "precision_reduce":{
    "flag":"true"
  },
  ...
}
|
|
enable_reduce_precision |
Not supported in the current version. |
|
customize_dtypes |
If precision_mode_v2 or precision_mode is used to set the global precision mode of a network, precision problems may occur on particular operators. In this case, you can use customize_dtypes to configure the precision mode of these operators, and still compile other operators using the precision mode specified by precision_mode_v2 or precision_mode. Note that if precision_mode_v2 is set to origin or precision_mode is set to must_keep_origin_dtype, customize_dtypes does not take effect.
Set it to the path (including the name of the configuration file), for example, /home/test/customize_dtypes.cfg.
Example: config = NPURunConfig(customize_dtypes="/home/test/customize_dtypes.cfg")
List the names or types of operators whose precision needs customization in the configuration file. Each operator occupies a line, and the operator type must be defined based on Ascend IR. If both operator name and type are configured for an operator, the operator name applies during building.
The structure of the configuration file is as follows:
# By operator name
Opname1::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Opname2::InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
# By operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
OpType::TypeName2:InputDtype:dtype1,dtype2,...OutputDtype:dtype1,...
Example:
# By operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# By operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8
NOTE:
|
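For the modify_mixlist option described in this table, the list file can also be generated from a script instead of being edited by hand. The following is a minimal sketch that uses only the Python standard library. The helper build_mixlist and its parameter names are hypothetical, and the operator names are the ones from the reference example, not a recommendation. Standard JSON has no comment syntax, so the generated file omits the // annotations shown in the sample; whether the tool accepts such annotations is not stated here.

```python
import json
import os
import tempfile


def build_mixlist(block_add=(), block_remove=(), trust_add=(), trust_remove=()):
    """Build the dictionary layout used by the ops_info.json example.

    Hypothetical helper: the key names follow the sample file in this section.
    """
    return {
        "black-list": {"to-remove": list(block_remove), "to-add": list(block_add)},
        "white-list": {"to-remove": list(trust_remove), "to-add": list(trust_add)},
    }


mixlist = build_mixlist(
    block_add=["MatMul", "Cast"],
    block_remove=["Xlog1py"],
    trust_add=["Bias"],
    trust_remove=["Conv2D"],
)

# Write to a temporary directory here; the documentation example uses
# /home/test/ops_info.json instead.
mixlist_path = os.path.join(tempfile.mkdtemp(), "ops_info.json")
with open(mixlist_path, "w", encoding="utf-8") as f:
    json.dump(mixlist, f, indent=2)

# The path would then be passed on, for example:
# config = NPURunConfig(modify_mixlist=mixlist_path)
print(mixlist_path)
```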
Accuracy Comparison
|
Option |
Description |
|---|---|
|
dump_config |
Dump configuration. Before creating NPURunConfig, you can instantiate a DumpConfig class for dump configuration. For details about the constructor of the DumpConfig class, see DumpConfig Constructor. Example: config = NPURunConfig(dump_config=dump_config) |
|
quant_dumpable |
If the TensorFlow network is quantized by the AMCT tool, this option can be used to specify whether to collect the dump data before quantization. The default value is 0.
Example: config = NPURunConfig(quant_dumpable="1")
NOTE:
This option applies only to online inference scenarios. When data dump is enabled, you can set this option to 1 to ensure that the dump data before quantization can be collected. |
|
fusion_switch_file |
Path of the fusion switch configuration file, including the file name. The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). The built-in graph fusion and UB fusion patterns are enabled by default. You can disable selected fusion patterns in the configuration file as needed. For details about fusion patterns that can be disabled, see Graph Fusion and UB Fusion Patterns.
Example:
config = NPURunConfig(fusion_switch_file="/home/test/fusion_switch.cfg")
The following is a template of the fusion_switch.cfg configuration file. on indicates that a fusion pattern is enabled, and off indicates that a fusion pattern is disabled.
{
  "Switch":{
    "GraphFusion":{
      "RequantFusionPass":"on",
      "ConvToFullyConnectionFusionPass":"off",
      "SoftmaxFusionPass":"on",
      "NotRequantFusionPass":"on",
      "ConvConcatFusionPass":"on",
      "MatMulBiasAddFusionPass":"on",
      "PoolingFusionPass":"on",
      "ZConcatv2dFusionPass":"on",
      "ZConcatExt2FusionPass":"on",
      "TfMergeSubFusionPass":"on"
    },
    "UBFusion":{
      "TbePool2dQuantFusionPass":"on"
    }
  }
}
To disable all fusion patterns at a time, refer to this configuration file example.
{
  "Switch":{
    "GraphFusion":{
      "ALL":"off"
    },
    "UBFusion":{
      "ALL":"off"
    }
  }
}
Notes:
|
|
buffer_optimize |
Enables buffer optimization. This is an advanced switch.
Example: config = NPURunConfig(buffer_optimize="l2_optimize") |
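For the fusion_switch_file option in this table, the switch file can be produced programmatically. The sketch below assumes only the layout shown in the fusion_switch.cfg template (a Switch object containing GraphFusion and UBFusion maps with on/off values). The helper build_fusion_switch is hypothetical, and the pattern names are copied from the template for illustration.

```python
import json
import os
import tempfile


def build_fusion_switch(graph_fusion=None, ub_fusion=None):
    """Return the fusion switch layout from the template.

    Each argument maps a fusion pattern name (or "ALL") to True (on) or False (off).
    Hypothetical helper, not part of the NPURunConfig API.
    """
    def to_switches(passes):
        return {name: ("on" if enabled else "off")
                for name, enabled in (passes or {}).items()}

    return {
        "Switch": {
            "GraphFusion": to_switches(graph_fusion),
            "UBFusion": to_switches(ub_fusion),
        }
    }


# Disable one graph fusion pattern and keep the others explicitly on.
selected = build_fusion_switch(
    graph_fusion={"ConvToFullyConnectionFusionPass": False, "SoftmaxFusionPass": True},
    ub_fusion={"TbePool2dQuantFusionPass": True},
)

# Disable every fusion pattern at once.
all_off = build_fusion_switch(graph_fusion={"ALL": False}, ub_fusion={"ALL": False})

# Write to a temporary directory here; the documentation example uses
# /home/test/fusion_switch.cfg instead.
switch_path = os.path.join(tempfile.mkdtemp(), "fusion_switch.cfg")
with open(switch_path, "w", encoding="utf-8") as f:
    json.dump(selected, f, indent=2)

# The path would then be passed on, for example:
# config = NPURunConfig(fusion_switch_file=switch_path)
print(json.dumps(all_off))
```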
Performance Tuning
- Basic configuration
Option
Description
iterations_per_loop
Number of training iterations performed on the Ascend AI Processor per sess.run() call. Defaults to 1. The total number of training iterations must be an integer multiple of the value of iterations_per_loop. The device runs iterations_per_loop iterations and then returns the result to the host. This reduces interactions between the host and device and shortens the training time.
In mixed compute mode (with mix_compile_mode set to True), iterations_per_loop must be set to 1.
Note: When iterations_per_loop is set to a value greater than 1, the total number of training iterations set by the user may be different from the actual total number of iterations due to issues such as loop offload and loss scale overflow.
Example:
config = NPURunConfig(iterations_per_loop=1000)
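The integer-multiple requirement for iterations_per_loop can be checked with simple arithmetic before training starts. The sketch below is a hypothetical helper (session_runs_needed and total_steps are illustrative names, not part of the API). It returns how many sess.run() calls are needed to cover the requested number of training steps.

```python
def session_runs_needed(total_steps, iterations_per_loop=1):
    """Return the number of sess.run() calls that cover total_steps.

    Raises ValueError if total_steps is not an integer multiple of
    iterations_per_loop. Hypothetical helper for illustration only.
    """
    if iterations_per_loop < 1:
        raise ValueError("iterations_per_loop must be at least 1")
    if total_steps % iterations_per_loop != 0:
        raise ValueError(
            f"total_steps ({total_steps}) must be an integer multiple of "
            f"iterations_per_loop ({iterations_per_loop})"
        )
    return total_steps // iterations_per_loop


# 10000 training steps with 1000 iterations per loop.
runs = session_runs_needed(total_steps=10000, iterations_per_loop=1000)
print(runs)
```

With these values the host issues far fewer calls than one per step, which is where the reduction in host-device interaction comes from.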
- Advanced configuration
Option
Description
hcom_parallel
Whether to enable AllReduce gradient update and forward and backward propagation in parallel during distributed training.
- True (default): enabled.
- False: disabled.
For a small network (for example, ResNet-18), you are advised to set this parameter to False.
Example:
config = NPURunConfig(hcom_parallel=True)
op_precision_mode
High-precision or high-performance mode of an operator. You can pass a custom mode configuration file op_precision.ini to set different modes for operators.
You can set this option by operator type (low priority) or node name (high priority). Example:
[ByOpType]
optype1=high_precision
optype2=high_performance
optype3=enable_hi_float_32_execution
optype4=support_out_of_bound_index
[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename3=enable_hi_float_32_execution
nodename4=support_out_of_bound_index
- high_precision: high precision.
- high_performance: high performance.
- enable_float_32_execution: The FP32 data type is used for internal processing of operators. In this scenario, the FP32 data type is not automatically converted to the HF32 data type. If you are using the HF32 data type for computation and find that the accuracy drop exceeds your expectation, enable this option to specify the use of FP32 for internal computation of certain operators in order to maintain accuracy.
This option is supported only by the following products:
Atlas A3 training products / Atlas A3 inference products
Atlas A2 training products / Atlas A2 inference products
- enable_hi_float_32_execution: The HF32 data type is used for internal processing of operators. After this option is enabled, the FP32 data type is automatically converted to the HF32 data type. This configuration can reduce the space occupied by data and improve performance. This option is not supported in the current version.
- support_out_of_bound_index: The out-of-bounds verification is performed on the indices of the gather, scatter, and segment operators. The verification deteriorates the operator execution performance.
- keep_fp16: The FP16 data type is used for internal operator processing. In this mode, FP16 is not automatically converted to FP32. If FP32 computation fails to meet performance expectations and high accuracy is not required, you can enable the keep_fp16 mode. This low-precision mode trades accuracy for performance and is not recommended.
- super_performance: ultra-high performance. Compared with high performance, the algorithm calculation formula is optimized.
You can view the supported precision and performance mode values for a specific operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file under the CANN software installation directory.
This parameter is mutually exclusive with op_select_implmode and optypelist_for_implmode. If they are all specified, op_precision_mode takes precedence.
Generally, you do not need to set this parameter. Use it to adjust the mode of specific operators through the .ini configuration file when the global high-performance or high-precision mode does not give the network performance or accuracy you need.
Example:
config = NPURunConfig(op_precision_mode="/home/test/op_precision.ini")
enable_scope_fusion_passes
Names of the registered scope fusion patterns to take effect during build. To pass multiple names, separate them with commas (,).
Scope fusion patterns (either built-in or custom) are classified into the following two types:
- General: common scope fusion patterns applicable to all networks. They are enabled by default and cannot be manually disabled.
- Non-general: scope fusion patterns applicable to specific networks. By default, they are disabled. You can use enable_scope_fusion_passes to enable selected fusion patterns.
Example:
config = NPURunConfig(enable_scope_fusion_passes="ScopeLayerNormPass,ScopeClipBoxesPass")
stream_max_parallel_num
This parameter applies only to NMT networks.
It specifies the degree of parallelism of AI CPU and AI Core engines for parallel execution of AI CPU and AI Core operators.
Example:
config = NPURunConfig(stream_max_parallel_num="DNN_VM_AICPU:10,AIcoreEngine:1")
DNN_VM_AICPU is the name of the AI CPU engine. In this example, the number of concurrent tasks on the AI CPU engine is 10.
AIcoreEngine is the name of the AI Core engine. In this example, the number of concurrent tasks on the AI Core engine is 1.
Defaults to 1. The value cannot exceed the maximum number of AI Cores.
is_tailing_optimization
This parameter applies only to BERT networks.
Enabling communication tailing optimization in distributed training scenarios improves performance. By changing a computation dependency relationship, a computation operation that does not depend on the last AR (gradient aggregation fragment) is scheduled to be performed in parallel with the last AR, to optimize communication tailing. Value:
- True: enabled.
- False (default): disabled.
This parameter must be used together with the NPUOptimizer constructor, and its value must be the same as that of is_tailing_optimization in the NPUOptimizer constructor. For details, see NPUOptimizer Constructor.
Example:
config = NPURunConfig(is_tailing_optimization=True)
enable_small_channel
Whether to enable small channel optimization. If it is enabled, performance benefits are yielded at the convolutional layers with channel size <= 4.
- 0: disabled. This function is disabled by default in the training scenario (graph_run_mode is 1). You are advised not to enable this function in the training scenario.
- 1 (default): enabled. This option cannot be modified in online inference scenarios (graph_run_mode is 0).
NOTE: After this function is enabled, performance benefits can be obtained on the ResNet50, ResNet101, and ResNet152 networks. For other network models, the performance may deteriorate.
Example:
config = NPURunConfig(enable_small_channel=0)
variable_placement
If the network weight is large, network execution may fail due to insufficient device memory. In this case, you can deploy the variable to the host to reduce the memory usage of the device.
- Device: The variable is deployed on the device.
- Host: The variable is deployed on the host.
Default value: Device
Constraints:
- If this configuration option is set to Host, mixed computing must be enabled (mix_compile_mode = True).
- If the training script contains APIs of TensorFlow V1 control flow operators, such as tf.case, tf.cond, and tf.while_loop, setting variable_placement to Host may cause the network execution to fail. To avoid this problem, add the following APIs to the training script to convert the control flow operators of TensorFlow V1 to V2 and enable resource variables:
tf.enable_control_flow_v2()
tf.enable_resource_variables()
Example:
config = NPURunConfig(variable_placement="Device")
graph_max_parallel_model_num
In online inference scenarios, you can set this option to specify the maximum number of threads for parallel graph execution. If the value of this option is greater than 1, the corresponding number of threads are started for parallel graph execution, improving the overall graph pipeline efficiency.
The value must be an integer in the range of [1, INT32_MAX]. The default value is 1. INT32_MAX is the maximum value of the INT32 type, which is 2147483647.
Example:
config = NPURunConfig(graph_max_parallel_model_num=4)
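For the op_precision_mode option described earlier in this list, the op_precision.ini file can be written with Python's standard configparser module. This is a sketch under stated assumptions: the helper write_op_precision_ini is hypothetical, the section names [ByOpType] and [ByNodeName] are taken from the example in this section, and optype1, optype2, and nodename1 are the same placeholders used there.

```python
import configparser
import os
import tempfile


def write_op_precision_ini(path, by_op_type=None, by_node_name=None):
    """Write an op_precision.ini file in the [ByOpType]/[ByNodeName] layout.

    Hypothetical helper, not part of the NPURunConfig API.
    """
    parser = configparser.ConfigParser()
    # Keep keys case-sensitive; by default configparser lowercases them,
    # which would alter operator types and node names.
    parser.optionxform = str
    parser["ByOpType"] = dict(by_op_type or {})
    parser["ByNodeName"] = dict(by_node_name or {})
    with open(path, "w", encoding="utf-8") as f:
        # Write "key=value" without spaces, matching the documented example.
        parser.write(f, space_around_delimiters=False)


# Write to a temporary directory here; the documentation example uses
# /home/test/op_precision.ini instead.
ini_path = os.path.join(tempfile.mkdtemp(), "op_precision.ini")
write_op_precision_ini(
    ini_path,
    by_op_type={"optype1": "high_precision", "optype2": "high_performance"},
    by_node_name={"nodename1": "high_precision"},
)

with open(ini_path, encoding="utf-8") as f:
    ini_text = f.read()
print(ini_text)

# The path would then be passed on, for example:
# config = NPURunConfig(op_precision_mode=ini_path)
```

Setting optionxform matters because real operator types and node names are typically mixed case, and the default lowercasing would silently change them.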
Profiling
| Option | Description |
|---|---|
| profiling_config | Profiling configuration. Before creating NPURunConfig, you can instantiate a ProfilingConfig class for profiling configuration. For details about the constructor of the ProfilingConfig class, see ProfilingConfig Constructor. Example: config = NPURunConfig(profiling_config=profiling_config) |
AOE
The AOE tuning feature supports only the following products:
- Atlas A3 training products / Atlas A3 inference products
- Atlas A2 training products / Atlas A2 inference products
- Atlas training products
Operator Building
|
Option |
Description |
|---|---|
|
op_compiler_cache_mode |
Disk cache mode for operator building. enable is the default value.
Notes:
Example:
config = NPURunConfig(op_compiler_cache_mode="enable") |
|
op_compiler_cache_dir |
Disk cache directory for operator compilation. The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). If the specified directory exists and is valid, the kernel_cache subdirectory is automatically created. If the specified directory does not exist but is valid, the system automatically creates the directory and the kernel_cache subdirectory. The storage priority of the operator compilation cache files is as follows: op_compiler_cache_dir > ${ASCEND_CACHE_PATH}/kernel_cache > default path ($HOME/atc_data). For details about ASCEND_CACHE_PATH, see Environment Variables.
Example:
config = NPURunConfig(op_compiler_cache_dir="/home/test/kernel_cache") |
|
aicore_num |
Maximum number of Cube cores and Vector cores used for operator compilation.
Format: Integer 1|Integer 2, where the two values are separated by vertical bars (|). Integer 1 specifies the maximum number of Cube cores to use, and Integer 2 specifies the maximum number of Vector cores to use. Both values must be greater than 0 and less than or equal to the actual number of Cube cores and Vector cores available on the Ascend AI Processor.
NOTE:
Example: config = NPURunConfig(aicore_num="2|4") |
|
oo_constant_folding |
Enables or disables constant folding.
Constant folding evaluates and replaces constant expressions during graph compilation to reduce memory usage. In most cases, you are advised to retain the default value to enable constant folding. However, some networks require more memory during compilation and running, and the constant memory is occupied throughout the entire lifecycle of a graph. If enabling constant folding increases the overall memory consumption, you can disable it using this parameter.
Example: config = NPURunConfig(oo_constant_folding=True)
NOTE:
If constant folding is disabled and an error occurs during network compilation and running, an error message similar to the following will be displayed:
Solution: Enable constant folding by setting oo_constant_folding to True, and then use the _grappler_do_not_remove attribute via TensorFlow's Grappler to selectively disable constant folding for specific operators. |
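For the aicore_num option in this table, the "Integer 1|Integer 2" string can be built and checked with a small helper. The sketch below is hypothetical (format_aicore_num and parse_aicore_num are not part of the API). It verifies only that both values are integers greater than 0; the upper bound depends on the number of Cube and Vector cores actually available on the processor and is therefore not checked here.

```python
def format_aicore_num(cube_cores, vector_cores):
    """Return the "cube|vector" string expected by aicore_num.

    Hypothetical helper: checks only that both values are integers > 0.
    """
    for name, value in (("cube_cores", cube_cores), ("vector_cores", vector_cores)):
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be an integer greater than 0")
    return f"{cube_cores}|{vector_cores}"


def parse_aicore_num(text):
    """Split an aicore_num string back into (cube_cores, vector_cores)."""
    parts = text.split("|")
    if len(parts) != 2:
        raise ValueError("expected exactly two integers separated by '|'")
    cube_cores, vector_cores = (int(part) for part in parts)
    format_aicore_num(cube_cores, vector_cores)  # reuse the range check
    return cube_cores, vector_cores


value = format_aicore_num(2, 4)
# The string would then be passed on, for example:
# config = NPURunConfig(aicore_num=value)
print(value, parse_aicore_num(value))
```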
Data Augmentation
| Option | Description |
|---|---|
| local_rank_id | Rank ID of the current process, used in data parallel processing in recommendation networks. The main process deduplicates the data and distributes the deduplicated data to the devices of other processes for forward and backward propagation. In this mode, multiple devices on a host share one main process for data preprocessing, leaving other processes to receive preprocessed data from the main process. To identify the main process, call the collective communication API get_local_rank_id() to get the rank ID of the current process on its server. Example: config = NPURunConfig(local_rank_id=0, local_device_list="0,1") |
| local_device_list | Devices that the main process sends data to, used in conjunction with local_rank_id. Example: config = NPURunConfig(local_rank_id=0, local_device_list="0,1") |
Exception Remedy
|
Option |
Description |
|---|---|
|
hccl_timeout |
Synchronization timeout for inter-device task execution, in seconds. You can set the timeout interval if the default value does not meet your requirement (for example, when a communication failure occurs).
NOTE:
Example: config = NPURunConfig(hccl_timeout=1800) |
|
op_wait_timeout |
Operator wait timeout interval (s). Defaults to 120. Example: config = NPURunConfig(op_wait_timeout=120) |
|
op_execute_timeout |
Operator execution timeout interval (s). Example: config = NPURunConfig(op_execute_timeout=90) |
|
stream_sync_timeout |
Timeout interval for stream synchronization during graph execution, in ms. If synchronization takes longer than the configured value, a synchronization failure is reported. The default value is -1, indicating that there is no waiting time and no error is reported when the synchronization fails. Note: In cluster scenarios, the value of this option must be greater than the collective communication timeout interval, that is, the value of hccl_timeout or the environment variable HCCL_EXEC_TIMEOUT. Example: config = NPURunConfig(stream_sync_timeout=60000) |
|
event_sync_timeout |
Timeout interval for event synchronization during graph execution, in ms. If synchronization takes longer than the configured value, a synchronization failure is reported. The default value is -1, indicating that there is no waiting time and no error is reported when the synchronization fails. Example: config = NPURunConfig(event_sync_timeout=60000) |
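The timeouts in this table use different units: hccl_timeout is given in seconds, while stream_sync_timeout is given in milliseconds. The sketch below checks the cluster constraint stated for stream_sync_timeout after converting both to milliseconds. The helper name is hypothetical, and treating the default -1 as "no limit, nothing to compare" is an assumption made for this example.

```python
def stream_sync_timeout_is_valid(stream_sync_timeout_ms, hccl_timeout_s):
    """Check that the stream sync timeout exceeds the collective communication timeout.

    stream_sync_timeout_ms is in milliseconds; hccl_timeout_s is in seconds.
    Hypothetical helper for illustration only.
    """
    if stream_sync_timeout_ms == -1:
        # Assumption: the default -1 means no timeout is enforced,
        # so there is nothing to compare against.
        return True
    return stream_sync_timeout_ms > hccl_timeout_s * 1000


# 60000 ms is 60 s, which is shorter than an hccl_timeout of 1800 s.
print(stream_sync_timeout_is_valid(60000, 1800))

# 2000000 ms is 2000 s, which is longer than 1800 s.
print(stream_sync_timeout_is_valid(2000000, 1800))
```

Note that the two example values shown in the table (stream_sync_timeout=60000 and hccl_timeout=1800) would not satisfy the constraint if they were used together in a cluster.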
Experiment Parameters
The experiment parameters are extended parameters for debugging and may be changed in later versions. Therefore, they cannot be used in commercial products.
|
Option |
Description |
|---|---|
|
experimental_config |
Extended parameter. Currently, this parameter is not recommended. Before creating NPURunConfig, you can instantiate an ExperimentalConfig class to configure functions. For details about the constructor of the ExperimentalConfig class, see ExperimentalConfig Constructor. |
|
jit_compile |
Determines whether to compile the operator online or use the compiled operator binary file.
NOTICE:
This option is used only for networks of large recommendation models. Example: config = NPURunConfig(jit_compile="auto") |
|
shape_generalization_mode |
When jit_compile is set to true (online operator compilation), use this parameter to configure the shape generalization mode.
NOTICE:
If compile_dynamic_mode is set to True, all input shapes are generalized to -1 in the first iteration. In this case, the configuration of shape_generalization_mode does not take effect. Example: config = NPURunConfig(shape_generalization_mode="FULL") |
|
auto_multistream_parallel_mode |
This option applies only to graphs with a static shape. You can enable parallel execution of Cube and Vector operators to improve graph execution performance.
NOTICE:
Example:
config = NPURunConfig(auto_multistream_parallel_mode="cv") |
Parameters That Will Be Deprecated in Later Versions
The following parameters will be deprecated in later versions. You are advised not to use them anymore.
|
Option |
Description |
|---|---|
|
enable_data_pre_proc |
Performance tuning. Whether to offload the GetNext operator to the Ascend AI Processor. GetNext operator offload is a prerequisite for iteration offload.
Example: config = NPURunConfig(enable_data_pre_proc=True) |
|
variable_format_optimize |
Performance tuning. Whether to enable variable format optimization.
To improve training efficiency, the format of the variables is converted to a format more compatible with the Ascend AI Processor during variable initialization performed by the network. Enable or disable this function as needed. This parameter is left empty by default, indicating that the configuration is disabled. Example: config = NPURunConfig(variable_format_optimize=True) |
|
op_debug_level |
Whether to enable operator debugging. The values are as follows:
This parameter is left empty by default, indicating that the configuration is disabled. Example: config = NPURunConfig(op_debug_level=1) |
|
op_select_implmode |
Operator implementation mode. Certain operators built in the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time. Arguments:
This parameter is left empty by default, indicating that the configuration is disabled.
Example:
config = NPURunConfig(op_select_implmode="high_precision") |
|
optypelist_for_implmode |
List of operator types (separated by commas) that use the mode specified by the op_select_implmode parameter. Currently, Pooling, SoftmaxV2, LRN, and ROIAlign operators are supported. Use this parameter in conjunction with op_select_implmode, for example:
config = NPURunConfig(
    op_select_implmode="high_precision",
    optypelist_for_implmode="Pooling,SoftmaxV2")
This parameter is left empty by default, indicating that the configuration is disabled. |
|
dynamic_input |
Whether it is a dynamic input.
Example:
config = NPURunConfig(dynamic_input=True) |
|
dynamic_graph_execute_mode |
Execution mode of a dynamic input. This option takes effect only when dynamic_input is set to True. Possible value: dynamic_execute (dynamic graph compilation). In this mode, the shape range configured in dynamic_inputs_shape_range is used for compilation.
Example:
config = NPURunConfig(dynamic_graph_execute_mode="dynamic_execute") |
|
dynamic_inputs_shape_range |
Shape range of each dynamic input. If a graph has two dataset inputs and one placeholder input, a configuration example is as follows.
config = NPURunConfig(dynamic_inputs_shape_range="getnext:[128 ,3~5, 2~128, -1],[64 ,3~5, 2~128, -1];data:[128 ,3~5, 2~128, -1]")
Precautions:
|
|
graph_memory_max_size |
Sizes of the network static memory and the maximum dynamic memory (used in earlier versions). In the current version, this parameter does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |
|
variable_memory_max_size |
Size of the variable memory (used in earlier versions). In the current version, this parameter does not take effect. The system dynamically allocates memory resources based on the actual memory usage of the network. |
