aclgrphBuildInitialize Configuration Parameters
**CORE_TYPE**

Core type used by the network. If the network contains a Cube operator, only AiCore is supported.
Arguments:
Configuration example: {ge::ir_option::CORE_TYPE, "AiCore"}
Applicability:
**SOC_VERSION**

Ascend AI Processor used during graph build.
To query <soc_version>, run the npu-smi info command on the host where the processor resides.
Configuration example: {ge::ir_option::SOC_VERSION, "<soc_version>"}
Applicability:
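For orientation, the options in this table are passed to aclgrphBuildInitialize as a string-to-string map. The following minimal sketch assumes the std::map<std::string, std::string> overload declared in ge_ir_build.h (the header path may differ across CANN versions) and uses a placeholder SoC version:

```cpp
#include <map>
#include <string>

#include "ge/ir_build/ge_ir_build.h"  // aclgrphBuildInitialize and ge::ir_option constants

int main() {
    // Global build options: keys are ge::ir_option constants, values are strings.
    // "Ascend310P3" is a placeholder; pass the <soc_version> of your processor.
    std::map<std::string, std::string> global_options = {
        {ge::ir_option::SOC_VERSION, "Ascend310P3"},
        {ge::ir_option::CORE_TYPE, "AiCore"},
    };

    // Initialize the graph build system once per process.
    if (ge::aclgrphBuildInitialize(global_options) != ge::GRAPH_SUCCESS) {
        return -1;
    }

    // ... construct graphs and build models here ...

    ge::aclgrphBuildFinalize();  // Release graph build resources.
    return 0;
}
```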
**BUFFER_OPTIMIZE**

Enables or disables buffer optimization.
Arguments:
Suggestions: You are advised to enable buffer optimization, as it improves compute efficiency and performance. However, your model may contain an operator that the current implementation does not yet cover. If disabling buffer optimization eliminates an inference accuracy degradation, identify the suspect operator and report it to Huawei technical support, who will add buffer optimization support for the operator as soon as possible.
Configuration example: {ge::ir_option::BUFFER_OPTIMIZE, "l2_optimize"}
Note: If this parameter is set to l1_optimize, it cannot be used together with VIRTUAL_TYPE. If both are set, an error is reported indicating that L1 fusion is not performed in virtualization scenarios. This prevents scheduling exceptions caused by large operators.
Applicability:
**ENABLE_COMPRESS_WEIGHT**

Enables global weight compression. AI Core supports weight compression: if buffer optimization is enabled, the weight data can be compressed, and the weights are decompressed during operator computation, reducing the bandwidth load and improving performance. This parameter is mutually exclusive with COMPRESS_WEIGHT_CONF.
Arguments:
Configuration example: {ge::ir_option::ENABLE_COMPRESS_WEIGHT, "true"}
Applicability:
**COMPRESS_WEIGHT_CONF**

Path and name of the configuration file listing the nodes to be compressed. The nodes mainly include the conv and fc operators. This parameter is mutually exclusive with ENABLE_COMPRESS_WEIGHT.
Format: The path can contain only letters, digits, and underscores (_); the file name can contain letters, digits, underscores (_), and periods (.).
Restrictions: The weight compression configuration file is generated by AMCT. It is a list of node names separated by semicolons (;). For example, the content of the compress_weight_nodes.cfg file might be: conv1;fc1;conv2_2/x1;fc2;conv5_32/x2;fc6
Configuration example: {ge::ir_option::COMPRESS_WEIGHT_CONF, "$HOME/module/compress_weight_nodes.cfg"}
Applicability:
**PRECISION_MODE**

Sets the precision mode of a model. This parameter cannot be used together with PRECISION_MODE_V2 in the same graph. You are advised to use PRECISION_MODE_V2 instead.
Arguments:
Default: force_fp16
Configuration example: {ge::ir_option::PRECISION_MODE, "force_fp16"}
Applicability:
**PRECISION_MODE_V2**

Sets the precision mode of a model. This parameter cannot be used together with PRECISION_MODE in the same graph. You are advised to use PRECISION_MODE_V2.
Arguments:
Default: fp16
Configuration example: {ge::ir_option::PRECISION_MODE_V2, "fp16"}
Applicability:
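To show where these options sit in the overall build flow, here is a hedged sketch (same includes as the earlier sketch) that sets PRECISION_MODE_V2 at initialization, builds a graph, and saves the result; it assumes a ge::Graph named graph has already been constructed with the IR API and that the documented aclgrphBuildModel/aclgrphSaveModel signatures apply:

```cpp
// Global options, including the precision mode for this build session.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::SOC_VERSION, "Ascend310P3"},  // placeholder SoC version
    {ge::ir_option::PRECISION_MODE_V2, "fp16"},   // default value, shown explicitly
};
ge::aclgrphBuildInitialize(global_options);

// Per-graph options may stay empty when the global options suffice.
std::map<std::string, std::string> build_options;
ge::ModelBufferData model;  // receives the serialized offline model
if (ge::aclgrphBuildModel(graph, build_options, model) == ge::GRAPH_SUCCESS) {
    ge::aclgrphSaveModel("my_model", model);  // writes my_model.om
}
ge::aclgrphBuildFinalize();
```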
**ALLOW_HF32**

This parameter is reserved and is not supported in the current version. It enables automatic replacement of the float32 data type with the HF32 data type. In the current version, it takes effect only for Conv and Matmul operators. HF32 is a single-precision floating-point type used by Ascend for internal operator computation. HF32 shares the same value range as float32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing float32 with HF32 through precision reduction greatly reduces the space occupied by the data and improves performance.
Arguments:
Default: FP32-to-HF32 conversion is enabled for Conv operators and disabled for Matmul operators.
ALLOW_HF32 automatically replaces float32 with HF32. For this parameter to take effect, the input or output type of the enabled operator must be float32. The default value of PRECISION_MODE_V2 is fp16: if an operator's type in the original network is float32, it is forcibly converted to float16, and ALLOW_HF32 does not take effect. You are therefore advised to set PRECISION_MODE_V2 to origin. Similarly, the default value of PRECISION_MODE is force_fp16; you are advised to change it to must_keep_origin_dtype or force_fp32.
Configuration example: {ge::ir_option::ALLOW_HF32, "true"}
Applicability:
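The interplay above can be made concrete with a short sketch (same includes as the earlier sketch; hedged: it simply pairs the documented configuration example with PRECISION_MODE_V2 set to origin so that float32 actually reaches the Conv/Matmul operators):

```cpp
// Keep original dtypes so float32 survives to Conv/Matmul, then allow HF32.
// Note: the text above says ALLOW_HF32 is reserved in the current version.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::PRECISION_MODE_V2, "origin"},
    {ge::ir_option::ALLOW_HF32, "true"},
};
ge::aclgrphBuildInitialize(global_options);
```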
**TUNE_DEVICE_IDS**

Not supported in the current version.
**EXEC_DISABLE_REUSED_MEMORY**

Memory reuse switch. Memory reuse refers to utilizing memory multiple times based on its lifecycle and size, ensuring that non-conflicting memory is reused to reduce network memory consumption.
Arguments:
Configuration example: {ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}
Applicability:
**ENABLE_SINGLE_STREAM**

Switch for limiting a model to a single stream. Streams preserve the execution order of a queue of asynchronous operations on the device.
Arguments:
Restrictions: If the model contains the Cmo operator and the following control operators, the single-stream feature cannot be used; in that case, keep the default value false.
Configuration example: {ge::ir_option::ENABLE_SINGLE_STREAM, "true"}
Applicability:
**AICORE_NUM**

Number of AI Cores used for build.
Applicability:
**FUSION_SWITCH_FILE**

Path (including the file name) of the fusion switch configuration file. The path can contain letters, digits, underscores (_), hyphens (-), and periods (.). The built-in graph fusion and UB fusion patterns are enabled by default, and you can disable selected fusion patterns in the configuration file. Some fusion patterns are not switchable due to functionality restrictions; for the full list of switchable fusion patterns, see Graph Fusion and UB Fusion Patterns.
Configuration example: The following is a template of the fusion_switch.cfg configuration file, where on enables a fusion pattern and off disables it.
{
"Switch":{
"GraphFusion":{
"RequantFusionPass":"on",
"ConvToFullyConnectionFusionPass":"off",
"SoftmaxFusionPass":"on",
"NotRequantFusionPass":"on",
"SplitConvConcatFusionPass":"on",
"ConvConcatFusionPass":"on",
"MatMulBiasAddFusionPass":"on",
"PoolingFusionPass":"on",
"ZConcatv2dFusionPass":"on",
"ZConcatExt2FusionPass":"on",
"TfMergeSubFusionPass":"on"
},
"UBFusion":{
"TbePool2dQuantFusionPass":"on"
}
}
}
To disable all fusion patterns at once, use the following configuration file:
{
"Switch":{
"GraphFusion":{
"ALL":"off"
},
"UBFusion":{
"ALL":"off"
}
}
}
Notes:
Applicability:
**ENABLE_SMALL_CHANNEL**

Enables small-channel optimization. When enabled, this yields performance benefits at convolutional layers with channel ≤ 4. You are advised to enable it in inference scenarios.
Arguments:
Configuration example: {ge::ir_option::ENABLE_SMALL_CHANNEL, "1"}
Applicability:
**OP_SELECT_IMPL_MODE**

Selects an operator implementation mode. Certain operators built into the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time. In high-precision mode, Taylor's theorem or Newton's method is used to improve operator accuracy with float16 input. In high-performance mode, optimal performance is achieved without affecting network precision (float16).
Arguments:
The preceding implementation modes are distinguished based on the dtype of the operator. Replace ${INSTALL_DIR} with the actual CANN installation directory. If the Ascend-CANN-Toolkit package is installed as the root user, the directory is /usr/local/Ascend/ascend-toolkit/latest.
Default: high_performance
Configuration example: {ge::ir_option::OP_SELECT_IMPL_MODE, "high_performance"}
Applicability:
**OPTYPELIST_FOR_IMPLMODE**

List of operator types to which the implementation mode set by OP_SELECT_IMPL_MODE applies.
Restrictions:
Configuration example: {ge::ir_option::OPTYPELIST_FOR_IMPLMODE, "Pooling,SoftmaxV2"}
Applicability:
**OP_COMPILER_CACHE_MODE**

Sets the disk cache mode for operator build.
Arguments:
Default: enable
Configuration example: {ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
Instructions:
Applicability:
**OP_COMPILER_CACHE_DIR**

Disk cache directory for operator build.
Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.). Defaults to $HOME/atc_data.
Configuration example:
{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
{ge::ir_option::OP_COMPILER_CACHE_DIR, "/home/test/data/atc_data"}
Restrictions:
Applicability:
**DEBUG_DIR**

Directory for the debug-related process files generated during operator build, including the .o (operator binary), .json (operator description), and .cce files. Defaults to the current directory.
Restrictions:
Configuration example:
{ge::ir_option::OP_DEBUG_LEVEL, "1"}
{ge::ir_option::DEBUG_DIR, "/home/test/module/out_debug_info"}
Applicability:
**OP_DEBUG_LEVEL**

Operator debug level at operator build time. To specify where the operator build process files are stored, use DEBUG_DIR. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.
Arguments:
NOTICE:
Configuration example: {ge::ir_option::OP_DEBUG_LEVEL, "1"}
Applicability:
**OP_DEBUG_CONFIG**

Enables global memory checking and related operator debug options.
Arguments: The value is the path of a .cfg configuration file. Multiple options in the configuration file are separated by commas (,).
Configuration example: {ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"}, where the test0.cfg file specifies the options ccec_g,oom.
Restrictions: During operator compilation, if you want to compile only some instead of all AI Core operators, add the OP_DEBUG_LIST field to the test0.cfg configuration file. Only the operators in that list are then compiled, using the options configured in OP_DEBUG_CONFIG. The OP_DEBUG_LIST field has the following requirements:
Configuration example: Add an OP_DEBUG_LIST entry such as GatherV2,opType::ReduceSum to the configuration file (for example, test0.cfg) specified by OP_DEBUG_CONFIG. During model compilation, the GatherV2 and ReduceSum operators are then compiled with the ccec_g and oom options.
NOTE:
Applicability:
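A sketch of how the pieces fit together (same includes as the earlier sketch); the exact field syntax inside the .cfg file is not spelled out above, so the file content shown in the comments is an assumption based on the field names the description mentions:

```cpp
// Assumed content of /root/test0.cfg (field syntax is an assumption):
//   op_debug_config = ccec_g,oom                // options to apply
//   op_debug_list = GatherV2,opType::ReduceSum  // only these operators are compiled
std::map<std::string, std::string> global_options = {
    // Per the description above, the option value is the path of the .cfg file.
    {ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"},
};
ge::aclgrphBuildInitialize(global_options);
```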
**MODIFY_MIXLIST**

When mixed precision is enabled, this parameter specifies the path and file name of a JSON file that moves operators between the blocklist, trustlist, and graylist, that is, between the sets of operators that allow precision reduction and those that do not. To check which list an operator type belongs to, view the flag under the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.
Method for enabling mixed precision:
Configuration example:
{ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"}
You can specify the operator type (or several types separated by commas) in ops_info.json as follows:
{
"black-list": { // Blocklist
"to-remove": [ // Move an operator from the blocklist to the graylist. Ensure that the specified operator is already on the blocklist.
"Xlog1py"
],
"to-add": [ // Move an operator from the trustlist or graylist to the blocklist.
"Matmul",
"Cast"
]
},
"white-list": { // Trustlist
"to-remove": [ // Move an operator from the trustlist to the graylist. Ensure that the specified operator is already on the trustlist.
"Conv2D"
],
"to-add": [ // Move an operator from the blocklist or graylist to the trustlist.
"Bias"
]
}
}
The operators in the preceding example configuration file are for reference only; configure the lists based on the actual hardware environment and the operators' built-in tuning policies. The following is an example of a blocklist/trustlist/graylist query result:
"Conv2D":{
    "precision_reduce":{
        "flag":"true"
    }
}
flag set to true: trustlist; false: blocklist; not configured: graylist.
Applicability:
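A sketch combining the pieces (same includes as the earlier sketch; hedged: enabling mixed precision through PRECISION_MODE set to allow_mix_precision is an assumption, since the enabling method above was truncated in the source):

```cpp
// Enable mixed precision (assumed method), then point MODIFY_MIXLIST at the
// JSON file that moves operators between the blocklist/trustlist/graylist.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::PRECISION_MODE, "allow_mix_precision"},  // assumption
    {ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"},
};
ge::aclgrphBuildInitialize(global_options);
```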
**SPARSITY**

Global sparsity enable. In a model output by AMCT after 2:4 structured sparsity, at least two of every four contiguous weight elements along the Cin dimension may be forced to zero. Enabling global sparsity during model conversion filters out the two zeroed elements, reducing the computational demand of inference and optimizing inference performance. Due to hardware restrictions, this parameter cannot be used together with ENABLE_COMPRESS_WEIGHT or COMPRESS_WEIGHT_CONF.
Arguments:
Configuration example: {ge::ir_option::SPARSITY, "1"}
Restrictions: When using this parameter, ensure that a sparse model is used. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch); the combination requires 2:4 structured sparsity plus quantization-aware training.
Applicability:
**EXTERNAL_WEIGHT**

Externalizes the weights of the Const/Constant nodes in the network, converting them to FileConstant nodes when the OM model file is generated. In offline scenarios where the model weights are large and the environment restricts the .om file size, you are advised to enable external weights so that the weights are saved separately, reducing the .om file size.
Arguments:
Configuration example: {ge::ir_option::EXTERNAL_WEIGHT, "1"}
Restrictions:
Applicability:
**DETERMINISTIC**

Enables or disables deterministic computing. By default, deterministic computing is disabled, and multiple executions of an operator with the same hardware and input may produce different results. This is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the accumulation order of floating-point numbers. When deterministic computing is enabled, repeated executions of an operator with the same hardware and input produce the same output. You are advised not to enable deterministic computing in general, because it slows down operator execution and affects performance. If a model produces different results across runs, or its precision needs to be optimized, you can enable deterministic computing to assist model debugging and optimization.
Arguments:
Configuration example: {ge::ir_option::DETERMINISTIC, "1"}
Applicability:
**OPTION_HOST_ENV_OS**

If the OS and architecture of the model build environment differ from those of the model operating environment, set this parameter to the OS type of the operating environment. If it is not set, the OS type of the build environment is used by default. This parameter is used together with OPTION_HOST_ENV_CPU: OPTION_HOST_ENV_OS sets the OS type and OPTION_HOST_ENV_CPU sets the CPU architecture.
Argument: linux
Configuration example:
{ge::ir_option::OPTION_HOST_ENV_OS, "linux"}
{ge::ir_option::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability:
**OPTION_HOST_ENV_CPU**

If the OS and architecture of the model build environment differ from those of the model operating environment, set this parameter to the CPU architecture of the operating environment. If it is not set, the architecture of the build environment is used by default. This parameter is used together with OPTION_HOST_ENV_OS.
Arguments:
Configuration example:
{ge::ir_option::OPTION_HOST_ENV_OS, "linux"}
{ge::ir_option::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability:
**VIRTUAL_TYPE**

Specifies whether an offline model can run on a virtual device generated by the Ascend virtual instance feature. When a single chip provides more computing power than a cloud user or small enterprise needs, the Ascend virtual instance feature can allocate an appropriate share of computing power to suit their services. A virtual device is a virtual acceleration resource allocated by a chip based on the specified computing power.
Arguments:
Configuration example: {ge::ir_option::VIRTUAL_TYPE, "1"}
Restrictions:
Applicability:
**COMPRESSION_OPTIMIZE_CONF**

Path (including the file name) of the compression optimization configuration file. This parameter enables the compression optimization functions specified in the configuration file to improve network performance. For example: /home/test/compression_optimize.cfg. An example of the file contents is as follows.
enable_first_layer_quantization:true
Applicability:
**CLUSTER_CONFIG**

Applicable to the distributed compilation and partitioning of foundation models. Specifies the path and name of the configuration file describing the logical topology of the target deployment environment. After parsing, the file is used for offline build of the HCCL operators in a graph. If the graph contains communication operators or algorithm-based sharding is enabled, this parameter must be configured.
Configuration example:
{ge::ir_option::CLUSTER_CONFIG, "/home/test/cluster_config.json"}
The configuration file must be in JSON format. For details about the parameters, see Parameters in the CLUSTER_CONFIG File.
Applicability:
**OPTION_SCREEN_PRINT_MODE**

Determines whether to display the graph build process on the screen.
Arguments:
Configuration example: {ge::ir_option::OPTION_SCREEN_PRINT_MODE, "disable"}
Applicability:
**AC_PARALLEL_ENABLE**

Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic-shape graph. When enabled, the system automatically identifies the AI CPU operators in the graph that can run in parallel with AI Core operators, and distributes operators of different engines to different streams, improving resource utilization and dynamic-shape execution performance.
Arguments:
Configuration example: {ge::ir_option::AC_PARALLEL_ENABLE, "1"}
Applicability:
**TILING_SCHEDULE_OPTIMIZE**

Tiling offload scheduling optimization. Because the internal storage of the AI Core in the NPU cannot hold all the input and output data of an operator, the input data is tiled into parts: one part is transferred in, computed, and transferred out, then the next part follows in the same way. This process is called tiling. A computation program called the tiling implementation determines the tiling parameters (such as the block size transferred each time and the total number of loops) based on operator information such as shape. Because the AI Core is not well suited to the scalar computation in the tiling implementation, the tiling implementation generally runs on the host CPU. However, it runs on the device when the following conditions are met:
Arguments:
Configuration example: {ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}
Applicability:
**OPTION_EXPORT_COMPILE_STAT**

Whether to generate fusion_result.json, the result file of operator fusion information (including graph fusion and UB fusion), during graph build. This file records the fusion patterns used during graph build. In the file:
Arguments:
NOTE:
Configuration example: {ge::ir_option::OPTION_EXPORT_COMPILE_STAT, "1"}
Applicability: