aclgrphBuildInitialize Configuration Parameters
**CORE_TYPE**

Core type used by the network. If the network contains a Cube operator, only AiCore is supported.
Arguments:
Configuration example: {ge::ir_option::CORE_TYPE, "AiCore"}
Applicability:
**SOC_VERSION**

Ascend AI Processor used during graph build.
To query <soc_version>, run the npu-smi info command on the host where the processor resides.
Configuration example: {ge::ir_option::SOC_VERSION, "<soc_version>"}
Applicability:
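For orientation, the options in this table are passed to aclgrphBuildInitialize as a string-to-string map. The following minimal sketch assumes the std::map<std::string, std::string> overload declared in ge_ir_build.h (the header path may differ across CANN versions) and uses a placeholder SoC version:

```cpp
#include <map>
#include <string>

#include "ge/ir_build/ge_ir_build.h"  // aclgrphBuildInitialize and ge::ir_option constants

int main() {
    // Global build options: keys are ge::ir_option constants, values are strings.
    // "Ascend310P3" is a placeholder; pass the <soc_version> of your processor.
    std::map<std::string, std::string> global_options = {
        {ge::ir_option::SOC_VERSION, "Ascend310P3"},
        {ge::ir_option::CORE_TYPE, "AiCore"},
    };

    // Initialize the graph build system once per process.
    if (ge::aclgrphBuildInitialize(global_options) != ge::GRAPH_SUCCESS) {
        return -1;
    }

    // ... construct graphs and build models here ...

    ge::aclgrphBuildFinalize();  // Release graph build resources.
    return 0;
}
```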
**BUFFER_OPTIMIZE**

Enables or disables buffer optimization.
Arguments:
Suggestions: You are advised to enable buffer optimization, as it improves compute efficiency and performance. However, your model may contain an operator that the current implementation does not yet cover. If disabling buffer optimization eliminates an inference accuracy degradation, identify the suspect operator and report it to Huawei technical support, who will add buffer optimization support for the operator as soon as possible.
Configuration example: {ge::ir_option::BUFFER_OPTIMIZE, "l2_optimize"}
Note: If this parameter is set to l1_optimize, it cannot be used together with VIRTUAL_TYPE. If both are set, an error is reported indicating that L1 fusion is not performed in virtualization scenarios. This prevents scheduling exceptions caused by large operators.
Applicability:
**ENABLE_COMPRESS_WEIGHT**

Enables global weight compression. AI Core supports weight compression: if buffer optimization is enabled, the weight data can be compressed, and the weights are decompressed during operator computation, reducing the bandwidth load and improving performance. This parameter is mutually exclusive with COMPRESS_WEIGHT_CONF.
Arguments:
Configuration example: {ge::ir_option::ENABLE_COMPRESS_WEIGHT, "true"}
Applicability:
**COMPRESS_WEIGHT_CONF**

Path and name of the configuration file listing the nodes to be compressed. The nodes mainly include the conv and fc operators. This parameter is mutually exclusive with ENABLE_COMPRESS_WEIGHT.
Format: The path can contain only letters, digits, and underscores (_); the file name can contain letters, digits, underscores (_), and periods (.).
Restrictions: The weight compression configuration file is generated by AMCT. It is a list of node names separated by semicolons (;). For example, the content of the compress_weight_nodes.cfg file might be: conv1;fc1;conv2_2/x1;fc2;conv5_32/x2;fc6
Configuration example: {ge::ir_option::COMPRESS_WEIGHT_CONF, "$HOME/module/compress_weight_nodes.cfg"}
Applicability:
**PRECISION_MODE**

Sets the precision mode of a model. This parameter cannot be used together with PRECISION_MODE_V2 in the same graph. You are advised to use PRECISION_MODE_V2 instead.
Arguments:
Default: force_fp16
Configuration example: {ge::ir_option::PRECISION_MODE, "force_fp16"}
Applicability:
**PRECISION_MODE_V2**

Sets the precision mode of a model. This parameter cannot be used together with PRECISION_MODE in the same graph. You are advised to use PRECISION_MODE_V2.
Arguments:
Default: fp16
Configuration example: {ge::ir_option::PRECISION_MODE_V2, "fp16"}
Applicability:
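To show where these options sit in the overall build flow, here is a hedged sketch (same includes as the earlier sketch) that sets PRECISION_MODE_V2 at initialization, builds a graph, and saves the result; it assumes a ge::Graph named graph has already been constructed with the IR API and that the documented aclgrphBuildModel/aclgrphSaveModel signatures apply:

```cpp
// Global options, including the precision mode for this build session.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::SOC_VERSION, "Ascend310P3"},  // placeholder SoC version
    {ge::ir_option::PRECISION_MODE_V2, "fp16"},   // default value, shown explicitly
};
ge::aclgrphBuildInitialize(global_options);

// Per-graph options may stay empty when the global options suffice.
std::map<std::string, std::string> build_options;
ge::ModelBufferData model;  // receives the serialized offline model
if (ge::aclgrphBuildModel(graph, build_options, model) == ge::GRAPH_SUCCESS) {
    ge::aclgrphSaveModel("my_model", model);  // writes my_model.om
}
ge::aclgrphBuildFinalize();
```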
**ALLOW_HF32**

This parameter is reserved and is not supported in the current version. It enables automatic replacement of the float32 data type with the HF32 data type. In the current version, it takes effect only for Conv and Matmul operators. HF32 is a single-precision floating-point type used by Ascend for internal operator computation. HF32 shares the same value range as float32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing float32 with HF32 through precision reduction greatly reduces the space occupied by the data and improves performance.
Arguments:
Default: FP32-to-HF32 conversion is enabled for Conv operators and disabled for Matmul operators.
ALLOW_HF32 automatically replaces float32 with HF32. For this parameter to take effect, the input or output type of the enabled operator must be float32. The default value of PRECISION_MODE_V2 is fp16: if an operator's type in the original network is float32, it is forcibly converted to float16, and ALLOW_HF32 does not take effect. You are therefore advised to set PRECISION_MODE_V2 to origin. Similarly, the default value of PRECISION_MODE is force_fp16; you are advised to change it to must_keep_origin_dtype or force_fp32.
Configuration example: {ge::ir_option::ALLOW_HF32, "true"}
Applicability:
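The interplay above can be made concrete with a short sketch (same includes as the earlier sketch; hedged: it simply pairs the documented configuration example with PRECISION_MODE_V2 set to origin so that float32 actually reaches the Conv/Matmul operators):

```cpp
// Keep original dtypes so float32 survives to Conv/Matmul, then allow HF32.
// Note: the text above says ALLOW_HF32 is reserved in the current version.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::PRECISION_MODE_V2, "origin"},
    {ge::ir_option::ALLOW_HF32, "true"},
};
ge::aclgrphBuildInitialize(global_options);
```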
**TUNE_DEVICE_IDS**

Not supported in the current version.
**EXEC_DISABLE_REUSED_MEMORY**

Memory reuse switch. Memory reuse refers to utilizing memory multiple times based on its lifecycle and size, ensuring that non-conflicting memory is reused to reduce network memory consumption.
Arguments:
Configuration example: {ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}
Applicability:
**ENABLE_SINGLE_STREAM**

Switch for limiting a model to a single stream. Streams preserve the execution order of a queue of asynchronous operations on the device.
Arguments:
Restrictions: If the model contains the Cmo operator and the following control operators, the single-stream feature cannot be used; in that case, keep the default value false.
Configuration example: {ge::ir_option::ENABLE_SINGLE_STREAM, "true"}
Applicability:
**AICORE_NUM**

Number of AI Cores used for build.
Applicability:
**FUSION_SWITCH_FILE**

Path (including the file name) of the fusion switch configuration file. The path can contain letters, digits, underscores (_), hyphens (-), and periods (.). The built-in graph fusion and UB fusion patterns are enabled by default, and you can disable selected fusion patterns in the configuration file. Some fusion patterns are not switchable due to functionality restrictions; for the full list of switchable fusion patterns, see Graph Fusion and UB Fusion Patterns.
Configuration example: The following is a template of the fusion_switch.cfg configuration file, where on enables a fusion pattern and off disables it.
{
"Switch":{
"GraphFusion":{
"RequantFusionPass":"on",
"ConvToFullyConnectionFusionPass":"off",
"SoftmaxFusionPass":"on",
"NotRequantFusionPass":"on",
"SplitConvConcatFusionPass":"on",
"ConvConcatFusionPass":"on",
"MatMulBiasAddFusionPass":"on",
"PoolingFusionPass":"on",
"ZConcatv2dFusionPass":"on",
"ZConcatExt2FusionPass":"on",
"TfMergeSubFusionPass":"on"
},
"UBFusion":{
"TbePool2dQuantFusionPass":"on"
}
}
}
To disable all fusion patterns at once, use the following configuration file:
{
"Switch":{
"GraphFusion":{
"ALL":"off"
},
"UBFusion":{
"ALL":"off"
}
}
}
Notes:
Applicability:
**ENABLE_SMALL_CHANNEL**

Enables small-channel optimization. When enabled, this yields performance benefits at convolutional layers with channel ≤ 4. You are advised to enable it in inference scenarios.
Arguments:
Configuration example: {ge::ir_option::ENABLE_SMALL_CHANNEL, "1"}
Applicability:
**OP_SELECT_IMPL_MODE**

Selects an operator implementation mode. Certain operators built into the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time. In high-precision mode, Taylor's theorem or Newton's method is used to improve operator accuracy with float16 input. In high-performance mode, optimal performance is achieved without affecting network precision (float16).
Arguments:
The preceding implementation modes are distinguished based on the dtype of the operator. Replace ${INSTALL_DIR} with the actual CANN installation directory. If the Ascend-CANN-Toolkit package is installed as the root user, the directory is /usr/local/Ascend/ascend-toolkit/latest.
Default: high_performance
Configuration example: {ge::ir_option::OP_SELECT_IMPL_MODE, "high_performance"}
Applicability:
**OPTYPELIST_FOR_IMPLMODE**

List of operator types to which the implementation mode set by OP_SELECT_IMPL_MODE applies.
Restrictions:
Configuration example: {ge::ir_option::OPTYPELIST_FOR_IMPLMODE, "Pooling,SoftmaxV2"}
Applicability:
**OP_COMPILER_CACHE_MODE**

Sets the disk cache mode for operator build.
Arguments:
Default: enable
Configuration example: {ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
Instructions:
Applicability:
**OP_COMPILER_CACHE_DIR**

Disk cache directory for operator build.
Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.). Defaults to $HOME/atc_data.
Configuration example:
{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
{ge::ir_option::OP_COMPILER_CACHE_DIR, "/home/test/data/atc_data"}
Restrictions:
Applicability:
**DEBUG_DIR**

Directory for the debug-related process files generated during operator build, including the .o (operator binary), .json (operator description), and .cce files. Defaults to the current directory.
Restrictions:
Configuration example:
{ge::ir_option::OP_DEBUG_LEVEL, "1"}
{ge::ir_option::DEBUG_DIR, "/home/test/module/out_debug_info"}
Applicability:
**OP_DEBUG_LEVEL**

Operator debug level at operator build time. To specify where the operator build process files are stored, use DEBUG_DIR. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.
Arguments:
NOTICE:
Configuration example: {ge::ir_option::OP_DEBUG_LEVEL, "1"}
Applicability:
**OP_DEBUG_CONFIG**

Enables global memory checking and related operator debug options.
Arguments: The value is the path of a .cfg configuration file. Multiple options in the configuration file are separated by commas (,).
Configuration example: {ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"}, where the test0.cfg file specifies the options ccec_g,oom.
Restrictions: During operator compilation, if you want to compile only some instead of all AI Core operators, add the OP_DEBUG_LIST field to the test0.cfg configuration file. Only the operators in that list are then compiled, using the options configured in OP_DEBUG_CONFIG. The OP_DEBUG_LIST field has the following requirements:
Configuration example: Add an OP_DEBUG_LIST entry such as GatherV2,opType::ReduceSum to the configuration file (for example, test0.cfg) specified by OP_DEBUG_CONFIG. During model compilation, the GatherV2 and ReduceSum operators are then compiled with the ccec_g and oom options.
NOTE:
Applicability:
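A sketch of how the pieces fit together (same includes as the earlier sketch); the exact field syntax inside the .cfg file is not spelled out above, so the file content shown in the comments is an assumption based on the field names the description mentions:

```cpp
// Assumed content of /root/test0.cfg (field syntax is an assumption):
//   op_debug_config = ccec_g,oom                // options to apply
//   op_debug_list = GatherV2,opType::ReduceSum  // only these operators are compiled
std::map<std::string, std::string> global_options = {
    // Per the description above, the option value is the path of the .cfg file.
    {ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"},
};
ge::aclgrphBuildInitialize(global_options);
```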
**MODIFY_MIXLIST**

When mixed precision is enabled, this parameter specifies the path and file name of a JSON file that moves operators between the blocklist, trustlist, and graylist, that is, between the sets of operators that allow precision reduction and those that do not. To check which list an operator type belongs to, view the flag under the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.
Method for enabling mixed precision:
Configuration example:
{ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"}
You can specify the operator type (or several types separated by commas) in ops_info.json as follows:
{
"black-list": { // Blocklist
"to-remove": [ // Move an operator from the blocklist to the graylist. Ensure that the specified operator is already on the blocklist.
"Xlog1py"
],
"to-add": [ // Move an operator from the trustlist or graylist to the blocklist.
"Matmul",
"Cast"
]
},
"white-list": { // Trustlist
"to-remove": [ // Move an operator from the trustlist to the graylist. Ensure that the specified operator is already on the trustlist.
"Conv2D"
],
"to-add": [ // Move an operator from the blocklist or graylist to the trustlist.
"Bias"
]
}
}
The operators in the preceding example configuration file are for reference only; configure the lists based on the actual hardware environment and the operators' built-in tuning policies. The following is an example of a blocklist/trustlist/graylist query result:
"Conv2D":{
    "precision_reduce":{
        "flag":"true"
    }
}
flag set to true: trustlist; false: blocklist; not configured: graylist.
Applicability:
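A sketch combining the pieces (same includes as the earlier sketch; hedged: enabling mixed precision through PRECISION_MODE set to allow_mix_precision is an assumption, since the enabling method above was truncated in the source):

```cpp
// Enable mixed precision (assumed method), then point MODIFY_MIXLIST at the
// JSON file that moves operators between the blocklist/trustlist/graylist.
std::map<std::string, std::string> global_options = {
    {ge::ir_option::PRECISION_MODE, "allow_mix_precision"},  // assumption
    {ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"},
};
ge::aclgrphBuildInitialize(global_options);
```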
**SPARSITY**

Global sparsity enable. In a model output by AMCT after 2:4 structured sparsity, at least two of every four contiguous weight elements along the Cin dimension may be forced to zero. Enabling global sparsity during model conversion filters out the two zeroed elements, reducing the computational demand of inference and optimizing inference performance. Due to hardware restrictions, this parameter cannot be used together with ENABLE_COMPRESS_WEIGHT or COMPRESS_WEIGHT_CONF.
Arguments:
Configuration example: {ge::ir_option::SPARSITY, "1"}
Restrictions: When using this parameter, ensure that a sparse model is used. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch); the combination requires 2:4 structured sparsity plus quantization-aware training.
Applicability:
**EXTERNAL_WEIGHT**

Externalizes the weights of the Const/Constant nodes in the network, converting them to FileConstant nodes when the OM model file is generated. In offline scenarios where the model weights are large and the environment restricts the .om file size, you are advised to enable external weights so that the weights are saved separately, reducing the .om file size.
Arguments:
Configuration example: {ge::ir_option::EXTERNAL_WEIGHT, "1"}
Restrictions:
Applicability:
**DETERMINISTIC**

Enables or disables deterministic computing. By default, deterministic computing is disabled, and multiple executions of an operator with the same hardware and input may produce different results. This is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the accumulation order of floating-point numbers. When deterministic computing is enabled, repeated executions of an operator with the same hardware and input produce the same output. You are advised not to enable deterministic computing in general, because it slows down operator execution and affects performance. If a model produces different results across runs, or its precision needs to be optimized, you can enable deterministic computing to assist model debugging and optimization.
Arguments:
Configuration example: {ge::ir_option::DETERMINISTIC, "1"}
Applicability:
**OPTION_HOST_ENV_OS**

If the OS and architecture of the model build environment differ from those of the model operating environment, set this parameter to the OS type of the operating environment. If it is not set, the OS type of the build environment is used by default. This parameter is used together with OPTION_HOST_ENV_CPU: OPTION_HOST_ENV_OS sets the OS type and OPTION_HOST_ENV_CPU sets the CPU architecture.
Argument: linux
Configuration example:
{ge::ir_option::OPTION_HOST_ENV_OS, "linux"}
{ge::ir_option::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability:
**OPTION_HOST_ENV_CPU**

If the OS and architecture of the model build environment differ from those of the model operating environment, set this parameter to the CPU architecture of the operating environment. If it is not set, the architecture of the build environment is used by default. This parameter is used together with OPTION_HOST_ENV_OS.
Arguments:
Configuration example:
{ge::ir_option::OPTION_HOST_ENV_OS, "linux"}
{ge::ir_option::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability:
**VIRTUAL_TYPE**

Specifies whether an offline model can run on a virtual device generated by the Ascend virtual instance feature. When a single chip provides more computing power than a cloud user or small enterprise needs, the Ascend virtual instance feature can allocate an appropriate share of computing power to suit their services. A virtual device is a virtual acceleration resource allocated by a chip based on the specified computing power.
Arguments:
Configuration example: {ge::ir_option::VIRTUAL_TYPE, "1"}
Restrictions:
Applicability:
**COMPRESSION_OPTIMIZE_CONF**

Path (including the file name) of the compression optimization configuration file. This parameter enables the compression optimization functions specified in the configuration file to improve network performance. For example: /home/test/compression_optimize.cfg. An example of the file contents is as follows.
enable_first_layer_quantization:true
Applicability:
**CLUSTER_CONFIG**

Applicable to the distributed compilation and partitioning of foundation models. Specifies the path and name of the configuration file describing the logical topology of the target deployment environment. After parsing, the file is used for offline build of the HCCL operators in a graph. If the graph contains communication operators or algorithm-based sharding is enabled, this parameter must be configured.
Configuration example:
{ge::ir_option::CLUSTER_CONFIG, "/home/test/cluster_config.json"}
The configuration file must be in JSON format. For details about the parameters, see Parameters in the CLUSTER_CONFIG File.
Applicability:
**OPTION_SCREEN_PRINT_MODE**

Determines whether to display the graph build process on the screen.
Arguments:
Configuration example: {ge::ir_option::OPTION_SCREEN_PRINT_MODE, "disable"}
Applicability:
**AC_PARALLEL_ENABLE**

Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic-shape graph. When enabled, the system automatically identifies the AI CPU operators in the graph that can run in parallel with AI Core operators, and distributes operators of different engines to different streams, improving resource utilization and dynamic-shape execution performance.
Arguments:
Configuration example: {ge::ir_option::AC_PARALLEL_ENABLE, "1"}
Applicability:
**TILING_SCHEDULE_OPTIMIZE**

Tiling offload scheduling optimization. Because the internal storage of the AI Core in the NPU cannot hold all the input and output data of an operator, the input data is tiled into parts: one part is transferred in, computed, and transferred out, then the next part follows in the same way. This process is called tiling. A computation program called the tiling implementation determines the tiling parameters (such as the block size transferred each time and the total number of loops) based on operator information such as shape. Because the AI Core is not well suited to the scalar computation in the tiling implementation, the tiling implementation generally runs on the host CPU. However, it runs on the device when the following conditions are met:
Arguments:
Configuration example: {ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}
Applicability:
**OPTION_EXPORT_COMPILE_STAT**

Whether to generate fusion_result.json, the result file of operator fusion information (including graph fusion and UB fusion), during graph build. This file records the fusion patterns used during graph build. In the file:
Arguments:
NOTE:
Configuration example: {ge::ir_option::OPTION_EXPORT_COMPILE_STAT, "1"}
Applicability: