aclgrphBuildInitialize Configuration Parameters

Basic Functions

Parameter

Description

SOC_VERSION

Ascend AI Processor used during graph build.

  • This parameter is optional if the current environment has the Ascend AI Processor.
  • This parameter is required if the current environment does not have the Ascend AI Processor, that is, the development environment.

To query <soc_version>:

  • For the following products: Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Name information. The actual value is Ascend followed by the Name value. For example, if Name is xxxyy, the actual value is Ascendxxxyy.

    Atlas A2 training products / Atlas A2 inference products

    Atlas 200I/500 A2 inference products

    Atlas inference products

    Atlas training products

  • For the following products: Run the npu-smi info -t board -i id -c chip_id command on the server where Ascend AI Processor is installed to obtain the Chip Name and NPU Name information. The actual value is Chip Name_NPU Name. For example, if the value of Chip Name is Ascendxxx and the value of NPU Name is 1234, the actual value is Ascendxxx_1234. Note that:
    • id: device ID, which is the NPU ID obtained by running the npu-smi info -l command.
    • chip_id: chip ID, which is obtained by running the npu-smi info -m command.

    Atlas A3 training products / Atlas A3 inference products

Configuration example:

{ge::ir_option::SOC_VERSION, "<soc_version>"}
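The two naming rules above can be written as small string helpers. This is a minimal sketch: the function names are hypothetical and not part of any CANN API, and the inputs are assumed to be copied by hand from the npu-smi output.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CANN API).
// Products queried with `npu-smi info`: the value is "Ascend" followed by Name.
std::string SocVersionFromName(const std::string &name) {
    assert(!name.empty());
    return "Ascend" + name;
}

// Hypothetical helper (not a CANN API).
// Products queried with `npu-smi info -t board -i id -c chip_id`:
// the value is "<Chip Name>_<NPU Name>".
std::string SocVersionFromChipAndNpu(const std::string &chip_name,
                                     const std::string &npu_name) {
    assert(!chip_name.empty() && !npu_name.empty());
    return chip_name + "_" + npu_name;
}
```

The returned string is what replaces <soc_version> in the configuration example.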

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

ENABLE_SINGLE_STREAM

Whether to enable single-stream serial execution of model inference in the static shape scenario.

A stream preserves the execution order of a sequence of asynchronous operations on the device.

Arguments:

  • true: Enables single-stream serial execution of model inference.
  • false (default): Disables single-stream serial execution of model inference and enables multi-stream parallel execution.

Restrictions:

If the model contains the Cmo operator and the following control operators, the single-stream feature cannot be used. In this case, use the default value false.

  • Merge
  • Switch
  • Enter
  • RefEnter

Configuration example:

{ge::ir_option::ENABLE_SINGLE_STREAM, "true"}

Applicability:

Atlas training products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas inference products : not supported

Atlas 200I/500 A2 inference products : not supported

DETERMINISTIC

Whether to enable deterministic computing.

By default, deterministic computing is disabled, and repeated executions of an operator with the same hardware and input may produce different results. This is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the order in which floating-point numbers are accumulated. When deterministic computing is enabled, an operator produces the same output each time it is executed with the same hardware and input.

You are advised not to enable deterministic computing by default because it slows down operator execution. If repeated executions of a model produce different results, or if the accuracy needs to be tuned, you can enable deterministic computing to assist model debugging and tuning.
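The effect of accumulation order can be reproduced on a host CPU with IEEE 754 doubles. The sketch below is illustrative only and does not reflect how operators are implemented on the device. SumInOrder is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: sums the values strictly from left to right.
// Floating-point addition is not associative, so the result can depend
// on the order in which the same values are accumulated.
double SumInOrder(const std::vector<double> &values) {
    double total = 0.0;
    for (double v : values) {
        assert(std::isfinite(v));  // the sketch expects finite inputs
        total += v;
    }
    return total;
}
```

Summing {1e16, -1e16, 1.0} in that order yields 1.0, whereas summing {1e16, 1.0, -1e16} yields 0.0, because 1e16 + 1.0 is rounded back to 1e16 before the subtraction.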

Arguments:

  • 0 (default): Disables deterministic computing.
  • 1: Enables deterministic computing.

Configuration example:

{ge::ir_option::DETERMINISTIC, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas 200I/500 A2 inference products : not supported

OPTION_HOST_ENV_OS

If the OS and its architecture of the model build environment are inconsistent with those of the model operating environment, set this parameter to the OS type of the model operating environment. If this parameter is not set, the OS type of the model build environment is used by default.

If the OS and its architecture of the model build environment are inconsistent with those of the model operating environment, use this option together with OPTION_HOST_ENV_CPU. OPTION_HOST_ENV_OS is used to set the OS type, and OPTION_HOST_ENV_CPU is used to set the OS architecture.

Argument: OS type of the operator .so file packaged in the ${INSTALL_DIR}/opp/built-in/op_graph/lib/ directory.

Default value: value in the ${INSTALL_DIR}/opp/scene.info file.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.

Configuration example:

{ge::OPTION_HOST_ENV_OS, "linux"}
{ge::OPTION_HOST_ENV_CPU, "x86_64"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTION_HOST_ENV_CPU

If the OS and its architecture of the model build environment are inconsistent with those of the model operating environment, set this parameter to the OS architecture of the model operating environment. If this parameter is not set, the OS architecture of the model build environment is used by default.

If the OS and its architecture of the model build environment are inconsistent with those of the model operating environment, use this option together with OPTION_HOST_ENV_OS. OPTION_HOST_ENV_OS is used to set the OS type, and OPTION_HOST_ENV_CPU is used to set the OS architecture.

Argument: OS or CPU type of the operator .so file packaged in the ${INSTALL_DIR}/opp/built-in/op_graph/lib/ directory.

Default value: value in the ${INSTALL_DIR}/opp/scene.info file.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.

Configuration example:

{ge::OPTION_HOST_ENV_OS, "linux"}
{ge::OPTION_HOST_ENV_CPU, "x86_64"}
  • If the generated offline model contains the OS type and architecture, for example, xxx_linux_x86_64.om, the model can run only on the Linux x86_64 OS.
  • If the name of the generated offline model does not contain the OS type and architecture, for example, xxx.om, the model can run on every OS supported by the CANN package.
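The file naming convention described above can be checked with a simple suffix test. This is a sketch under the assumption that the suffix has exactly the form _<os>_<cpu>.om, as in the xxx_linux_x86_64.om example. IsBoundToPlatform is a hypothetical name, not a CANN API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CANN API): returns true if the offline model
// file name ends with "_<os>_<cpu>.om", for example "xxx_linux_x86_64.om".
bool IsBoundToPlatform(const std::string &om_file, const std::string &os,
                       const std::string &cpu) {
    assert(!os.empty() && !cpu.empty());
    const std::string suffix = "_" + os + "_" + cpu + ".om";
    return om_file.size() > suffix.size() &&
           om_file.compare(om_file.size() - suffix.size(), suffix.size(),
                           suffix) == 0;
}
```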

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

VIRTUAL_TYPE

Whether an offline model can run on a virtual device generated by the Ascend virtual instance feature.

When a whole chip provides more computing power than a cloud user or small enterprise needs, the Ascend virtual instance feature can be used to allocate only the amount of computing power that their services require.

A virtual device is a virtual acceleration resource allocated by a chip based on specified computing power.

Arguments:

  • 0 (default): The offline model does not run on the virtual device generated by the Ascend virtual instance feature.
  • 1: The offline model runs on virtual devices with different computing power.

Configuration example:

{ge::ir_option::VIRTUAL_TYPE, "1"}

Restrictions:

  1. If model conversion is performed with this parameter set to 1, the number of logical AI Cores used by the generated offline model may be greater than the actual number of cores specified by aicore_num. The value is the least common multiple of the values that aicore_num supports.

    For example, if the value range of aicore_num is {1,2,4,8}, the number of NPU blocks may be 8.

  2. If this parameter is set to 1 and the generated model contains the following operators, a single core is used by default. In this case, the inference performance of the generated model deteriorates.
    • DynamicRNN
    • PadV2D
    • SquareSumV2
    • DynamicRNNV2
    • DynamicRNNV3
    • DynamicGRUV
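The least common multiple mentioned in restriction 1 can be computed as follows. This is a minimal sketch with hypothetical helper names. It only illustrates the arithmetic of the {1,2,4,8} example and makes no claim about how the build tool derives the value internally.

```cpp
#include <cassert>
#include <vector>

// Greatest common divisor by the Euclidean algorithm.
long long Gcd(long long a, long long b) {
    while (b != 0) {
        const long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical helper: least common multiple of all supported
// aicore_num values (every value must be positive).
long long LeastCommonMultiple(const std::vector<long long> &values) {
    assert(!values.empty());
    long long result = 1;
    for (long long v : values) {
        assert(v > 0);
        result = result / Gcd(result, v) * v;
    }
    return result;
}
```

For the value range {1,2,4,8}, the function returns 8, which matches the example in restriction 1.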

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Memory Management

Parameter

Description

EXEC_DISABLE_REUSED_MEMORY

Memory reuse switch.

Memory reuse refers to the practice of repeatedly utilizing non-conflicting memory based on its lifecycle and size, thereby reducing network memory consumption.

Arguments:

  • 0 (default): Enables memory reuse.
  • 1: Disables memory reuse. If the network model is large, disabling memory reuse prevents the device memory from being reused during subsequent inference, which may result in insufficient memory.

Configuration example:

{ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

EXTERNAL_WEIGHT

Whether to externalize the weights of the Const/Constant nodes on the original network and convert the node type to FileConstant when the OM model file is generated.

In the offline scenario, if the model weights are large and the environment restricts the size of the OM offline model file, you are advised to externalize the weights and save them separately to reduce the OM file size.

Arguments:

  • 0 (default): The weights are not externalized and are directly saved in the OM offline model file.
  • 1: The weights are externalized. The weight files of all Const/Constant nodes on the network are flushed to the disk, and the node type is converted to FileConstant. The weight files are saved in the weight directory at the same level as the OM file. Weights of different nodes are stored in different files, which are named in the format of weight_<hash value>.

Configuration example:

{ge::ir_option::EXTERNAL_WEIGHT, "1"}

Restrictions:

  • In the external weight scenario, when acl APIs are used to develop inference applications and load models:
    • Use the aclgrphSaveModel API to save the OM model.
      • If aclmdlLoadFromFile is used to load a model, the weight file must be stored in the weight directory at the same level as the OM file.
      • If aclmdlSetConfigOpt and aclmdlLoadWithConfig are used to load a model, there is no requirement on the external weight directory. When the model is loaded later, use aclmdlLoadWithConfig to specify the external weight directory.
    • In the weight update scenario, use aclgrphBundleSaveModel to save the OM model.

      Only aclmdlBundleLoadFromFile can be used to load a model, and the weight file must be stored in the weight directory at the same level as the OM file.

    For details about the APIs, see "Model Loading and Unloading".
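When aclmdlLoadFromFile or aclmdlBundleLoadFromFile is used, the weight files are expected in a weight directory next to the OM file. The sketch below derives that directory from the OM path. DefaultWeightDir is a hypothetical helper, not an acl API, and it assumes POSIX-style paths separated by /.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an acl API): returns the "weight" directory at
// the same level as the OM file. Assumes '/' as the path separator.
std::string DefaultWeightDir(const std::string &om_path) {
    assert(!om_path.empty());
    const std::size_t slash = om_path.find_last_of('/');
    if (slash == std::string::npos) {
        return "weight";  // OM file is in the current directory
    }
    return om_path.substr(0, slash + 1) + "weight";
}
```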

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Dynamic Shape

Parameter

Description

AC_PARALLEL_ENABLE

Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic shape graph.

In a dynamic shape graph, when this function is enabled, the system automatically identifies AI CPU operators that can be run in parallel with the AI Core operators in the graph. Operators of different engines are distributed to different streams to run in parallel, improving resource utilization and dynamic shape execution performance.

Arguments:

  • 1: AI CPU operators and AI Core operators are allowed to run in parallel.
  • 0 (default): AI CPU operators are not separately distributed.

Configuration example:

{ge::ir_option::AC_PARALLEL_ENABLE, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas 200I/500 A2 inference products : not supported

Operator and Graph Build

Parameter

Description

CORE_TYPE

Core type used during graph build. If the graph contains Cube operators, the value can only be AiCore.

Arguments:

  • VectorCore
  • AiCore (default)

Configuration example:

{ge::ir_option::CORE_TYPE, "AiCore"}

Applicability:

Atlas inference products : supported

Atlas training products : not supported

Atlas 200I/500 A2 inference products : not supported

Atlas A2 training products / Atlas A2 inference products : not supported

Atlas A3 training products / Atlas A3 inference products : not supported

AICORE_NUM

Number of AI Cores used for operator build.

Argument: "integer 1|integer 2", separated by a vertical bar (|).

  • Scenario 1: For the following products, integer 1 indicates the number of Cube Cores in the AI Core used for operator build, and integer 2 indicates the number of Vector Cores in the AI Core used for operator build. Both integer 1 and integer 2 must be greater than 0 and less than or equal to the maximum numbers of Cube Cores and Vector Cores included in the Ascend AI Processor.

    Atlas A3 training products / Atlas A3 inference products

    Atlas A2 training products / Atlas A2 inference products

  • Scenario 2: For the following products, only integer 1 needs to be configured in the format of "integer 1|", indicating the number of AI Cores used for operator build. If integer 2 is configured, it does not take effect.

    Atlas 200I/500 A2 inference products

    Atlas inference products

    Atlas training products

Restrictions:

  • For scenario 1 of the argument:
    You can view the maximum numbers of Cube Cores and Vector Cores contained in different Ascend AI Processors in the ${INSTALL_DIR}/<arch>-linux/data/platform_config/xxx.ini file. The following information indicates that there are 24 Cube Cores and 48 Vector Cores on the Ascend AI Processor:
    [SoCInfo]
    # Use the default parameter values, which are the maximum values.
    ai_core_cnt=24
    cube_core_cnt=24
    vector_core_cnt=48
  • For scenario 2 of the argument:
    You can view the maximum number of AI Cores contained in different Ascend AI Processors in the ${INSTALL_DIR}/<arch>-linux/data/platform_config/xxx.ini file. The following information indicates that there are 10 AI Cores on the Ascend AI Processor:
    [SoCInfo]
    # Use the default parameter value, which indicates the maximum number of AI Cores.
    ai_core_cnt=10
    vector_core_cnt=8
  • If the operator build cache function is enabled (OP_COMPILER_CACHE_MODE set to enable or force; default value: enable) and this parameter is configured, it takes effect only during the first build. To make it take effect in subsequent builds, clear the operator build disk cache first.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. <arch> indicates the OS architecture and xxx varies depending on the product.
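Reading the core limits from such a file can be sketched as follows. ReadSocInfoValue is a hypothetical helper, not a CANN API. It assumes that the file content has already been loaded into a string, that entries are plain key=value lines without surrounding spaces (as in the excerpts above), and that lines end with \n.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not a CANN API): returns the integer value of `key`
// in the [SoCInfo] section of the given INI text, or -1 if it is absent.
// std::stoi throws if the value is not numeric.
int ReadSocInfoValue(const std::string &ini_text, const std::string &key) {
    assert(!key.empty());
    std::istringstream lines(ini_text);
    std::string line;
    bool in_soc_info = false;
    while (std::getline(lines, line)) {
        if (line.empty() || line[0] == '#') {
            continue;  // skip blank lines and comments
        }
        if (line[0] == '[') {
            in_soc_info = (line == "[SoCInfo]");
            continue;
        }
        const std::size_t eq = line.find('=');
        if (in_soc_info && eq != std::string::npos &&
            line.compare(0, eq, key) == 0) {
            return std::stoi(line.substr(eq + 1));
        }
    }
    return -1;
}
```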

Configuration example:

  • Configuration example for scenario 1:
    {ge::ir_option::AICORE_NUM, "24|48"}
  • Configuration example for scenario 2:
    {ge::ir_option::AICORE_NUM, "10|"}
    Or
    {ge::ir_option::AICORE_NUM, "10"}
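The value string for both scenarios can be assembled with small helpers that also check the documented bounds. This is a sketch with hypothetical function names. The maximum values are assumed to come from the platform configuration file, and the range check for scenario 2 is an assumption modeled on scenario 1.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CANN API). Scenario 1:
// "<Cube Cores>|<Vector Cores>", each greater than 0 and no larger than
// the maximum of the Ascend AI Processor.
std::string AicoreNumScenario1(int cube_cores, int vector_cores,
                               int max_cube_cores, int max_vector_cores) {
    assert(cube_cores > 0 && cube_cores <= max_cube_cores);
    assert(vector_cores > 0 && vector_cores <= max_vector_cores);
    return std::to_string(cube_cores) + "|" + std::to_string(vector_cores);
}

// Hypothetical helper (not a CANN API). Scenario 2: "<AI Cores>|".
// The range check is an assumption, not stated in the parameter text.
std::string AicoreNumScenario2(int ai_cores, int max_ai_cores) {
    assert(ai_cores > 0 && ai_cores <= max_ai_cores);
    return std::to_string(ai_cores) + "|";
}
```

The returned strings correspond to the values "24|48" and "10|" used in the configuration examples.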

Relationships between AI Cores, Cube Cores, and Vector Cores:

The definition of a Core helps you better understand the relationships between AI Cores, Cube Cores, and Vector Cores. A Core is a compute core with an independent scalar compute unit. Generally, the scalar compute unit provides multiple functions for the compute core, such as the single instruction multiple data (SIMD) instruction dispatch. Therefore, the scalar compute unit is also called the intra-core scheduling unit. The AI data processing core unit varies with products. Currently, there are the following types:

  • The AI data processing core unit is an AI Core:
    • In an AI Core, a Cube and a Vector share a Scalar scheduling unit, for example, Atlas training products.

    • In an AI Core, a Cube and a Vector each have their own Scalar scheduling unit and are called a Cube Core and a Vector Core, respectively. In this case, a Cube Core and a group of Vector Cores are defined as an AI Core. The number of AI Cores is usually calculated based on the number of Cube Cores, for example, Atlas A2 training products / Atlas A2 inference products.

  • The AI data processing core units are AI Cores and independent Vector Cores. The AI Cores and Vector Cores have independent Scalar scheduling units, for example, Atlas inference products.

OP_COMPILER_CACHE_MODE

Disk cache mode for operator compilation.

Arguments:

  • enable (default): enabled. If it is enabled, operators with the same compilation configurations and operator configurations will not be built repeatedly, thus accelerating the compilation speed.
  • force: enabled with the cache forcibly refreshed. The existing cache is cleared before the operator is recompiled and added to the cache. For example, after Python changes, dependency library changes, or repository changes following operator tuning, set this option to force to clear the existing cache, and then change it back to enable so that the cache is not forcibly refreshed during every build.
  • disable: disabled.

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}

Instructions:

  • To specify the disk cache path for operator compilation, use this parameter together with OP_COMPILER_CACHE_DIR.
  • When you enable the operator compilation cache function, set the disk space of the cache folder with the configuration file (the op_cache.ini file automatically generated in the path specified by OP_COMPILER_CACHE_DIR after operator build) or environment variables.
    1. Using the op_cache.ini configuration file:

      If the op_cache.ini file does not exist, manually create it. Open the file and add the following information:

      # Configure the file format (required). The automatically generated file contains the following information by default. When manually creating a file, enter the following information:
      [op_compiler_cache]
      # Limit the disk space of the cache folder on a chip, in MB. The default value is 500. The value must be an integer.
      max_op_cache_size=500
      # Set the ratio of the cache size to be reserved, in percentage. The value range is [1, 100]. The default value is 50. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
      remain_cache_size_ratio=50
      • The op_cache.ini file takes effect only when the values of max_op_cache_size and remain_cache_size_ratio in the preceding file are valid.
      • If the size of the build cache file exceeds the value of max_op_cache_size and the cache file is not accessed for more than half an hour, the cache file will be aged. (Operator build will not be interrupted due to the size of the build cache file exceeding the set limit. Therefore, if max_op_cache_size is set to a small value, the size of the actual build cache file may exceed the configured value.)
      • To disable the build cache aging function, set max_op_cache_size to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.
      • If multiple users use the same cache path, you are advised to use the configuration file to set the cache path. In this scenario, the op_cache.ini file affects all users.
    2. Using environment variables

      In this scenario, the environment variable ASCEND_MAX_OP_CACHE_SIZE is used to limit the storage space of the cache folder of a chip. When the build cache space reaches the specified value and the cache file is not accessed for more than half an hour, the cache file is aged. The environment variable ASCEND_REMAIN_CACHE_SIZE_RATIO is used to set the ratio of the cache space to be reserved.

      A configuration example is as follows:

      # The ASCEND_MAX_OP_CACHE_SIZE environment variable defaults to 500, in MB. The value must be an integer.
      export ASCEND_MAX_OP_CACHE_SIZE=500
      # The value range of the ASCEND_REMAIN_CACHE_SIZE_RATIO environment variable is [1, 100]. The default value is 50, in percentage. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
      export ASCEND_REMAIN_CACHE_SIZE_RATIO=50
      • The argument configured through environment variables takes effect only for the current user.
      • To disable the build cache aging function, set the environment variable ASCEND_MAX_OP_CACHE_SIZE to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.

    If both the op_cache.ini file and the environment variable are configured, the configuration items in the op_cache.ini file are read first. If neither the op_cache.ini file nor the environment variables are configured, the system's default values (500 MB disk space and 50% of reserved cache space) are read.

  • If this parameter is set to force, the existing cache will be cleared. Therefore, it is not recommended for parallel program compilation. Otherwise, the cache used by other models may be cleared, causing compilation failures.
  • When publishing the final model, you are advised to set this parameter to disable or force.
  • If the repository changes after operator tuning, set this parameter to force to refresh the cache. Otherwise, the new tuning repository cannot be applied, and the tuning application fails to be executed.
  • When the debugging function is enabled:
    • If OP_DEBUG_LEVEL is set to a non-zero value, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator compilation cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is not configured, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator compilation cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is configured in the configuration file:
      • Operators in the list are always recompiled, regardless of the OP_COMPILER_CACHE_MODE setting.
      • For operators out of the list, if OP_COMPILER_CACHE_MODE is set to enable or force, the cache function is enabled. If OP_COMPILER_CACHE_MODE is set to disable, the cache function is disabled and the operators are recompiled.
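The interaction between the cache mode and the debugging options described above amounts to a small decision function. The sketch below is hypothetical (names and signature are not part of CANN) and only restates the rules listed in this section.

```cpp
#include <cassert>

// Hypothetical types and names for illustration only (not a CANN API).
enum class CacheMode { kEnable, kForce, kDisable };

// Returns true if the operator compilation cache is in effect for one
// operator, given the debugging options described above.
bool CacheUsedForOperator(CacheMode mode, int op_debug_level,
                          bool debug_config_set, bool debug_list_set,
                          bool op_in_debug_list) {
    assert(op_debug_level >= 0 && op_debug_level <= 4);
    if (op_debug_level != 0) {
        return false;  // non-zero OP_DEBUG_LEVEL: all operators recompiled
    }
    if (debug_config_set) {
        if (!debug_list_set) {
            return false;  // OP_DEBUG_CONFIG without OP_DEBUG_LIST
        }
        if (op_in_debug_list) {
            return false;  // listed operators are always recompiled
        }
    }
    // Otherwise OP_COMPILER_CACHE_MODE decides: enable and force use the cache.
    return mode != CacheMode::kDisable;
}
```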

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_COMPILER_CACHE_DIR

Disk cache directory for operator compilation.

Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Default value: $HOME/atc_data

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
{ge::ir_option::OP_COMPILER_CACHE_DIR, "/home/test/data/atc_data"}

Restrictions:

  • To specify the disk cache path for operator compilation, use this option together with OP_COMPILER_CACHE_MODE.
  • If the specified directory exists and is valid, a kernel_cache subdirectory is automatically created. If the specified directory does not exist but is valid, the system automatically creates this directory and the kernel_cache subdirectory.
  • Do not store your own files in the default cache directory. They will be deleted together with the default cache directory during software package installation or upgrade.
  • A non-default cache directory specified by this option is not deleted automatically. It is retained during software package installation or upgrade.
  • In addition to OP_COMPILER_CACHE_DIR, the environment variable ASCEND_CACHE_PATH can be used to set the disk cache directory for operator build. The priorities of the configuration methods are as follows: OP_COMPILER_CACHE_DIR > ASCEND_CACHE_PATH > default cache directory.
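The priority order of the three configuration methods can be sketched as a simple fallback chain. ResolveCacheDir is a hypothetical helper, not a CANN API. An empty string stands for "not configured", and the function only shows which source wins, not the exact on-disk layout below that directory.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CANN API). Priority:
// OP_COMPILER_CACHE_DIR > ASCEND_CACHE_PATH > default ($HOME/atc_data).
// An empty argument means that the corresponding source is not configured.
std::string ResolveCacheDir(const std::string &option_cache_dir,
                            const std::string &env_ascend_cache_path,
                            const std::string &home) {
    if (!option_cache_dir.empty()) {
        return option_cache_dir;
    }
    if (!env_ascend_cache_path.empty()) {
        return env_ascend_cache_path;
    }
    assert(!home.empty());
    return home + "/atc_data";
}
```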

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTIMIZATION_SWITCH

Fusion pattern (pass) control switch used during operator build.

The difference between this parameter and FUSION_SWITCH_FILE is as follows: This parameter applies to all patterns. It can be used to specify a fusion pattern without a JSON file. FUSION_SWITCH_FILE can only be used to disable the graph fusion and UB fusion patterns, and a JSON file needs to be configured separately. If both parameters are set and the same fusion pattern is configured, the setting of OPTIMIZATION_SWITCH takes precedence.

Argument: Passname1:on;Passname2:off. Each key-value pair consists of a pass name (key) and a value, which can be on (enabled) or off (disabled). Multiple pairs are separated by semicolons (;). Case-sensitive matching is not supported. For details about the fusion patterns that can be configured, see Fusion Pattern List.

Configuration example:

{ge::ir_option::OPTIMIZATION_SWITCH, "Passname1:on;Passname2:off"}
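When many passes are toggled, the value string can be assembled programmatically. This is a minimal sketch: BuildOptimizationSwitch is a hypothetical helper, and Passname1 and Passname2 are the placeholder names used in this section, not real pass names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a CANN API): joins (pass name, enabled) pairs
// into the "Passname1:on;Passname2:off" format described above.
std::string BuildOptimizationSwitch(
    const std::vector<std::pair<std::string, bool>> &passes) {
    std::string value;
    for (const auto &pass : passes) {
        assert(!pass.first.empty());
        if (!value.empty()) {
            value += ';';  // pairs are separated by semicolons
        }
        value += pass.first + (pass.second ? ":on" : ":off");
    }
    return value;
}
```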

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Debugging

Parameter

Description

DEBUG_DIR

Directory of the debug-related process files generated during operator build, including the .o (operator binary file), .json (operator description file), and .cce files.

By default, the files are generated in the current directory.

Restrictions:

  • To specify the path for storing the debug-related process files of operator compilation, use DEBUG_DIR and OP_DEBUG_LEVEL together. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.
  • In addition to DEBUG_DIR, the ASCEND_WORK_PATH environment variable can be used to set the path for storing the debugging file generated during operator compilation. The configuration priorities are as follows: DEBUG_DIR > ASCEND_WORK_PATH > default storage path.

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"}
{ge::ir_option::DEBUG_DIR, "/home/test/module/out_debug_info"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_DEBUG_LEVEL

Debugging switch for operator compilation.

To specify the path for storing the debug-related process files of operator compilation, use this option together with DEBUG_DIR. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.

Arguments:

  • 0 (default): Disables operator debug. The operator build folder kernel_meta is not generated in the current execution path.
  • 1: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors.
  • 2: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors. Setting this option to 2 also disables build optimization and enables the CCE compiler debug function (the CCE compiler options are set to -O0 -g).
  • 3: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file) and .json file (operator description file) are generated in the folder. You can refer to these files when analyzing operator errors.
  • 4: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), TBE instruction mapping file (operator file *.cce), and UB fusion description file ({$kernel_name}_compute.json) are generated in the folder. These files can be used for problem reproduction and accuracy comparison during operator error analysis.
NOTICE:
  • If OP_DEBUG_LEVEL is set to 0 and OP_DEBUG_CONFIG is also set, the operator compilation directory kernel_meta is retained in the current execution path.
  • If OP_DEBUG_LEVEL is set to 0 and the NPU_COLLECT_PATH environment variable is set, the compilation directory kernel_meta is always retained. If the ASCEND_WORK_PATH environment variable is set, the compilation directory is retained in the path specified by the environment variable. If the ASCEND_WORK_PATH environment variable does not exist, the compilation directory is retained in the current execution path.
  • You are advised to set this parameter to 0 or 3 for training. To locate errors, set this parameter to 1 or 2, which might compromise the network performance.
  • If this option is set to 2, the CCE compiler debug function is enabled, and the size of the operator kernel file (*.o file) increases. In the dynamic shape scenario, all possible shapes are traversed during operator build, so the build may fail because the operator kernel file is too large. In this case, you are advised not to enable the CCE compiler debug options.

    If a build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o
  • When the debug function is enabled, if the model contains the following merged compute and communication (MC2) operators, the *.o, *.json, and *.cce files of the operators are not generated in the operator build folder kernel_meta.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"}
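The argument values of OP_DEBUG_LEVEL can be summarized as a lookup from level to generated artifacts. The struct and function below are hypothetical and only restate the argument list. They ignore the cases in the notice where kernel_meta is retained at level 0 because OP_DEBUG_CONFIG or NPU_COLLECT_PATH is set.

```cpp
#include <cassert>

// Hypothetical summary type for illustration only (not a CANN API).
struct DebugArtifacts {
    bool kernel_meta;      // kernel_meta folder is generated
    bool binary_and_json;  // .o and .json files
    bool cce;              // operator file *.cce
    bool loc_json;         // python-CCE mapping file *_loc.json
    bool compute_json;     // UB fusion description file *_compute.json
    bool compiler_debug;   // build optimization off, CCE compiler debug on
};

// Maps an OP_DEBUG_LEVEL value (0 to 4) to the artifacts listed above.
DebugArtifacts ArtifactsForLevel(int level) {
    assert(level >= 0 && level <= 4);
    switch (level) {
        case 1: return {true, true, true, true, false, false};
        case 2: return {true, true, true, true, false, true};
        case 3: return {true, true, false, false, false, false};
        case 4: return {true, true, true, false, true, false};
        default: return {false, false, false, false, false, false};  // level 0
    }
}
```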

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_DEBUG_CONFIG

Global memory check switch.

Arguments:

The value is the path of the .cfg configuration file. Multiple options in the configuration file are separated by commas (,).

  • oom: Checks whether memory overwriting occurs in the global memory during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
      inline __aicore__ void  CheckInvalidAccessOfDDR(xxx) {
          if (access_offset < 0 || access_offset + access_extent > ddr_size) {
              if (read_or_write == 1) {
                  trap(0X5A5A0001);
              } else {
                  trap(0X5A5A0002);
              }
          }
      }
  • dump_cce: Retains the operator CCE file (.cce), binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • dump_loc: Retains the python-CCE mapping file *_loc.json, binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • ccec_O0: Enables the CCEC option -O0 during operator build. This option disables compiler optimization so that debugging information is preserved for later analysis of AI Core errors.
  • ccec_g: Enables the CCEC option -g during operator build. This option generates debugging information for later analysis of AI Core errors.
  • check_flag: Checks whether pipeline synchronization signals in operators match each other during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ....
        pipe_barrier(PIPE_MTE3);
        pipe_barrier(PIPE_MTE2);
        pipe_barrier(PIPE_M);
        pipe_barrier(PIPE_V);
        pipe_barrier(PIPE_MTE1);
        pipe_barrier(PIPE_ALL);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ...

      During actual inference, if the pipeline synchronization signals in operators do not match each other, a timeout error is reported at the faulty operator, and the program is terminated. The following is an example of the error message:

      Aicore kernel execute failed, ..., fault kernel_name=operator name,...
      rtStreamSynchronizeWithTimeout execute failed....

Configuration example:

{ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"}

The information about the test0.cfg file is as follows:

op_debug_config=ccec_g,oom

Restrictions:

During operator build, if you want to build only some instead of all AI Core operators, you need to add the OP_DEBUG_LIST field to the test0.cfg configuration file. By doing so, only the operators specified in the list are built, based on the options configured in OP_DEBUG_CONFIG. The OP_DEBUG_LIST field has the following requirements:

  • The operator name or operator type can be specified.
  • Operators are separated by commas (,). The operator type is configured in the OpType::typeName format. The operator type and operator name can be configured in a mixed manner.
  • The operator to be compiled must be stored in the configuration file specified by OP_DEBUG_CONFIG.

Configuration example: add the following information to the test0.cfg file:

op_debug_config=ccec_g,oom
op_debug_list=GatherV2,opType::ReduceSum

During model compilation, the GatherV2 and ReduceSum operators are compiled based on the ccec_g and oom options.
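The comma-separated op_debug_list value can be split into operator names and operator types roughly as follows. This is a minimal sketch, not the parser used by the graph engine: DebugList and parseOpDebugList are hypothetical names, and treating any entry that contains "::" as an operator type (whatever the spelling of its prefix) is an assumption made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for a parsed op_debug_list value.
struct DebugList {
    std::vector<std::string> names;  // entries given as plain operator names
    std::vector<std::string> types;  // entries given with a "<prefix>::" qualifier
};

// Splits a comma-separated op_debug_list value. An entry containing "::" is
// treated as an operator type (assumption), and the text after "::" is kept.
DebugList parseOpDebugList(const std::string& value) {
    DebugList out;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (item.empty()) continue;  // tolerate stray commas
        const std::string::size_type sep = item.find("::");
        if (sep == std::string::npos) {
            out.names.push_back(item);
        } else {
            out.types.push_back(item.substr(sep + 2));
        }
    }
    return out;
}
```

For the example value above, the sketch yields one name (GatherV2) and one type (ReduceSum), which shows that names and types can be mixed in a single list.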

NOTE:
  • When ccec compilation options (ccec_O0 and ccec_g) are enabled, the size of the operator kernel file (*.o file) increases. In dynamic shape scenarios, all possible scenarios are traversed during operator compilation, which may cause operator compilation failures due to large operator kernel files. In this case, do not enable the CCEC options.

    If the compilation failure is caused by large operator kernel files, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o:(xxxx)

  • The ccec_O0 and oom options cannot both be enabled. Otherwise, an AI Core error may be reported. The following is an example of the error message:
    ...there is an aivec error exception, core id is 49, error code = 0x4 ...
  • If the NPU_COLLECT_PATH environment variable is configured, the global memory overwrite check cannot be enabled (that is, oom must not be set in the configuration file specified by OP_DEBUG_CONFIG). Otherwise, an error is reported when the compiled model file or operator kernel package is used.
  • When the build options oom, dump_cce, and dump_loc are configured, if the model contains the following MC2 operators, the *.o, *.json, and *.cce files of the operators are not generated in the operator build folder kernel_meta.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll
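The option names and the combination rules in the NOTE above can be checked before a build is started. The following is a sketch under stated assumptions: checkOpDebugConfig is a hypothetical helper, not part of the GE API, and it covers only the option names and the two restrictions described in this section.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns an empty string if the comma-separated option list looks acceptable,
// otherwise a short description of the first problem found.
// npuCollectPathSet: whether the NPU_COLLECT_PATH environment variable is configured.
std::string checkOpDebugConfig(const std::string& value, bool npuCollectPathSet = false) {
    static const std::set<std::string> known = {
        "oom", "dump_cce", "dump_loc", "ccec_O0", "ccec_g", "check_flag"};
    std::set<std::string> seen;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (known.count(item) == 0) return "unknown option: " + item;
        seen.insert(item);
    }
    if (seen.count("ccec_O0") != 0 && seen.count("oom") != 0) {
        return "ccec_O0 and oom cannot both be enabled";
    }
    if (seen.count("oom") != 0 && npuCollectPathSet) {
        return "oom cannot be enabled when NPU_COLLECT_PATH is configured";
    }
    return "";
}
```

Passing the value of the op_debug_config line from test0.cfg (ccec_g,oom) returns an empty string, while ccec_O0,oom is rejected.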

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTION_SCREEN_PRINT_MODE

Whether to display the graph build process.

Arguments:

  • enable (default): The graph build process is displayed.
  • disable: The graph build process is not displayed.

Configuration example:

{ge::ir_option::OPTION_SCREEN_PRINT_MODE, "disable"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTION_EXPORT_COMPILE_STAT

Whether to generate the fusion_result.json result file of operator fusion information (including graph fusion and UB fusion) during graph build.

This file records the fusion patterns used during graph build. The FUSION_SWITCH_FILE parameter for precision comparison can be used to disable specified fusion patterns. Disabled fusion patterns are not displayed in the fusion_result.json file. In the file:

  • session_and_graph_id_xx_xx: thread and graph ID of the fusion result.
  • graph_fusion: graph fusion.
  • ub_fusion: UB fusion.
  • match_times: number of times that the fusion pattern is matched during graph build.
  • effect_times: actual number of times that the fusion takes effect.
  • repository_hit_times: number of times that the UB fusion repository is hit.

Arguments:

  • 0: The result file of operator fusion information is not generated.
  • 1 (default): The result file of operator fusion information is generated when the program exits normally.
  • 2: The result file of operator fusion information is generated as soon as graph build is complete, so the file exists even if the program is interrupted afterwards.
NOTE:
  • If the ASCEND_WORK_PATH environment variable is not set, the result file is generated in the current path where the script is executed by default. If the ASCEND_WORK_PATH environment variable is set, the result file is saved in $ASCEND_WORK_PATH/FE/${Process ID}/fusion_result.json.
  • The fusion patterns disabled using FUSION_SWITCH_FILE are not displayed in the fusion_result.json file.
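The location rule in the NOTE above can be sketched as a small helper. fusionResultPath is a hypothetical function written for illustration; it takes the environment value, the current directory, and the process ID as arguments instead of querying the system, and it treats an empty string as "ASCEND_WORK_PATH is not set", which is an assumption.

```cpp
#include <string>

// Returns where fusion_result.json is expected according to the NOTE.
// workPath:   value of ASCEND_WORK_PATH, or empty if the variable is not set (assumption).
// currentDir: directory in which the script is executed.
// pid:        process ID used for the ${Process ID} path component.
std::string fusionResultPath(const std::string& workPath,
                             const std::string& currentDir,
                             long pid) {
    if (workPath.empty()) {
        return currentDir + "/fusion_result.json";
    }
    return workPath + "/FE/" + std::to_string(pid) + "/fusion_result.json";
}
```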

Configuration example:

{ge::ir_option::OPTION_EXPORT_COMPILE_STAT, "1"}

Applicability:

Atlas inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Precision Tuning

Parameter

Description

PRECISION_MODE

Precision mode of an operator. This parameter cannot be used together with PRECISION_MODE_V2 in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • force_fp32/cube_fp16in_fp32out:
    force_fp32 and cube_fp16in_fp32out have the same effect: when an operator in the AI Core supports both the float32 and float16 data types, the system selects a processing mode based on the operator type. cube_fp16in_fp32out was added in the new version and has clearer semantics for cube operators.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector compute operators, float32 is forcibly selected if the operator precision in the original graph is float16 or bfloat16.

      This option is invalid if the original graph contains operators not supporting float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • force_fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • allow_fp32_to_fp16:
    • For matrix operators:
      • If the operator precision in the original graph is float32, the precision is preferably reduced to float16. If the operator in the AI Core does not support float16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
    • For vector operators, the precision of the original graph is retained preferably.
      • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
  • must_keep_origin_dtype:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • allow_mix_precision/allow_mix_precision_fp16:

    allow_mix_precision has the same effect as allow_mix_precision_fp16, indicating that mixed precision of float16, bfloat16, and float32 is used for neural network processing. allow_mix_precision_fp16 was added in the new version and has clearer semantics.

    float16 is automatically used for certain float32 and bfloat16 operators in the original model based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_mix_precision_bf16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original model based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. If the operator in the AI Core does not support bfloat16 and float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16 and float32, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If the field value is false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_fp32_to_bf16:
    • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is reduced to bfloat16. If the operator in the AI Core does not support bfloat16, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16, an error is reported during execution.
    • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
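The fallback chains described for allow_fp32_to_fp16 can be written down as an ordered preference list per case. The following is a sketch of that reading, not the framework's selection code: Choice, preferenceOrder, and selectDtype are hypothetical names, and the sets of supported data types are supplied by the caller rather than read from the operator information library.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

struct Choice {
    std::string engine;  // "AI Core" or "AI CPU"
    std::string dtype;
};

// AI Core preference order for allow_fp32_to_fp16 as listed in the text.
// The last entry is also the data type tried on the AI CPU.
std::vector<std::string> preferenceOrder(bool isMatrixOp, const std::string& origDtype) {
    if (origDtype == "bfloat16") {
        return std::vector<std::string>{"bfloat16", "float32", "float16"};
    }
    if (origDtype == "float32") {
        return isMatrixOp ? std::vector<std::string>{"float16", "float32"}
                          : std::vector<std::string>{"float32", "float16"};
    }
    throw std::invalid_argument("case not described in the text: " + origDtype);
}

Choice selectDtype(bool isMatrixOp, const std::string& origDtype,
                   const std::set<std::string>& aiCoreSupports,
                   const std::set<std::string>& aiCpuSupports) {
    const std::vector<std::string> order = preferenceOrder(isMatrixOp, origDtype);
    for (const std::string& dtype : order) {
        if (aiCoreSupports.count(dtype) != 0) return Choice{"AI Core", dtype};
    }
    if (aiCpuSupports.count(order.back()) != 0) return Choice{"AI CPU", order.back()};
    throw std::runtime_error("no supported data type: an error is reported during execution");
}
```

The only difference between matrix and vector operators in this reading is the float32 case: a matrix operator tries float16 first, whereas a vector operator keeps float32 if the AI Core supports it.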

Restrictions:

  • The bfloat16 data type supports only the following products:

    Atlas A2 training products / Atlas A2 inference products

    Atlas A3 training products / Atlas A3 inference products

    Atlas 200I/500 A2 inference products

  • The default value of this option prioritizes performance, so precision overflow issues may occur during subsequent inference. If a precision issue occurs during inference, locate the fault by referring to "Accuracy Improvement Suggestions for Model Inference".
  • If you want to avoid precision issues, you can set the option to a value other than the default one. For example, you can set the option to must_keep_origin_dtype.

Configuration example:

{ge::ir_option::PRECISION_MODE, "force_fp16"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

PRECISION_MODE_V2

Precision mode of an operator. This parameter cannot be used together with PRECISION_MODE in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • origin:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • cube_fp16in_fp32out:
    The system selects a processing mode based on the operator type for AI Core operators supporting both float32 and float16.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector compute operators, float32 is forcibly selected if the operator precision in the original graph is float16 or bfloat16.

      This option is invalid if the original graph contains operators not supporting float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • mixed_float16:

    Mixed precision of float16, bfloat16, and float32 is used for neural network processing. float16 is automatically used for certain float32 and bfloat16 operators in the original graph based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_bfloat16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original graph based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. If the operators do not support bfloat16 and float32, the AI CPU operators are used for computation. If the AI CPU operators also do not support bfloat16 and float32, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If the field value is false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_hif8:

    Enables automatic mixed precision, indicating that hifloat8 (for details about this data type, see Link), float16, bfloat16, and float32 are used together for neural network processing. In this mode, hifloat8 is automatically used for certain float16, bfloat16, and float32 operators in the original graph based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. The current version does not support this argument.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float16, bfloat16, and float32 to hifloat8.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float16, bfloat16, and float32 to hifloat8. In this case, the operator still uses the precision of float16, bfloat16, or float32.
    • If an operator in the original graph does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8:

    The hifloat8 data type is forcibly used if the Cube operator in the original graph supports both hifloat8 and float16, bfloat16, or float32. The current version does not support this argument.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
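The four-step selection described for cube operators under cube_fp16in_fp32out amounts to trying three input/output combinations in a fixed order. The following sketch illustrates that order only; IoDtypes and selectCubeDtypes are hypothetical names, and the set of supported combinations is passed in by the caller.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// (input data type, output data type)
using IoDtypes = std::pair<std::string, std::string>;

// Returns the first supported combination in the order given by steps 1 to 3,
// and throws to model the error in step 4.
IoDtypes selectCubeDtypes(const std::set<IoDtypes>& supported) {
    const IoDtypes order[] = {
        IoDtypes("float16", "float32"),  // step 1: preferred
        IoDtypes("float32", "float32"),  // step 2
        IoDtypes("float16", "float16"),  // step 3
    };
    for (const IoDtypes& candidate : order) {
        if (supported.count(candidate) != 0) return candidate;
    }
    throw std::runtime_error("none of the listed input/output combinations is supported");
}
```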

Restrictions:

  • The bfloat16 data type supports only the following products:

    Atlas A2 training products / Atlas A2 inference products

    Atlas A3 training products / Atlas A3 inference products

    Atlas 200I/500 A2 inference products

  • The default value of this option prioritizes performance, so precision overflow issues may occur during subsequent inference. If a precision issue occurs during inference, locate the fault by referring to "Accuracy Improvement Suggestions for Model Inference".
  • If you want to avoid precision issues, you can set the option to a value other than the default one. For example, you can set the option to origin.

Configuration example:

{ge::ir_option::PRECISION_MODE_V2, "fp16"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

MODIFY_MIXLIST

When mixed precision is enabled, you can use this parameter to specify the path and file name of the blocklist, trustlist, and graylist, and specify the operators that allow precision degradation and those that do not allow precision degradation. Set this parameter to the path including the file name. The file is in JSON format. You can view the flag value under precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. xxx varies depending on the product.

  • true (trustlist): Precision reduction is allowed in mixed precision mode.
  • false (blocklist): Precision reduction is not allowed in mixed precision mode.
  • Not specified (graylist): Operators on the graylist follow the same precision processing as their upstream operators.

Method for enabling mixed precision:

  • Set PRECISION_MODE to allow_mix_precision, allow_mix_precision_bf16, or allow_mix_precision_fp16.
  • Set PRECISION_MODE_V2 to mixed_float16 or mixed_bfloat16.

PRECISION_MODE and PRECISION_MODE_V2 cannot be configured at the same time. You are advised to use PRECISION_MODE_V2.

Configuration example:

{ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"}

You can specify the operator types in ops_info.json as follows. Separate operators with commas (,).

{
  "black-list": {                  // Blocklist
     "to-remove": [                // Move an operator from the blocklist to the graylist. Ensure that the specified operator is already on the blocklist.
     "Xlog1py"
     ],
     "to-add": [                   // Move an operator from the trustlist or graylist to the blocklist.
     "Matmul",
     "Cast"
     ]
  },
  "white-list": {                  // Trustlist
     "to-remove": [                // Move an operator from the trustlist to the graylist. Ensure that the specified operator is already on the trustlist.
     "Conv2D"
     ],
     "to-add": [                   // Move an operator from the blocklist or graylist to the trustlist.
     "Bias"
     ]
  }
}

The operators in the preceding example configuration file are for reference only. The configuration should be based on the actual hardware environment and the built-in tuning strategies of the operators. The following is an example of blocklist, trustlist, and graylist query:

"Conv2D":{
    "precision_reduce":{
        "flag":"true"
    }
},

true: trustlist; false: blocklist; Not configured: graylist.
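The effect of the to-remove and to-add entries can be modeled as moves between the three lists. This is an illustrative sketch, not the framework's implementation: MixList, classify, ListEdit, and applyEdit are hypothetical names, an empty flag string stands for "not configured", and operators that are not yet in the map are treated as graylist entries.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class MixList { Trust, Block, Gray };

// Maps the precision_reduce flag to a list; an empty string means "not configured".
MixList classify(const std::string& flag) {
    if (flag == "true") return MixList::Trust;
    if (flag == "false") return MixList::Block;
    return MixList::Gray;
}

// One "to-remove"/"to-add" pair from the JSON file.
struct ListEdit {
    std::vector<std::string> toRemove;
    std::vector<std::string> toAdd;
};

// Applies an edit for the given target list ("black-list" -> Block, "white-list" -> Trust).
void applyEdit(std::map<std::string, MixList>& ops, MixList target, const ListEdit& edit) {
    for (const std::string& op : edit.toRemove) {
        std::map<std::string, MixList>::iterator it = ops.find(op);
        if (it == ops.end() || it->second != target) {
            throw std::invalid_argument(op + " is not on the list it is removed from");
        }
        it->second = MixList::Gray;  // removed operators move to the graylist
    }
    for (const std::string& op : edit.toAdd) {
        ops[op] = target;            // added operators move to the target list
    }
}
```

The check in the to-remove loop mirrors the requirement that an operator must already be on the list it is removed from.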

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Precision Comparison

Parameter

Description

BUFFER_OPTIMIZE

Data buffer optimization switch. Arguments:

  • l1_optimize: Enables L1 optimization. This argument is invalid in the current version and equivalent to off_optimize.
  • l2_optimize (default): Enables L2 optimization.
  • off_optimize: Disables buffer optimization.

Suggestions:

You are advised to enable buffer optimization because it can improve compute efficiency and performance. However, your model may contain an operator that the current implementation does not yet cover, which can affect precision. In that case, disable data buffer optimization. If the precision meets requirements after buffer optimization is disabled, locate the suspect operator and report the issue to technical support for further analysis. After the operator issue is resolved, you are advised to re-enable buffer optimization.

Configuration example:

{ge::ir_option::BUFFER_OPTIMIZE, "l2_optimize"}

Note: If this parameter is set to l1_optimize, it cannot be used together with VIRTUAL_TYPE. If they are used together, an error is reported, indicating that L1 fusion is not performed in virtualization scenarios. This prevents scheduling exceptions caused by large operators.

Applicability:

Atlas training products : supported

Atlas inference products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

FUSION_SWITCH_FILE

Path (including the file name) of the configuration file for the fusion pattern switch. The path can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.).

The built-in graph fusion and UB fusion patterns are enabled by default. You can disable specified fusion patterns in the configuration file. Some fusion patterns cannot be disabled due to functionality restrictions. For the full list of fusion patterns that can be disabled, see Graph Fusion and UB Fusion Patterns.

Configuration example:

The following is a template of the fusion_switch.cfg configuration file. on indicates that the setting is enabled, and off indicates that the setting is disabled.

  1. Configuration file example:
    {
        "Switch":{
            "GraphFusion":{
                "RequantFusionPass":"on",
                "ConvToFullyConnectionFusionPass":"off",
                "SoftmaxFusionPass":"on",
                "NotRequantFusionPass":"on",
                "SplitConvConcatFusionPass":"on",
                "ConvConcatFusionPass":"on",
                "MatMulBiasAddFusionPass":"on",
                "PoolingFusionPass":"on",
                "ZConcatv2dFusionPass":"on",
                "ZConcatExt2FusionPass":"on",
                "TfMergeSubFusionPass":"on"
            },
            "UBFusion":{
                "TbePool2dQuantFusionPass":"on"
            }
        }
    }

To disable all fusion patterns at once, refer to the following configuration file example.

  1. Configuration file example:
    {
        "Switch":{
            "GraphFusion":{
                "ALL":"off"
            },
            "UBFusion":{
                "ALL":"off"
            }
        }
    }

Notes:

  1. Some built-in fusion patterns are not switchable due to functionality restrictions. These fusion patterns remain enabled regardless of the user's switch settings.
  2. To disable all fusion patterns except selected ones, refer to the following example.
    1. Configuration file example:
      {
          "Switch":{
              "GraphFusion":{
                  "ALL":"off",
                  "SoftmaxFusionPass":"on"
              },
              "UBFusion":{
                  "ALL":"off",
                  "TbePool2dQuantFusionPass":"on"
              }
          }
      }
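
The file layout shown in the examples above can also be generated programmatically. The following is a minimal sketch, not part of any CANN API: BuildFusionSwitchConfig and AppendSection are hypothetical helper names, and the sketch assumes that the "ALL" entry is written first in each section, as in the examples.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: renders one section ("GraphFusion" or "UBFusion").
// When all_off is true, an "ALL":"off" entry is written before the
// individual pass switches, mirroring the examples above.
static void AppendSection(std::ostringstream &out, const std::string &name,
                          const std::map<std::string, bool> &passes,
                          bool all_off) {
    out << "        \"" << name << "\":{";
    bool first = true;
    if (all_off) {
        out << "\n            \"ALL\":\"off\"";
        first = false;
    }
    for (const auto &kv : passes) {
        out << (first ? "\n" : ",\n");
        out << "            \"" << kv.first << "\":\""
            << (kv.second ? "on" : "off") << "\"";
        first = false;
    }
    out << "\n        }";
}

// Hypothetical helper: returns the text of a fusion switch configuration file.
std::string BuildFusionSwitchConfig(const std::map<std::string, bool> &graph_passes,
                                    const std::map<std::string, bool> &ub_passes,
                                    bool all_off) {
    std::ostringstream out;
    out << "{\n    \"Switch\":{\n";
    AppendSection(out, "GraphFusion", graph_passes, all_off);
    out << ",\n";
    AppendSection(out, "UBFusion", ub_passes, all_off);
    out << "\n    }\n}\n";
    return out.str();
}
```

Calling the helper with all_off set to true and a single enabled pass per section yields the same structure as the last example above. The returned string still has to be written to a file whose path is passed to the build option.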

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Performance Tuning

Parameter

Description

ENABLE_SMALL_CHANNEL

Whether to enable small channel tuning to yield performance benefits at convolutional layers with channel size ≤ 4. You are advised to enable this function in inference scenarios.

Arguments:

  • 0 (default): disabled
  • 1: enabled

Configuration example:

{ge::ir_option::ENABLE_SMALL_CHANNEL, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTYPELIST_FOR_IMPLMODE

Operator implementation mode in the optype list.

Restrictions:

  • The operators on the list use the implementation mode specified by OP_SELECT_IMPL_MODE, which is either high_precision or high_performance. Use commas (,) to separate operators.
  • This parameter must be used together with OP_SELECT_IMPL_MODE and takes effect only for the specified operators. For example, if OP_SELECT_IMPL_MODE is set to high_precision and OPTYPELIST_FOR_IMPLMODE is set to Pooling,SoftmaxV2, the high-precision mode is used only for the Pooling and SoftmaxV2 operators. For all other operators, the default implementation mode is used.

Configuration example:

{ge::ir_option::OPTYPELIST_FOR_IMPLMODE, "Pooling,SoftmaxV2"}
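
Because the operator list must be a single comma-separated string and must be paired with OP_SELECT_IMPL_MODE, it can help to build both entries together. The sketch below is illustrative only: JoinOpTypes and MakeImplModeOptions are hypothetical helpers, and the map keys are placeholder strings named after the ge::ir_option constants rather than the real key values, which come from the GE headers.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Joins operator type names with commas, the separator required above.
std::string JoinOpTypes(const std::vector<std::string> &op_types) {
    std::string joined;
    for (std::size_t i = 0; i < op_types.size(); ++i) {
        if (i > 0) {
            joined += ",";
        }
        joined += op_types[i];
    }
    return joined;
}

// Builds the two options that must be set together.
// NOTE: the keys are placeholders for ge::ir_option::OP_SELECT_IMPL_MODE and
// ge::ir_option::OPTYPELIST_FOR_IMPLMODE (assumption for illustration).
std::map<std::string, std::string> MakeImplModeOptions(
    const std::string &mode, const std::vector<std::string> &op_types) {
    std::map<std::string, std::string> options;
    options["OP_SELECT_IMPL_MODE"] = mode;
    options["OPTYPELIST_FOR_IMPLMODE"] = JoinOpTypes(op_types);
    return options;
}
```

In real code, the placeholder keys would be replaced with the ge::ir_option constants shown in the configuration example.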

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

TILING_SCHEDULE_OPTIMIZE

Whether to enable the optimization for tiling offload scheduling.

Because the internal storage of the AI Cores in the NPU cannot hold all the input and output data of an operator, the input data is split into parts. The first part is transferred in, computed, and transferred out, and then the next part is processed in the same way. This process is called tiling. A computation program, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of cycles) based on operator information such as the shape. The AI Core is not well suited to the scalar computation in the tiling implementation, so the tiling implementation is generally executed on the host CPU. However, the tiling implementation is executed on the device when the following conditions are met:

  1. The model is static-shape.
  2. Operators in the model, such as the FusedInferAttentionScore and IncreFlashAttention fused operators, support tiling offload.
  3. The output values of the operators that support tiling offload have dependencies, that is, the output value of the previous operator contains the execution result of the device. If the value to be depended on is a Const value, tiling offload is not required, and tiling is completed during build.
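
To make the notion of tiling parameters concrete, the following sketch computes a block size and cycle count for a one-dimensional buffer. It is a simplified illustration under stated assumptions, not the tiling implementation of any real operator: TilingParams and ComputeTiling are hypothetical names, and the capacity is expressed in elements rather than bytes.

```cpp
#include <cstdint>

// Hypothetical tiling result for a flat buffer of elements.
struct TilingParams {
    std::uint64_t block_elems;  // elements transferred in per cycle
    std::uint64_t num_cycles;   // total transfer-in/compute/transfer-out cycles
    std::uint64_t tail_elems;   // elements handled by the last cycle
};

// Splits total_elems into blocks that fit into buffer_capacity_elems.
// Returns all zeros when either argument is zero.
TilingParams ComputeTiling(std::uint64_t total_elems,
                           std::uint64_t buffer_capacity_elems) {
    TilingParams p = {0, 0, 0};
    if (total_elems == 0 || buffer_capacity_elems == 0) {
        return p;
    }
    p.block_elems = total_elems < buffer_capacity_elems ? total_elems
                                                        : buffer_capacity_elems;
    // Ceiling division: the last cycle may be only partially filled.
    p.num_cycles = (total_elems + p.block_elems - 1) / p.block_elems;
    p.tail_elems = total_elems - (p.num_cycles - 1) * p.block_elems;
    return p;
}
```

For 1000 elements and a capacity of 256 elements, this gives four cycles with a final block of 232 elements. When the shape is known at build time, such parameters can be computed once. When they depend on a value produced on the device, they must be computed at run time, which is the case that tiling offload addresses.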

Arguments:

  • 0 (default): Tiling offload is disabled.
  • 1: Tiling offload is enabled.

Configuration example:

{ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}

Applicability:

Atlas inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas training products : not supported

Atlas 200I/500 A2 inference products : not supported

Quantization and Compression

Parameter

Description

ENABLE_COMPRESS_WEIGHT

Whether to enable global weight compression.

AI Core supports weight compression. If this parameter is enabled, the weight data is compressed. During operator computation, the weights are decompressed, which reduces the bandwidth load and improves performance.

This parameter enables global weight compression. This parameter is mutually exclusive with COMPRESS_WEIGHT_CONF.

Arguments:

  • true: enabled
  • false (default): disabled

Configuration example:

{ge::ir_option::ENABLE_COMPRESS_WEIGHT, "true"}

Applicability:

Atlas training products : not supported

Atlas inference products : not supported

Atlas 200I/500 A2 inference products : not supported

Atlas A2 training products / Atlas A2 inference products : not supported

Atlas A3 training products / Atlas A3 inference products : not supported

COMPRESS_WEIGHT_CONF

Path and name of the configuration file of the nodes to be compressed. The nodes mainly include the conv and fc operators. This parameter is mutually exclusive with ENABLE_COMPRESS_WEIGHT.

Format: The path including the file name allows only letters, digits, and underscores (_). The file name can contain letters, digits, underscores (_), and periods (.).

Restrictions: The weight compression configuration file is generated by AMCT. It is a list of node names separated with semicolons (;). For example, the content of the compress_weight_nodes.cfg file is conv1; fc1; conv2_2/x1; fc2; conv5_32/x2;fc6.
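
The file format described above can be read with a simple split on semicolons. The sketch below is only an illustration of the format: ParseNodeList is a hypothetical helper, and trimming the whitespace around each name is an assumption based on the spacing in the example content.

```cpp
#include <string>
#include <vector>

// Removes leading and trailing whitespace.
static std::string Trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    const std::string::size_type end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Splits a semicolon-separated node list and skips empty entries.
std::vector<std::string> ParseNodeList(const std::string &content) {
    std::vector<std::string> nodes;
    std::string::size_type start = 0;
    while (start <= content.size()) {
        std::string::size_type end = content.find(';', start);
        if (end == std::string::npos) {
            end = content.size();
        }
        const std::string name = Trim(content.substr(start, end - start));
        if (!name.empty()) {
            nodes.push_back(name);
        }
        start = end + 1;
    }
    return nodes;
}
```

Applied to the example content above, the helper returns six node names, starting with conv1 and ending with fc6.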

Configuration example:

{ge::ir_option::COMPRESS_WEIGHT_CONF, "$HOME/module/compress_weight_nodes.cfg"}

Applicability:

Atlas training products : not supported

Atlas inference products : not supported

Atlas 200I/500 A2 inference products : not supported

Atlas A2 training products / Atlas A2 inference products : not supported

Atlas A3 training products / Atlas A3 inference products : not supported

SPARSITY

Whether to enable global sparsity.

In a model output by AMCT (Ascend Model Compression Toolkit) after 2:4 structured sparsity, there may be cases where at least two of every four contiguous weight elements in the Cin dimension are forced to zero. You can enable global sparsity during model conversion to filter out these two elements, which reduces the computation required for inference and improves inference performance.

Due to hardware restrictions, this parameter cannot be used together with ENABLE_COMPRESS_WEIGHT or COMPRESS_WEIGHT_CONF.
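
The exclusion rules among SPARSITY, ENABLE_COMPRESS_WEIGHT, and COMPRESS_WEIGHT_CONF can be checked before the options are passed to the build. The following is a minimal sketch: CheckCompressionOptions is a hypothetical helper, and the map keys are placeholder strings named after the ge::ir_option constants, not the real key values.

```cpp
#include <map>
#include <string>

// Returns an empty string when the combination is allowed, otherwise a
// short description of the conflict.
// NOTE: keys are placeholders for the ge::ir_option constants (assumption).
std::string CheckCompressionOptions(
    const std::map<std::string, std::string> &options) {
    std::map<std::string, std::string>::const_iterator it;

    it = options.find("ENABLE_COMPRESS_WEIGHT");
    const bool global_compress = it != options.end() && it->second == "true";

    it = options.find("COMPRESS_WEIGHT_CONF");
    const bool node_compress = it != options.end() && !it->second.empty();

    it = options.find("SPARSITY");
    const bool sparsity = it != options.end() && it->second == "1";

    if (global_compress && node_compress) {
        return "ENABLE_COMPRESS_WEIGHT and COMPRESS_WEIGHT_CONF are mutually exclusive";
    }
    if (sparsity && (global_compress || node_compress)) {
        return "SPARSITY cannot be used together with weight compression";
    }
    return "";
}
```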

Arguments:

  • 1: Enables 2:4 structured sparsity.
  • 0 (default): Disables sparsity.

Configuration example:

{ge::ir_option::SPARSITY, "1"}

Restrictions: When using this parameter, ensure that a sparse model is used. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch). The compression combination requires 2:4 structured sparsity and quantization aware training.
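
As an illustration of what a 2:4 sparse weight layout means, the sketch below checks that every group of four contiguous values contains at least two zeros. It is not an AMCT or GE function: IsTwoFourSparse is a hypothetical helper, and it assumes that the weights are passed as a flat array whose contiguous elements run along the Cin dimension.

```cpp
#include <cstddef>
#include <vector>

// Returns true when every group of four contiguous weights has at least two
// zeros. A length that is not a multiple of four is treated as not sparse.
bool IsTwoFourSparse(const std::vector<float> &weights) {
    if (weights.size() % 4 != 0) {
        return false;
    }
    for (std::size_t i = 0; i < weights.size(); i += 4) {
        int zeros = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            if (weights[i + j] == 0.0f) {
                ++zeros;
            }
        }
        if (zeros < 2) {
            return false;
        }
    }
    return true;
}
```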

Applicability:

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas inference products : not supported

Atlas training products : not supported

COMPRESSION_OPTIMIZE_CONF

Path (including the name) of the compression optimization configuration file. This parameter is used to enable the compression optimization function specified in the configuration file to improve network performance. For example, /home/test/compression_optimize.cfg.

An example of the file content configuration is as follows.

enable_first_layer_quantization:true
  • This file supports configuration of only the enable_first_layer_quantization feature, which specifies whether to optimize the convolution at the first layer of the AIPP. (The AIPP is merged with the Quant operator before the CONV2D convolution is performed at the first layer of the model obtained after quantization.)

    When the enable_first_layer_quantization feature is enabled, performance is improved only when the AIPP+CONV2D structure exists in the network structure and ENABLE_SMALL_CHANNEL is set to 1 during model compilation. The precision of the quantized model is compromised to some extent. Therefore, you can determine whether to enable this feature as required.

  • In the configuration file, the name of a compression feature is followed by a value, either true (feature enabled) or false (feature disabled; default). The feature name and the value are separated with a colon (:).
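
The name:value format described above can be parsed line by line. The following sketch is illustrative only: ParseFeatureLine is a hypothetical helper, and it assumes that no whitespace surrounds the colon, as in the example file content.

```cpp
#include <string>

// Parses one "feature_name:true|false" line.
// Returns false when the line does not match that form.
bool ParseFeatureLine(const std::string &line, std::string *name,
                      bool *enabled) {
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos || colon == 0) {
        return false;
    }
    const std::string value = line.substr(colon + 1);
    if (value != "true" && value != "false") {
        return false;
    }
    *name = line.substr(0, colon);
    *enabled = (value == "true");
    return true;
}
```

For the example line, the helper reports the feature name enable_first_layer_quantization with the value enabled.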

Applicability:

Atlas inference products : supported

Atlas 200I/500 A2 inference products : supported

Atlas training products : not supported

Atlas A2 training products / Atlas A2 inference products : not supported

Atlas A3 training products / Atlas A3 inference products : not supported

Experiment Parameters

Parameter

Description

ALLOW_HF32

This parameter is reserved and is not supported in the current version.

Whether to enable the function of automatically replacing the float32 data type with the HF32 data type. In the current version, this option takes effect only for Conv and Matmul operators.

HF32 is a single-precision floating-point type developed by Ascend for internal computation of operators. The following shows the comparison with other common data types. HF32 shares the value range with float32, but its mantissa precision (11 bits) is close to FP16 (10 bits). Replacing the original float32 single-precision data type with the HF32 single-precision data type by precision reduction can greatly reduce the space occupied by data and improve performance.

Arguments:

  • true: Enable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is enabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_t_conv_t.ini in the file storage path after the CANN software is installed. This file cannot be modified by users.

  • false: Disable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is disabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_f_conv_f.ini in the file storage path after the CANN software is installed. This file cannot be modified by users.

Default: Enable FP32-to-HF32 conversion for Conv operators; disable FP32-to-HF32 conversion for Matmul operators.

Restrictions:

  • For the same operator, if enable_hi_float_32_execution or enable_float_32_execution is configured using OP_PRECISION_MODE, you are advised not to configure ALLOW_HF32 as well. If both are configured, the priority is as follows:

    OP_PRECISION_MODE(ByNodeName) > ALLOW_HF32 > OP_PRECISION_MODE(ByOpType)

  • ALLOW_HF32 automatically replaces float32 with HF32. To make this option take effect, ensure that the input or output type of the enabled operator is float32. The default value of PRECISION_MODE_V2 is fp16. If the operator type in the original network model is float32, the operator type is forcibly converted to float16. In this case, ALLOW_HF32 does not take effect. You are advised to change the value of PRECISION_MODE_V2 to origin. The default value of PRECISION_MODE is force_fp16, and you are advised to change the value to must_keep_origin_dtype or force_fp32.
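
The priority order stated in the first restriction can be expressed as a small resolution function. This is a sketch of the rule only, not of how GE implements it: Fp32Exec and ResolveExecution are hypothetical names, and mapping enable_hi_float_32_execution to HF32 and enable_float_32_execution to FP32 is an assumption based on the setting names.

```cpp
// Effective execution type requested for an operator's float32 computation.
enum class Fp32Exec { kUnset, kHf32, kFp32 };

// Applies the priority
//   OP_PRECISION_MODE(ByNodeName) > ALLOW_HF32 > OP_PRECISION_MODE(ByOpType):
// the first setting that is not kUnset wins.
Fp32Exec ResolveExecution(Fp32Exec by_node_name, Fp32Exec allow_hf32,
                          Fp32Exec by_op_type) {
    if (by_node_name != Fp32Exec::kUnset) {
        return by_node_name;
    }
    if (allow_hf32 != Fp32Exec::kUnset) {
        return allow_hf32;
    }
    return by_op_type;
}
```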

Configuration example:

{ge::ir_option::ALLOW_HF32, "true"}

Applicability:

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas inference products : not supported

Atlas training products : not supported

Atlas 200I/500 A2 inference products : not supported

OO_LEVEL

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Multi-level optimization options for graph build include subgraph optimization, entire graph optimization, and static shape model offloading.

Static shape model offloading: In this approach, the input and output shapes of all operators in a static shape model can be determined at build time, allowing for model-level memory orchestration and operator tiling computation to be completed on the host. These computations are then batched and sent to the device stream when the model is loaded, but they are not executed immediately. Instead, the execution of all tasks within the model is triggered by the delivery of model execution tasks.

Arguments:

  • O1: Disables all graph fusion and UB fusion passes, and performs only optimizations related to static offloading, such as InferShape (output tensor shape inference), constant folding, dead-edge elimination, and other optimizations.
  • O3 (default): Enables all optimizations.

Restrictions:

If the value is O1, all graph fusion and UB fusion passes are disabled, and only passes related to static offloading are enabled. However, the graph fusion passes in the following files are enabled by default because function problems may occur if they are disabled:

All graph fusion passes under the ExceptionalPassOfO1Level field in the ${INSTALL_DIR}/<arch>-linux/lib64/plugin/opskernel/fusion_pass/config/fusion_config.json file

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. <arch> indicates the OS architecture.

Configuration example:

{ge::ir_option::OO_LEVEL, "O3"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OO_CONSTANT_FOLDING

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Whether to enable constant folding optimization.

Constant folding is the process of replacing nodes that can be evaluated to a constant output value in a computational graph with that constant, and simplifying the structure of the computational graph accordingly.
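
The idea can be shown on a toy expression graph. The sketch below is a generic illustration of constant folding and does not reflect the GE graph data structures: Node, Const, Input, Binary, and Fold are hypothetical names, and only addition and multiplication are modeled.

```cpp
#include <memory>
#include <string>

// Toy graph node: a constant, a named runtime input, or a binary operation.
struct Node {
    enum Kind { kConst, kInput, kAdd, kMul } kind;
    double value;                    // used when kind == kConst
    std::string name;                // used when kind == kInput
    std::shared_ptr<Node> lhs, rhs;  // used when kind is kAdd or kMul
};
typedef std::shared_ptr<Node> NodePtr;

NodePtr Const(double v) {
    NodePtr n = std::make_shared<Node>();
    n->kind = Node::kConst;
    n->value = v;
    return n;
}

NodePtr Input(const std::string &name) {
    NodePtr n = std::make_shared<Node>();
    n->kind = Node::kInput;
    n->name = name;
    return n;
}

NodePtr Binary(Node::Kind kind, const NodePtr &lhs, const NodePtr &rhs) {
    NodePtr n = std::make_shared<Node>();
    n->kind = kind;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

// Replaces every subgraph whose operands are all constants with one constant.
NodePtr Fold(const NodePtr &n) {
    if (n->kind == Node::kConst || n->kind == Node::kInput) {
        return n;
    }
    const NodePtr l = Fold(n->lhs);
    const NodePtr r = Fold(n->rhs);
    if (l->kind == Node::kConst && r->kind == Node::kConst) {
        return Const(n->kind == Node::kAdd ? l->value + r->value
                                           : l->value * r->value);
    }
    return Binary(n->kind, l, r);
}
```

Folding the graph for x * (2 + 3) leaves a single multiplication of the input x by the constant 5, so the addition no longer has to be executed at run time.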

Arguments:

  • true (default): enabled
  • false: disabled

Configuration example:

{ge::ir_option::OO_CONSTANT_FOLDING, "true"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OO_DEAD_CODE_ELIMINATION

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Whether to enable dead-edge elimination optimization.

Dead-edge elimination (switch dead-edge elimination): When pred (input 1) of a switch statement is a constant node, one of the branches can be eliminated based on the value of const. If const is true, the false branch is eliminated; if const is false, the true branch is eliminated.
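
The branch selection rule above can be written as a small decision function. This is a sketch of the rule only, independent of the GE graph representation: SwitchPred, Branch, and EliminatedBranch are hypothetical names.

```cpp
// Which branch of a switch can be removed.
enum class Branch { kNone, kTrue, kFalse };

// Describes pred (input 1) of a switch.
struct SwitchPred {
    bool is_const;  // true when pred is a constant node
    bool value;     // constant value, meaningful only when is_const is true
};

// Returns the branch that can be eliminated, or kNone when pred is not a
// constant and both branches must be kept.
Branch EliminatedBranch(const SwitchPred &pred) {
    if (!pred.is_const) {
        return Branch::kNone;
    }
    // A constant true pred never takes the false branch, and vice versa.
    return pred.value ? Branch::kFalse : Branch::kTrue;
}
```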

Arguments:

  • true (default): enabled
  • false: disabled

Configuration example:

{ge::ir_option::OO_DEAD_CODE_ELIMINATION, "true"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

TUNE_DEVICE_IDS

Not supported in the current version.

Parameters That Will Be Deprecated in Later Versions

Parameter

Description

OP_SELECT_IMPL_MODE

Operator implementation mode. Certain operators built in the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time.

In high-precision mode, Taylor's theorem or Newton's method is used to improve operator precision with float16 input. In high-performance mode, the best performance is achieved without affecting the network precision (float16).

Arguments:

  • high_precision: high-precision mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_precision.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_performance (default): high-performance mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_performance.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_precision_for_all: high-precision mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the performance of the earlier network model that uses the high_precision_for_all mode may deteriorate.

  • high_performance_for_all: high-performance mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the precision of the earlier network model that uses the high_performance_for_all mode may deteriorate.

The preceding implementation modes are distinguished based on dtype of the operator. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.

Default: high_performance

Configuration example:

{ge::ir_option::OP_SELECT_IMPL_MODE, "high_performance"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported