aclgrphBuildModel Configuration Parameters

Basic Functions

Parameter

Description

INPUT_FORMAT

Input data format.

Arguments:

The NCHW, NHWC, ND, NCDHW, and NDHWC formats are supported.

Configuration example:

{ge::ir_option::INPUT_FORMAT, "NHWC"}

To enable AIPP during inference, the input graph data must be in NHWC format. In this scenario, the data format specified by INPUT_FORMAT does not take effect.

NOTE:

This parameter applies only to the dynamic batch size, dynamic image size, and dynamic dimension scenarios.

In these scenarios, INPUT_FORMAT must be the same as the format of all data operators. Otherwise, model build fails.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_NAME_MAP

Path (including the file name) of the mapping configuration file for a custom operator (not a standard operator). The function of a custom operator varies according to the network. You can specify the mapping between the custom operator and the actual custom operator running on the network.

The path (including the file name) can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.).

Configuration example:

{ge::ir_option::OP_NAME_MAP, "/home/test/opname_map.cfg"}

The following is an example of the content in the mapping configuration file of a custom operator:

OpA:Network1OpA

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

INSERT_OP_FILE

Configuration file of the preprocessing operator, for example, the AIPP operator. For details about how to use this parameter, see Special Topics > AIPP.

This parameter is mutually exclusive with INPUT_FP16_NODES.

The configuration file path can contain only letters, digits, and underscores (_). The file name can contain letters, digits, underscores (_), and periods (.).

Configuration example:

{ge::ir_option::INSERT_OP_FILE, "/home/test/aipp.cfg"}

The following is an example of the configuration file:

aipp_op {
aipp_mode:static
input_format:YUV420SP_U8
csc_switch:true
var_reci_chn_0:0.00392157
var_reci_chn_1:0.00392157
var_reci_chn_2:0.00392157
}
NOTE:

For details about the configuration file, see ATC Instructions.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OUTPUT_TYPE

Network output data type.

Arguments:

  • FP32: recommended for classification and object detection networks
  • UINT8: recommended for image super-resolution networks for better inference performance
  • FP16: recommended for classification and object detection networks. It is usually used when the output of one network is used as the input of another.
  • INT8

After the model compilation is complete, the preceding data types are displayed as DT_FLOAT, DT_UINT8, DT_FLOAT16, or DT_INT8 in the corresponding *.om model file.

Configuration example:

{ge::ir_option::OUTPUT_TYPE, "FP32"}

Restrictions:

  • If no data type is specified, the data type of the operator output at the output layer of the original network model applies.
  • If the data type is specified, the type specified by this parameter is used.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

INPUT_FP16_NODES

Names of the input nodes of the FP16 type.

Format: "node_name1;node_name2". Enclose the specified nodes in double quotation marks ("") and separate the nodes with semicolons (;). This parameter is mutually exclusive with INSERT_OP_FILE.

Configuration example:

{ge::ir_option::INPUT_FP16_NODES, "node_name1;node_name2"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Memory Management

Parameter

Description

EXEC_DISABLE_REUSED_MEMORY

Memory reuse switch.

Memory reuse refers to the practice of repeatedly utilizing non-conflicting memory based on its lifecycle and size, thereby reducing network memory consumption.

Arguments:

  • 0 (default): Enables memory reuse.
  • 1: Disables memory reuse. If the network model is large, disabling memory reuse prevents device memory from being reused during subsequent inference, which may result in insufficient memory.

Configuration example:

{ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

EXTERNAL_WEIGHT

Whether to externalize the weights of the Const/Constant nodes on the original network and convert the node type to FileConstant when the OM model file is generated.

In the offline scenario, if the model weight is large and the environment has restrictions on the OM offline model file size, you are advised to enable the external weight and save the weight separately to reduce the OM file size.

Arguments:

  • 0 (default): The weights are not externalized and are directly saved in the OM offline model file.
  • 1: The weights are externalized. The weight files of all Const/Constant nodes on the network are flushed to the disk, and the node type is converted to FileConstant. The weight files are saved in the weight directory at the same level as the OM file. Weights of different nodes are stored in different files, which are named in the format of weight_<hash value>.

Configuration example:

{ge::ir_option::EXTERNAL_WEIGHT, "1"}

Restrictions:

  • In the external weight scenario, when acl APIs are used to develop inference applications and load models:
    • Use the aclgrphSaveModel API to save the OM model.
      • If aclmdlLoadFromFile is used to load a model, the weight file must be stored in the weight directory at the same level as the OM file.
      • If aclmdlSetConfigOpt and aclmdlLoadWithConfig are used to load a model, there is no requirement on the external weight directory. When the model is loaded later, use aclmdlLoadWithConfig to specify the external weight directory.
    • In the weight update scenario, use aclgrphBundleSaveModel to save the OM model.

      Only aclmdlBundleLoadFromFile can be used to load a model, and the weight file must be stored in the weight directory at the same level as the OM file.

    For details about the APIs, see "Model Loading and Unloading".

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Dynamic Shape

Parameter

Description

INPUT_SHAPE

Input shape of a model.

Arguments:

  • If the model uses a static shape, INPUT_SHAPE is optional.
    • If the model uses a single input, the shape information is "input_name:n,c,h,w".
    • If the model uses multiple inputs, the shape information is "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2". Different inputs are separated by semicolons (;). input_name must be the name of a node in the network model before conversion.
  • If the model uses a non-static shape, INPUT_SHAPE must be configured.
    If one or more dimension values of the input data in the original model are not fixed, the model can be converted by setting the shape profile or shape range.
    • Setting the shape profile (static shape): The shape profiles include the batch size, image size, and specified dynamic dimension profiles.

      When setting the INPUT_SHAPE parameter, set the corresponding dimension value to -1 and use DYNAMIC_BATCH_SIZE (setting batch size profiles), DYNAMIC_IMAGE_SIZE (setting image size profiles), or DYNAMIC_DIMS (setting dynamic dimension profiles). For details, see the parameter description of DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS.

    • Setting the shape range (dynamic shape). The shape range cannot be set for the Atlas 200I/500 A2 inference products.

      When setting INPUT_SHAPE, you can define the corresponding dimension with a range of valid values, for example, 1~10.

      • To set the shape range based on node names, the format is "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2", for example, "input_name1:8~20,3,5,-1;input_name2:5,3~9,10,-1". Enclose the specified nodes in double quotation marks (""), and separate them by semicolons (;). input_name must be the node name in the network model before conversion. As a best practice, you should set the parameter based on data node names.
      • To set the shape range based on node indexes, the format is "n1,c1,h1,w1;n2,c2,h2,w2", for example, "8~20,3,5,-1;5,3~9,10,-1". If the node name is not specified, the nodes are sorted by the index and separated by semicolons (;). When the shape range is specified based on the index, the index attribute must be set sequentially from 0 for data nodes.

      If you do not want to specify the dimension range or value, you can set it to -1, indicating that the dimension can be any value greater than or equal to 0. In this scenario, the upper limit of the value is the int64 type range. However, the value is limited by the size of the physical memory on the host and device, so you can increase the memory size to support it.

  • Scalar model shape:
    • Non-dynamic profile scenario:

      Configuring the shape of a scalar input is optional. For example, if the model has two inputs, where input_name1 is a scalar with shape [] and input_name2 has the shape [n2,c2,h2,w2], the shape information of the model is "input_name1:;input_name2:n2,c2,h2,w2". Different inputs are separated by semicolons (;). input_name must be the node name in the network model before conversion. If you do configure the scalar input, leave its shape empty.

    • Dynamic profile scenario:

      If the model input has both scalar shape and dynamic-profile shape, the scalar input must be configured. For example, if a model has three inputs: A:[-1,c1,h1,w1], B:[], and C:[n2,c2,h2,w2], the shape information is "A:-1,c1,h1,w1;B:;C:n2,c2,h2,w2". Scalar input B must be configured.

Configuration example:

  • Static shape. For example, if the input shape information of a network consists of input 1 (input_0_0 [16,32,208,208]) and input 2 (input_1_0 [16,64,208,208]), the configuration of INPUT_SHAPE is as follows:
    {ge::ir_option::INPUT_SHAPE, "input_0_0:16,32,208,208;input_1_0:16,64,208,208"}
  • Non-static shape, static shape:
    • For details about how to set the batch size, see DYNAMIC_BATCH_SIZE.
    • For details about how to set the image size, see DYNAMIC_IMAGE_SIZE.
    • For details about how to set profiles for a specified dimension, see DYNAMIC_DIMS.
  • Non-static shape, dynamic shape (shape range example):
    {ge::ir_option::INPUT_SHAPE, "input_0_0:1~10,32,208,208;input_1_0:16,64,100~208,100~208"}

    For details about the examples and precautions, see Special Topics > Shape Range of Dynamic Input.

  • Scalar shape
    • Non-dynamic profile scenario:

      Configuring the shape of a scalar input is optional. For example, if the model has two inputs, where input_name1 is a scalar and input_name2 has the shape [16,32,208,208], the configuration example is as follows:

      {ge::ir_option::INPUT_SHAPE, "input_name1:;input_name2:16,32,208,208"}

      In the preceding example, input_name1 is optional.

    • Dynamic profile scenario:

      Configuring the shape of a scalar input is mandatory. For example, if the model has three inputs and the shape information is A:[-1,32,208,208], B:[], and C:[16,64,208,208], the configuration example is as follows (A is the dynamic profile input, and the batch size profile is used):

      {ge::ir_option::INPUT_SHAPE, "A:-1,32,208,208;B:;C:16,64,208,208"}, 
      {ge::ir_option::DYNAMIC_BATCH_SIZE, "1,2,4"} 
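
The name-based INPUT_SHAPE format described above can be illustrated with a small parser. This is only a sketch of the string layout, not the parser used by the graph engine: the helper names Split and ParseInputShape are hypothetical, dimensions are kept as strings so that ranges such as 1~10 and the -1 placeholder survive unchanged, and splitting at the last colon of each entry is an assumption made here so that node names containing a colon still parse.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on `sep`; an empty input yields an empty vector.
std::vector<std::string> Split(const std::string& text, char sep) {
    std::vector<std::string> parts;
    if (text.empty()) return parts;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, sep)) parts.push_back(item);
    return parts;
}

// Parses "name1:d1,d2;name2:;name3:d1,d2" into name -> list of dimension strings.
// A scalar input (nothing after the colon) maps to an empty list.
// Index-based entries without a node name are skipped in this sketch.
std::map<std::string, std::vector<std::string>> ParseInputShape(const std::string& value) {
    std::map<std::string, std::vector<std::string>> shapes;
    for (const std::string& entry : Split(value, ';')) {
        const std::size_t colon = entry.rfind(':');  // assumption: split at the last colon
        if (colon == std::string::npos) continue;
        shapes[entry.substr(0, colon)] = Split(entry.substr(colon + 1), ',');
    }
    return shapes;
}
```

For the three-input example above, ParseInputShape would return four dimension strings for A and C and an empty list for the scalar input B.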
NOTE:

If this parameter is used to set the shape range during model conversion:

When using an application project for model inference, call aclmdlSetDatasetTensorDesc before calling the model execution APIs to set the actual input tensor description (input shape range). After the model is executed, call aclmdlGetDatasetTensorDesc to obtain the tensor description of the dynamic output of the model. Then, call the operation APIs under aclTensorDesc to obtain the memory size occupied by the output tensor data, tensor format, and tensor dimensions.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

DYNAMIC_BATCH_SIZE

Dynamic batch size profile. This parameter applies to the scenario where the number of images processed per inference batch is unfixed.

This parameter must be used together with INPUT_SHAPE and is mutually exclusive with DYNAMIC_IMAGE_SIZE and DYNAMIC_DIMS. In addition, N must be the first dimension of the shape, that is, the first dimension in INPUT_SHAPE must be set to -1. If N is not the first dimension, use DYNAMIC_DIMS instead.

Argument: batch size profiles, for example, "1,2,4,8".

Format: Enclose the whole argument in double quotation marks (""), and separate the batch sizes with commas (,).

Restrictions:

  • For the following products, the batch size profile range is (1,100]. That is, at least two profiles must be set and a maximum of 100 profiles are supported. The recommended value range for each profile is [1, 2048].

    Atlas A3 training products / Atlas A3 inference products

    Atlas A2 training products / Atlas A2 inference products

    Atlas 200I/500 A2 inference products

    Atlas inference products

    Atlas training products

Configuration example:

The value -1 of INPUT_SHAPE indicates that the dynamic batch size is enabled.

{ge::ir_option::INPUT_FORMAT, "NHWC"},
{ge::ir_option::INPUT_SHAPE, "data:-1,3,416,416"},
{ge::ir_option::DYNAMIC_BATCH_SIZE, "1,2,4,8"}

For details about the examples and precautions, see Special Topics > Dynamic BatchSize.
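
The restrictions on the DYNAMIC_BATCH_SIZE argument can be checked before the build is started. The following is a minimal sketch under stated assumptions: ParseBatchProfiles and AllWithinRecommendedRange are hypothetical helper names, the string is assumed to contain no whitespace, and values longer than nine digits are rejected only to keep the integer conversion safe in this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Parses a comma-separated profile list such as "1,2,4,8".
// Returns true only if every item is a plain non-negative integer and the
// number of profiles is in (1,100], that is, between 2 and 100.
bool ParseBatchProfiles(const std::string& value, std::vector<long>* out) {
    out->clear();
    std::stringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (item.empty() || item.size() > 9) return false;
        for (char c : item) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        }
        out->push_back(std::stol(item));
    }
    return out->size() >= 2 && out->size() <= 100;
}

// Checks the recommended (not mandatory) per-profile range [1, 2048].
bool AllWithinRecommendedRange(const std::vector<long>& profiles) {
    for (long p : profiles) {
        if (p < 1 || p > 2048) return false;
    }
    return true;
}
```

The recommended range is kept in a separate function because the text describes it as a recommendation rather than a hard limit.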

Applicability:

Atlas A3 training products / Atlas A3 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

DYNAMIC_IMAGE_SIZE

Dynamic input image size. This parameter applies to the scenario where the resolution of images input for inference is not fixed.

This parameter must be used together with INPUT_SHAPE and is mutually exclusive with DYNAMIC_BATCH_SIZE and DYNAMIC_DIMS.

Argument: "imagesize1_height,imagesize1_width;imagesize2_height,imagesize2_width"

Format: Enclose the whole argument in double quotation marks (""), and separate profiles by semicolons (;) and values within each profile by commas (,).

Restrictions:

  • For the following products, the profile range is (1,100]. That is, at least two profiles must be set, and a maximum of 100 profiles are supported.

    Atlas A3 training products / Atlas A3 inference products

    Atlas A2 training products / Atlas A2 inference products

    Atlas 200I/500 A2 inference products

    Atlas inference products

    Atlas training products

Configuration example:

The value -1 of INPUT_SHAPE indicates that the dynamic image size is enabled.

{ge::ir_option::INPUT_FORMAT, "NCHW"}, 
{ge::ir_option::INPUT_SHAPE, "data:8,3,-1,-1"}, 
{ge::ir_option::DYNAMIC_IMAGE_SIZE, "416,416;832,832"}

For details about the examples and precautions, see Special Topics > Dynamic Image Size.
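
The combination rules stated for the dynamic profile parameters (each requires INPUT_SHAPE, and the three are mutually exclusive) together with the INSERT_OP_FILE and INPUT_FP16_NODES exclusion can be pre-checked on the caller side. In the sketch below, CheckOptionCombination is a hypothetical helper, and the map keys are plain illustrative strings, not the actual values of the ge::ir_option constants.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>

// Returns an empty string when the combination is acceptable,
// otherwise a short description of the violated rule.
std::string CheckOptionCombination(const std::map<std::string, std::string>& options) {
    int dynamic_count = 0;
    for (const char* key : {"DYNAMIC_BATCH_SIZE", "DYNAMIC_IMAGE_SIZE", "DYNAMIC_DIMS"}) {
        if (options.count(key) != 0) ++dynamic_count;
    }
    if (dynamic_count > 1) {
        return "DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS are mutually exclusive";
    }
    if (dynamic_count == 1 && options.count("INPUT_SHAPE") == 0) {
        return "dynamic profile options must be used together with INPUT_SHAPE";
    }
    if (options.count("INSERT_OP_FILE") != 0 && options.count("INPUT_FP16_NODES") != 0) {
        return "INSERT_OP_FILE and INPUT_FP16_NODES are mutually exclusive";
    }
    return "";
}
```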

Applicability:

Atlas A3 training products / Atlas A3 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

DYNAMIC_DIMS

Dynamic dimension profile in ND format. This parameter applies to the scenario where the dimensions for inference are unfixed.

This parameter must be used together with INPUT_SHAPE and is mutually exclusive with DYNAMIC_BATCH_SIZE and DYNAMIC_IMAGE_SIZE.

Argument: formatted as "dim1,dim2,dim3;dim4,dim5,dim6;dim7,dim8,dim9"

Format: Enclose the whole argument in double quotation marks (""), separate the profiles by semicolons (;), and separate values within each profile by commas (,). The dimension size values match the -1 placeholders in the INPUT_SHAPE parameter with ordering preserved, and the number of -1 placeholders equals the number of dimension sizes of each profile.

Restrictions:

  • For the following products, the profile range is (1,100]. That is, at least two profiles must be set, and a maximum of 100 profiles are supported. Three to four profiles are recommended.

    Atlas A3 training products / Atlas A3 inference products

    Atlas A2 training products / Atlas A2 inference products

    Atlas 200I/500 A2 inference products

    Atlas inference products

    Atlas training products

Configuration example:

{ge::ir_option::INPUT_FORMAT, "ND"},
{ge::ir_option::INPUT_SHAPE, "data:1,-1"}, 
{ge::ir_option::DYNAMIC_DIMS, "4;8;16;64"}  
// At model build time, the supported shapes of the Data operator are 1,4; 1,8; 1,16; 1,64.
{ge::ir_option::INPUT_FORMAT, "ND"},
{ge::ir_option::INPUT_SHAPE, "data:1,-1,-1"}, 
{ge::ir_option::DYNAMIC_DIMS, "1,2;3,4;5,6;7,8"}  
// At model build time, the supported shapes of the Data operator are 1,1,2; 1,3,4; 1,5,6; 1,7,8.

For details about the examples and precautions, see Special Topics > Dynamic Dimensions.
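
How the DYNAMIC_DIMS profiles fill the -1 placeholders of INPUT_SHAPE in order can be sketched as follows. ParseLongs and ExpandDynamicDims are hypothetical helpers written for illustration, they handle a single input whose dimensions are plain integers, and they return an empty result when a profile does not supply exactly one value per placeholder.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parses a list of integers separated by `sep`, for example "1,-1,-1".
std::vector<long> ParseLongs(const std::string& text, char sep) {
    std::vector<long> values;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, sep)) values.push_back(std::stol(item));
    return values;
}

// Substitutes each profile of `dynamic_dims` into the -1 placeholders of
// `shape`, preserving order. Returns one concrete shape per profile.
std::vector<std::vector<long>> ExpandDynamicDims(const std::string& shape,
                                                 const std::string& dynamic_dims) {
    const std::vector<long> dims = ParseLongs(shape, ',');
    std::size_t placeholders = 0;
    for (long d : dims) {
        if (d == -1) ++placeholders;
    }

    std::vector<std::vector<long>> result;
    std::stringstream stream(dynamic_dims);
    std::string profile;
    while (std::getline(stream, profile, ';')) {
        const std::vector<long> sizes = ParseLongs(profile, ',');
        if (sizes.size() != placeholders) return {};  // profile does not match the placeholders
        std::vector<long> concrete = dims;
        std::size_t next = 0;
        for (long& d : concrete) {
            if (d == -1) d = sizes[next++];
        }
        result.push_back(concrete);
    }
    return result;
}
```

With the shape "1,-1,-1" and the profiles "1,2;3,4;5,6;7,8", the sketch yields the four shapes listed in the comment of the second configuration example.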

Applicability:

Atlas A3 training products / Atlas A3 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

AC_PARALLEL_ENABLE

Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic shape graph.

In a dynamic shape graph, when this function is enabled, the system automatically identifies AI CPU operators that can be run in parallel with the AI Core operators in the graph. Operators of different engines are distributed to different streams to run in parallel, improving resource utilization and dynamic shape execution performance.

Arguments:

  • 1: AI CPU operators and AI Core operators are allowed to run in parallel.
  • 0 (default): AI CPU operators are not separately distributed.

Configuration example:

{ge::ir_option::AC_PARALLEL_ENABLE, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas 200I/500 A2 inference products : not supported

Operator and Graph Build

Parameter

Description

EXCLUDE_ENGINES

Disables one or more acceleration engines during graph build. Use vertical bars (|) to separate multiple engines.

The NPU integrates multiple hardware accelerators (also called acceleration engines), such as AiCore/AiVec/AiCpu (sorted by priority). During graph build, an appropriate engine is selected for an operator based on the priority. Specifically, when an operator is supported by multiple engines, the one with a higher priority is selected.

EXCLUDE_ENGINES can exclude engines for operators. For example, during a training process, to prevent the data preprocessing graph and the main training graph from preempting AiCore, you can configure this parameter to prevent the data preprocessing graph from using the AiCore engine.

Arguments:

  • AiCore: AI Core hardware acceleration engine
  • AiVec: Vector Core hardware acceleration engine
  • AiCpu: AI CPU hardware acceleration engine

Configuration example:

{ge::ir_option::EXCLUDE_ENGINES, "AiCore|AiVec"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_COMPILER_CACHE_MODE

Disk cache mode for operator compilation.

Arguments:

  • enable (default): Enables the cache. Operators with the same compilation configurations and operator configurations are not built repeatedly, which speeds up compilation.
  • force: Enables the cache and forcibly refreshes it. That is, the existing cache is cleared before the operator is recompiled and added to the cache. For example, after Python changes, dependency library changes, or repository changes following operator optimization, set this option to force to clear the existing cache, and then change it back to enable so that the cache is not forcibly refreshed during each build.
  • disable: Disables the cache.

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}

Instructions:

  • To specify the disk cache path for operator compilation, use this parameter together with OP_COMPILER_CACHE_DIR.
  • When you enable the operator compilation cache function, set the disk space of the cache folder with the configuration file (the op_cache.ini file automatically generated in the path specified by OP_COMPILER_CACHE_DIR after operator build) or environment variables.
    1. Using the op_cache.ini configuration file:

      If the op_cache.ini file does not exist, manually create it. Open the file and add the following information:

      # Configure the file format (required). The automatically generated file contains the following information by default. When manually creating a file, enter the following information:
      [op_compiler_cache]
      # Limit the disk space of the cache folder on a chip, in MB. The default value is 500. The value must be an integer.
      max_op_cache_size=500
      # Set the ratio of the cache size to be reserved, in percentage. The value range is [1, 100]. The default value is 50. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
      remain_cache_size_ratio=50    
      • The op_cache.ini file takes effect only when the values of max_op_cache_size and remain_cache_size_ratio in the preceding file are valid.
      • If the size of the build cache file exceeds the value of max_op_cache_size and the cache file is not accessed for more than half an hour, the cache file will be aged. (Operator build will not be interrupted due to the size of the build cache file exceeding the set limit. Therefore, if max_op_cache_size is set to a small value, the size of the actual build cache file may exceed the configured value.)
      • To disable the build cache aging function, set max_op_cache_size to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.
      • If multiple users use the same cache path, you are advised to use the configuration file to set the cache path. In this scenario, the op_cache.ini file affects all users.
    2. Using environment variables

      In this scenario, the environment variable ASCEND_MAX_OP_CACHE_SIZE is used to limit the storage space of the cache folder of a chip. When the build cache space reaches the specified value and the cache file is not accessed for more than half an hour, the cache file is aged. The environment variable ASCEND_REMAIN_CACHE_SIZE_RATIO is used to set the ratio of the cache space to be reserved.

      A configuration example is as follows:

      # The ASCEND_MAX_OP_CACHE_SIZE environment variable defaults to 500, in MB. The value must be an integer.
      export ASCEND_MAX_OP_CACHE_SIZE=500
      # The value range of the ASCEND_REMAIN_CACHE_SIZE_RATIO environment variable is [1, 100]. The default value is 50, in percentage. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
      export ASCEND_REMAIN_CACHE_SIZE_RATIO=50
      • The argument configured through environment variables takes effect only for the current user.
      • To disable the build cache aging function, set the environment variable ASCEND_MAX_OP_CACHE_SIZE to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.

    If both the op_cache.ini file and the environment variable are configured, the configuration items in the op_cache.ini file are read first. If neither the op_cache.ini file nor the environment variables are configured, the system's default values (500 MB disk space and 50% of reserved cache space) are read.

  • If this parameter is set to force, the existing cache will be cleared. Therefore, it is not recommended for parallel program compilation. Otherwise, the cache used by other models may be cleared, causing compilation failures.
  • disable and force are recommended for publishing the final model.
  • If the repository changes after operator tuning, set this parameter to force to refresh the cache. Otherwise, the new tuning repository cannot be applied, and the tuning application fails to be executed.
  • When the debugging function is enabled:
    • If OP_DEBUG_LEVEL is set to a non-zero value, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator compilation cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is not configured, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator compilation cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is configured in the configuration file:
      • For operators in the list, the OP_COMPILER_CACHE_MODE configuration is ignored and the operators are recompiled.
      • For operators out of the list, if OP_COMPILER_CACHE_MODE is set to enable or force, the cache function is enabled. If OP_COMPILER_CACHE_MODE is set to disable, the cache function is disabled and the operators are recompiled.
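
The interaction between OP_COMPILER_CACHE_MODE and the debugging options described in the last bullet can be summarized as a small decision function. This is a sketch of the documented rules only, not the build engine's implementation; CacheMode and UsesCompileCache are hypothetical names.

```cpp
#include <cassert>

enum class CacheMode { kEnable, kForce, kDisable };

// Returns true if the operator compilation cache is in effect for one operator.
//   op_debug_level   : value of OP_DEBUG_LEVEL
//   debug_config_set : OP_DEBUG_CONFIG is not empty
//   debug_list_set   : OP_DEBUG_LIST is configured in the configuration file
//   op_in_debug_list : the operator appears in OP_DEBUG_LIST
bool UsesCompileCache(CacheMode mode, int op_debug_level, bool debug_config_set,
                      bool debug_list_set, bool op_in_debug_list) {
    if (op_debug_level != 0) return false;   // cache disabled, all operators recompiled
    if (debug_config_set) {
        if (!debug_list_set) return false;   // no list: all operators recompiled
        if (op_in_debug_list) return false;  // listed operators are always recompiled
    }
    return mode != CacheMode::kDisable;      // enable and force both use the cache
}
```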

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_COMPILER_CACHE_DIR

Disk cache directory for operator compilation.

Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Default value: $HOME/atc_data

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"},
{ge::ir_option::OP_COMPILER_CACHE_DIR, "/home/test/data/atc_data"}

Restrictions:

  • To specify the disk cache path for operator compilation, use this option together with OP_COMPILER_CACHE_MODE.
  • If the specified directory exists and is valid, a kernel_cache subdirectory is automatically created. If the specified directory does not exist but is valid, the system automatically creates this directory and the kernel_cache subdirectory.
  • Do not store other self-owned content in the default cache directory. The self-owned content will be deleted together with the default cache directory during software package installation or upgrade.
  • A non-default cache directory specified by this option is not deleted during software package installation or upgrade.
  • In addition to OP_COMPILER_CACHE_DIR, the environment variable ASCEND_CACHE_PATH can be used to set the disk cache directory for operator build. The priorities of the configuration methods are as follows: OP_COMPILER_CACHE_DIR > ASCEND_CACHE_PATH > default cache directory.
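
The priority order in the last bullet can be sketched as a simple fallback chain. ResolveCacheDir is a hypothetical helper; an empty string stands for "not set", and the sketch returns the ASCEND_CACHE_PATH value unchanged, which is an assumption because the text does not say whether a subdirectory is appended to it.

```cpp
#include <cassert>
#include <string>

// Applies OP_COMPILER_CACHE_DIR > ASCEND_CACHE_PATH > default ($HOME/atc_data).
// Each argument is the configured value, or an empty string if it is not set.
std::string ResolveCacheDir(const std::string& option_dir,
                            const std::string& ascend_cache_path,
                            const std::string& home) {
    if (!option_dir.empty()) return option_dir;
    if (!ascend_cache_path.empty()) return ascend_cache_path;
    return home + "/atc_data";
}
```

In an application, the second and third arguments would typically come from std::getenv("ASCEND_CACHE_PATH") and std::getenv("HOME"), with a null result mapped to an empty string.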

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTIMIZATION_SWITCH

Fusion pattern (pass) control switch used during operator build. This parameter applies to all fusion patterns.

Argument: Passname1:on;Passname2:off. Multiple key-value pairs can be concatenated. key is the pass name, and value can be set to on (enabled) or off (disabled). Case-sensitive matching is not supported. Multiple groups of configurations are separated by semicolons (;). For details about the fusion patterns that can be configured, see Fusion Pattern List.

Configuration example:

{ge::ir_option::OPTIMIZATION_SWITCH, "Passname1:on;Passname2:off"}
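
When the pass settings are held in program data, the argument string can be assembled rather than written by hand. BuildOptimizationSwitch below is a hypothetical helper that only illustrates the "key:value;key:value" layout; the pass names are the placeholders used in the example above, not real fusion pattern names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Joins (pass name, enabled) pairs into "Passname1:on;Passname2:off".
std::string BuildOptimizationSwitch(
        const std::vector<std::pair<std::string, bool>>& passes) {
    std::string value;
    for (const auto& pass : passes) {
        if (!value.empty()) value += ';';
        value += pass.first + (pass.second ? ":on" : ":off");
    }
    return value;
}
```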

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Debugging

Parameter

Description

LOG_LEVEL

Log level.

Arguments:

  • debug: debug, info, warning, and error logs
  • info: info, warning, and error logs
  • warning: warning and error logs
  • error: error logs only
  • null (default): No debug log is generated.

Configuration example:

{ge::ir_option::LOG_LEVEL, "debug"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

DEBUG_DIR

Directory of the debug-related process files generated during operator build, including the .o (operator binary file), .json (operator description file), and .cce files.

By default, the files are generated in the current directory.

Restrictions:

  • If you want to specify the path for storing the process files of operator compilation, use DEBUG_DIR and OP_DEBUG_LEVEL together. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.
  • In addition to DEBUG_DIR, the ASCEND_WORK_PATH environment variable can be used to set the path for storing the debugging file generated during operator compilation. The configuration priorities are as follows: DEBUG_DIR > ASCEND_WORK_PATH > default storage path.

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"},
{ge::ir_option::DEBUG_DIR, "/home/test/module/out_debug_info"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_DEBUG_LEVEL

Debugging switch for operator compilation.

If you want to specify the path for storing the process file of operator compilation, use DEBUG_DIR. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR does not take effect.

Arguments:

  • 0 (default): Disables operator debug. The operator build folder kernel_meta is not generated in the current execution path.
  • 1: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors.
  • 2: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors. Setting this option to 2 also disables build optimization and enables the CCE compiler debug function (the CCE compiler options are set to -O0 -g).
  • 3: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file) and .json file (operator description file) are generated in the folder. You can refer to these files when analyzing operator errors.
  • 4: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), TBE instruction mapping file (operator file *.cce), and UB fusion description file ({$kernel_name}_compute.json) are generated in the folder. These files can be used for problem reproduction and accuracy comparison during operator error analysis.
NOTICE:
  • If OP_DEBUG_LEVEL is set to 0 and OP_DEBUG_CONFIG is also set, the operator compilation directory kernel_meta is retained in the current execution path.
  • If OP_DEBUG_LEVEL is set to 0 and the NPU_COLLECT_PATH environment variable is set, the compilation directory kernel_meta is always retained. If the ASCEND_WORK_PATH environment variable is set, the compilation directory is retained in the path specified by the environment variable. If the ASCEND_WORK_PATH environment variable does not exist, the compilation directory is retained in the current execution path.
  • You are advised to set this parameter to 0 or 3 for training. To locate errors, set this parameter to 1 or 2, which might compromise the network performance.
  • If this option is set to 2, the CCE compiler debug options are enabled, and the size of the operator kernel file (*.o file) increases. In the dynamic shape scenario, all possible shapes are traversed during operator build, which may cause operator build failures due to large operator kernel files. In this case, you are advised not to enable the CCE compiler debug options.

    If a build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o
  • When the debug function is enabled, if the model contains the following merged compute and communication (MC2) operators, the *.o, *.json, and *.cce files of the operators are not generated in the operator build folder kernel_meta.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll
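The OP_DEBUG_LEVEL arguments described above differ mainly in which files are kept in kernel_meta. The following lookup is a sketch that restates the argument list as data; the DebugArtifacts struct and ArtifactsForLevel function are hypothetical names. It deliberately ignores the NOTICE cases in which kernel_meta is retained at level 0 (OP_DEBUG_CONFIG or NPU_COLLECT_PATH being set) and the MC2 operator exception.

```cpp
#include <string>

// Files kept in kernel_meta for one OP_DEBUG_LEVEL value, as listed above.
struct DebugArtifacts {
    bool kernel_meta;      // kernel_meta folder is generated
    bool binary_and_json;  // .o and .json files
    bool cce;              // operator file *.cce
    bool loc_json;         // python-CCE mapping file *_loc.json
    bool compute_json;     // UB fusion description file {$kernel_name}_compute.json
    bool o0_g;             // CCE compiler options set to -O0 -g
};

// Fills `out` for a documented level ("0" to "4"); returns false otherwise.
inline bool ArtifactsForLevel(const std::string& level, DebugArtifacts* out) {
    if (level == "0") {
        *out = DebugArtifacts{false, false, false, false, false, false};
    } else if (level == "1") {
        *out = DebugArtifacts{true, true, true, true, false, false};
    } else if (level == "2") {
        *out = DebugArtifacts{true, true, true, true, false, true};
    } else if (level == "3") {
        *out = DebugArtifacts{true, true, false, false, false, false};
    } else if (level == "4") {
        *out = DebugArtifacts{true, true, true, false, true, false};
    } else {
        return false;
    }
    return true;
}
```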

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_DEBUG_CONFIG

Global memory check switch.

Arguments:

The value is the path of the .cfg configuration file. Multiple options in the configuration file are separated by commas (,).

  • oom: Checks whether memory overwriting occurs in the global memory during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
      inline __aicore__ void  CheckInvalidAccessOfDDR(xxx) {
          if (access_offset < 0 || access_offset + access_extent > ddr_size) {
              if (read_or_write == 1) {
                  trap(0X5A5A0001);
              } else {
                  trap(0X5A5A0002);
              }
          }
      }
  • dump_cce: Retains the operator CCE file (.cce), binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • dump_loc: Retains the python-CCE mapping file *_loc.json, binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • ccec_O0: Enables the CCEC option -O0 during operator build. With this option, the compiler performs no optimization, which facilitates later analysis of AI Core errors.
  • ccec_g: Enables the CCEC option -g during operator build. This option generates debugging information for later analysis of AI Core errors.
  • check_flag: Checks whether pipeline synchronization signals in operators match each other during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ....
        pipe_barrier(PIPE_MTE3);
        pipe_barrier(PIPE_MTE2);
        pipe_barrier(PIPE_M);
        pipe_barrier(PIPE_V);
        pipe_barrier(PIPE_MTE1);
        pipe_barrier(PIPE_ALL);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ...

      During actual inference, if the pipeline synchronization signals in operators do not match each other, a timeout error is reported at the faulty operator, and the program is terminated. The following is an example of the error message:

      Aicore kernel execute failed, ..., fault kernel_name=operator name,...
      rtStreamSynchronizeWithTimeout execute failed....

Configuration example:

{ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"}

The content of the test0.cfg file is as follows:

op_debug_config=ccec_g,oom

Restrictions:

To build only some AI Core operators with these options instead of all of them, add the op_debug_list field to the test0.cfg configuration file. Only the operators specified in the list are then built with the options configured in op_debug_config. The op_debug_list field has the following requirements:

  • The operator name or operator type can be specified.
  • Operators are separated by commas (,). The operator type is configured in the OpType::typeName format. The operator type and operator name can be configured in a mixed manner.
  • The operator to be compiled must be stored in the configuration file specified by OP_DEBUG_CONFIG.

The following is a configuration example: Add the following information to the test0.cfg file:

op_debug_config=ccec_g,oom
op_debug_list=GatherV2,opType::ReduceSum

During model compilation, the GatherV2 and ReduceSum operators are compiled based on the ccec_g and oom options.
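To make the file layout concrete, the following sketch parses the two fields shown above into option, operator-name, and operator-type lists. It is not the parser used by the build tool. OpDebugCfg, SplitCommas, and ParseOpDebugCfg are hypothetical names. As an assumption, any op_debug_list entry containing "::" is treated as an operator type without checking the prefix spelling, because the requirement list writes OpType:: while the sample file writes opType::. Whitespace is not trimmed.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct OpDebugCfg {
    std::vector<std::string> options;   // values of op_debug_config
    std::vector<std::string> op_names;  // op_debug_list entries given by name
    std::vector<std::string> op_types;  // op_debug_list entries given by type
};

// Splits a comma-separated value into non-empty items.
inline std::vector<std::string> SplitCommas(const std::string& value) {
    std::vector<std::string> items;
    std::stringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) items.push_back(item);
    }
    return items;
}

// Parses "key=value" lines; unknown keys and lines without '=' are skipped.
inline OpDebugCfg ParseOpDebugCfg(const std::string& text) {
    OpDebugCfg cfg;
    std::stringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = line.substr(0, eq);
        const std::string value = line.substr(eq + 1);
        if (key == "op_debug_config") {
            cfg.options = SplitCommas(value);
        } else if (key == "op_debug_list") {
            const std::vector<std::string> entries = SplitCommas(value);
            for (const std::string& entry : entries) {
                const std::string::size_type sep = entry.find("::");
                if (sep == std::string::npos) {
                    cfg.op_names.push_back(entry);               // plain operator name
                } else {
                    cfg.op_types.push_back(entry.substr(sep + 2));  // text after "::"
                }
            }
        }
    }
    return cfg;
}
```

Parsing the sample file yields the options ccec_g and oom, the operator name GatherV2, and the operator type ReduceSum.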

NOTE:
  • When ccec compilation options (ccec_O0 and ccec_g) are enabled, the size of the operator kernel file (*.o file) increases. In dynamic shape scenarios, all possible scenarios are traversed during operator compilation, which may cause operator compilation failures due to large operator kernel files. In this case, do not enable the CCEC options.

    If the compilation failure is caused by large operator kernel files, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o:(xxxx)

  • The ccec_O0 and oom options cannot be enabled at the same time. Otherwise, an AI Core error may be reported. The following is an example of the error message:
    ...there is an aivec error exception, core id is 49, error code = 0x4 ...
  • If the NPU_COLLECT_PATH environment variable is configured, the global memory overwriting check cannot be enabled (that is, oom must not be set in the configuration file specified by OP_DEBUG_CONFIG). Otherwise, an error is reported when the compiled model file or operator kernel package is used.
  • When the build options oom, dump_cce, and dump_loc are configured, if the model contains the following MC2 operators, the *.o, *.json, and *.cce files of the operators are not generated in the operator build folder kernel_meta.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll
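Two of the notes above describe option combinations that must be avoided. A pre-build check along the following lines could catch them early. This is a sketch under stated assumptions: HasOption and CheckOpDebugOptions are hypothetical helpers, the returned messages are illustrative and are not messages produced by the toolchain, and the caller is assumed to know whether NPU_COLLECT_PATH is set.

```cpp
#include <algorithm>
#include <string>
#include <vector>

inline bool HasOption(const std::vector<std::string>& options, const std::string& name) {
    return std::find(options.begin(), options.end(), name) != options.end();
}

// Returns an empty string when neither documented restriction is violated,
// otherwise a short description of the first conflict found.
inline std::string CheckOpDebugOptions(const std::vector<std::string>& options,
                                       bool npu_collect_path_set) {
    if (HasOption(options, "ccec_O0") && HasOption(options, "oom")) {
        return "ccec_O0 and oom cannot be enabled at the same time";
    }
    if (npu_collect_path_set && HasOption(options, "oom")) {
        return "oom cannot be enabled when NPU_COLLECT_PATH is set";
    }
    return "";
}
```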

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OPTION_EXPORT_COMPILE_STAT

Whether to generate the fusion_result.json result file of operator fusion information (including graph fusion and UB fusion) during graph build.

This file is used to record the fusion patterns used during graph build. In the file:

  • session_and_graph_id_xx_xx: thread and graph ID of the fusion result.
  • graph_fusion: graph fusion.
  • ub_fusion: UB fusion.
  • match_times: number of times that the fusion pattern is matched during graph build.
  • effect_times: actual number of times that the fusion takes effect.
  • repository_hit_times: number of times that the UB fusion repository is hit.

Arguments:

  • 0: The result file of operator fusion information is not generated.
  • 1 (default): The result file of operator fusion information is generated when the program exits normally.
  • 2: The result file of operator fusion information is generated when graph build is complete. If graph build is complete, the result file of operator fusion information is generated even if the program is interrupted in advance.
NOTE:
  • If the ASCEND_WORK_PATH environment variable is not set, the result file is generated in the current path where the script is executed by default. If the ASCEND_WORK_PATH environment variable is set, the result file is saved in $ASCEND_WORK_PATH/FE/${Process ID}/fusion_result.json.
  • The fusion patterns disabled using FUSION_SWITCH_FILE are not displayed in the fusion_result.json file.
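The storage rule in the note above can be written as a one-line path computation. The sketch below assumes the caller supplies the ASCEND_WORK_PATH value (empty if unset), the process ID, and the directory in which the script is executed; FusionResultPath is a hypothetical helper name.

```cpp
#include <string>

// Builds the expected location of fusion_result.json.
// `ascend_work_path` is the value of ASCEND_WORK_PATH ("" if unset),
// `pid` is the process ID, and `run_dir` is the directory where the
// script is executed.
inline std::string FusionResultPath(const std::string& ascend_work_path,
                                    long pid,
                                    const std::string& run_dir) {
    if (ascend_work_path.empty()) {
        return run_dir + "/fusion_result.json";
    }
    return ascend_work_path + "/FE/" + std::to_string(pid) + "/fusion_result.json";
}
```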

Configuration example:

{ge::ir_option::OPTION_EXPORT_COMPILE_STAT, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Precision Tuning

Parameter

Description

PRECISION_MODE

Precision mode of an operator. This parameter cannot be used together with PRECISION_MODE_V2 in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • force_fp32/cube_fp16in_fp32out:
    force_fp32 and cube_fp16in_fp32out have the same effect. When an operator in the AI Core supports both float32 and float16, the system selects a processing mode based on the operator type. cube_fp16in_fp32out is added in the new version and has clearer semantics for cube operators.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators, float32 is forcibly selected if the operator precision in the original graph is float16 or bfloat16.

      This option is invalid if the original graph contains operators not supporting float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • force_fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • allow_fp32_to_fp16:
    • For matrix operators:
      • If the operator precision in the original graph is float32, the precision is preferably reduced to float16. If the operator in the AI Core does not support float16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
    • For vector operators, the precision of the original graph is retained preferably.
      • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
  • must_keep_origin_dtype:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • allow_mix_precision/allow_mix_precision_fp16:

    allow_mix_precision has the same effect as allow_mix_precision_fp16: mixed precision of float16, bfloat16, and float32 is used for neural network processing. allow_mix_precision_fp16 is added in the new version and has clearer semantics.

    Based on the built-in tuning policy, float16 is automatically used for certain float32 and bfloat16 operators in the original model. This improves system performance and reduces memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_mix_precision_bf16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original model based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. If the operator in the AI Core does not support bfloat16 and float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16 and float32, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_fp32_to_bf16:
    • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is reduced to bfloat16. If the operator in the AI Core does not support bfloat16, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16, an error is reported during execution.
    • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
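The four-step selection order for cube operators under force_fp32/cube_fp16in_fp32out can be sketched as a first-match search over the supported (input, output) type pairs. This is only an illustration of the documented order, not the selection code of the graph engine; DtypePair and SelectCubeDtypes are hypothetical names.

```cpp
#include <set>
#include <string>
#include <utility>

// (input dtype, output dtype) pair supported by an operator implementation.
typedef std::pair<std::string, std::string> DtypePair;

// Returns true and fills `chosen` with the first supported combination in the
// documented order; returns false when none is supported (step 4: an error).
inline bool SelectCubeDtypes(const std::set<DtypePair>& supported, DtypePair* chosen) {
    const DtypePair order[] = {
        DtypePair("float16", "float32"),   // step 1: preferred
        DtypePair("float32", "float32"),   // step 2
        DtypePair("float16", "float16")};  // step 3
    for (const DtypePair& candidate : order) {
        if (supported.count(candidate) > 0) {
            *chosen = candidate;
            return true;
        }
    }
    return false;
}
```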

Restrictions:

  • The bfloat16 data type is supported only on the following products:

    Atlas A2 training products / Atlas A2 inference products

    Atlas A3 training products / Atlas A3 inference products

    Atlas 200I/500 A2 inference products

  • The default value of this option prioritizes performance, so precision overflow may occur during subsequent inference. If a precision issue occurs during inference, locate the fault by referring to "Accuracy Improvement Suggestions for Model Inference".
  • If you want to avoid precision issues, you can set the option to a value other than the default one. For example, you can set the option to must_keep_origin_dtype.

Configuration example:

{ge::ir_option::PRECISION_MODE, "force_fp16"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

PRECISION_MODE_V2

Precision mode of an operator. This parameter cannot be used together with PRECISION_MODE in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • origin:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • cube_fp16in_fp32out:
    The system selects a processing mode based on the operator type for AI Core operators supporting both float32 and float16.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators, float32 is forcibly selected if the operator precision in the original graph is float16 or bfloat16.

      This option is invalid if the original graph contains operators not supporting float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • mixed_float16:

    Mixed precision of float16, bfloat16, and float32 is used for neural network processing. Based on the built-in tuning policy, float16 is automatically used for certain float32 and bfloat16 operators in the original graph. This will improve system performance and reduce memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_bfloat16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original graph based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. If the operators do not support bfloat16 and float32, the AI CPU operators are used for computation. If the AI CPU operators also do not support bfloat16 and float32, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_hif8:

    Enables automatic mixed precision, indicating that hifloat8 (for details about this data type, see Link), float16, bfloat16, and float32 are used together for neural network processing. In this mode, hifloat8 is automatically used for certain float16, bfloat16, and float32 operators in the original graph based on the built-in tuning policy. This will improve system performance and reduce memory usage with minimal precision degradation. The current version does not support this argument.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float16, bfloat16, and float32 to hifloat8.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float16, bfloat16, and float32 to hifloat8. In this case, the operator still uses the precision of float16, bfloat16, or float32.
    • If an operator in the original graph does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8:

    The hifloat8 data type is forcibly used if the Cube operator in the original graph supports both hifloat8 and float16, bfloat16, or float32. The current version does not support this argument.

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
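The three fallback cases listed for the origin argument can be condensed into one decision function. The sketch below is a simplification that covers only those listed cases, plus the implied case in which the AI Core supports the original type; it is not the engine's actual logic. Placement and KeepOriginDtype are hypothetical names.

```cpp
#include <set>
#include <string>

struct Placement {
    std::string engine;  // "AI Core", "AI CPU", or "" when an error is reported
    std::string dtype;   // data type actually used
    bool ok;             // false when an error is reported
};

// `origin` is the operator precision in the original graph; `ai_core` and
// `ai_cpu` are the data types supported by each implementation.
inline Placement KeepOriginDtype(const std::string& origin,
                                 const std::set<std::string>& ai_core,
                                 const std::set<std::string>& ai_cpu) {
    // The original precision is retained whenever the AI Core supports it.
    if (ai_core.count(origin) > 0) return Placement{"AI Core", origin, true};
    // float16 is promoted to high-precision float32 if the AI Core offers it.
    if (origin == "float16" && ai_core.count("float32") > 0) {
        return Placement{"AI Core", "float32", true};
    }
    // Otherwise the AI CPU operator of the original type is used, if any.
    if (ai_cpu.count(origin) > 0) return Placement{"AI CPU", origin, true};
    return Placement{"", "", false};
}
```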

Restrictions:

  • The bfloat16 data type is supported only on the following products:

    Atlas A2 training products / Atlas A2 inference products

    Atlas A3 training products / Atlas A3 inference products

    Atlas 200I/500 A2 inference products

  • The default value of this option prioritizes performance, so precision overflow may occur during subsequent inference. If a precision issue occurs during inference, locate the fault by referring to "Accuracy Improvement Suggestions for Model Inference".
  • If you want to avoid precision issues, you can set the option to a value other than the default one. For example, you can set the option to origin.

Configuration example:

{ge::ir_option::PRECISION_MODE_V2, "fp16"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

MODIFY_MIXLIST

When mixed precision is enabled, you can use this parameter to specify the path and file name of the blocklist, trustlist, and graylist, and specify the operators that allow precision degradation and those that do not allow precision degradation. Set this parameter to the path including the file name. The file is in JSON format. You can view the flag value under precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. xxx varies depending on the product.

  • true (trustlist): Precision reduction is allowed in mixed precision mode.
  • false (blocklist): Precision reduction is not allowed in mixed precision mode.
  • Not specified (graylist): Operators on the graylist follow the same precision processing as their upstream operators.

Method for enabling mixed precision:

  • Set PRECISION_MODE to allow_mix_precision, allow_mix_precision_bf16, or allow_mix_precision_fp16.
  • Set PRECISION_MODE_V2 to mixed_float16 or mixed_bfloat16.

PRECISION_MODE and PRECISION_MODE_V2 cannot be configured at the same time. You are advised to use PRECISION_MODE_V2.

Configuration example:

{ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"}

You can specify the operator types in ops_info.json as follows. Separate operators with commas (,).

{
  "black-list": {                  // Blocklist
     "to-remove": [                // Move an operator from the blocklist to the graylist. Ensure that the specified operator is already on the blocklist.
     "Xlog1py"
     ],
     "to-add": [                   // Move an operator from the trustlist or graylist to the blocklist.
     "Matmul",
     "Cast"
     ]
  },
  "white-list": {                  // Trustlist
     "to-remove": [                // Move an operator from the trustlist to the graylist. Ensure that the specified operator is already on the trustlist.
     "Conv2D"
     ],
     "to-add": [                   // Move an operator from the blocklist or graylist to the trustlist.
     "Bias"
     ]
  }
}

The operators in the preceding example configuration file are for reference only. The configuration should be based on the actual hardware environment and the built-in tuning strategies of the operators. The following is an example of blocklist, trustlist, and graylist query:

"Conv2D":{
    "precision_reduce":{
        "flag":"true"
     }
},

true: trustlist; false: blocklist; Not configured: graylist.
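The effect of the to-remove and to-add entries in ops_info.json can be modeled as edits to a map from operator type to list. The following is a sketch of the documented moves only; it does not read JSON and is not the build tool's implementation. MixList, ListEdit, and ApplyMixlist are hypothetical names, and operators absent from the map are treated as graylisted.

```cpp
#include <map>
#include <string>
#include <vector>

enum class MixList { kTrust, kBlock, kGray };

// Contents of one "black-list" or "white-list" object in ops_info.json.
struct ListEdit {
    std::vector<std::string> to_remove;
    std::vector<std::string> to_add;
};

// Applies the edits to a copy of the built-in classification and returns it.
inline std::map<std::string, MixList> ApplyMixlist(std::map<std::string, MixList> lists,
                                                   const ListEdit& black,
                                                   const ListEdit& white) {
    // Operators without a precision_reduce flag count as graylisted.
    auto current = [&lists](const std::string& op) -> MixList {
        const std::map<std::string, MixList>::const_iterator it = lists.find(op);
        return it == lists.end() ? MixList::kGray : it->second;
    };
    for (const std::string& op : black.to_remove) {
        if (current(op) == MixList::kBlock) lists[op] = MixList::kGray;  // blocklist -> graylist
    }
    for (const std::string& op : black.to_add) {
        lists[op] = MixList::kBlock;  // trustlist or graylist -> blocklist
    }
    for (const std::string& op : white.to_remove) {
        if (current(op) == MixList::kTrust) lists[op] = MixList::kGray;  // trustlist -> graylist
    }
    for (const std::string& op : white.to_add) {
        lists[op] = MixList::kTrust;  // blocklist or graylist -> trustlist
    }
    return lists;
}
```

A to-remove entry has no effect in this sketch when the operator is not on the corresponding list, which mirrors the requirement that the operator already be on that list.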

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

CUSTOMIZE_DTYPES

Customized operator precision during model build. Other operators in the model are built according to PRECISION_MODE or PRECISION_MODE_V2. Set this parameter to the path of the configuration file (including the file name), for example, /home/test/customize_dtypes.cfg.

Restrictions:

  • List the names or types of operators whose computing precision needs customization in the configuration file. Each operator occupies a line, and the operator type must be defined based on IR.
  • If both the operator name and the operator type are configured for an operator, the configuration by operator name applies during build.
  • The computing precision of an operator specified by this parameter does not take effect if the operator is fused during build.

The structure of the configuration file is as follows:

# Configuration by operator name
Opname1::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
Opname2::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
# Configuration by operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
OpType::TypeName2:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…

The following is an example of the configuration file:

# Configuration by operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# Configuration by operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8

NOTE:
  • You can view the computing precision supported by an operator in the operator information library, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json by default.

    Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. xxx varies depending on the product.

  • The data type specified by this parameter takes priority, which may cause precision or performance degradation. If the specified data type is not supported, the build fails.

Configuration example:

{ge::ir_option::CUSTOMIZE_DTYPES, "/home/test/customize_dtypes.cfg"}
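
To illustrate the line format of the configuration file described above, the following is a minimal parsing sketch. It is not the parser used by the build. The DtypeRule struct and both function names are hypothetical, and the sketch assumes that every rule line contains both InputDtype: and OutputDtype: in that order.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one rule line (illustration only).
struct DtypeRule {
  bool by_type = false;                    // true for "OpType::<type>:" lines
  std::string target;                      // operator name or operator type
  std::vector<std::string> input_dtypes;
  std::vector<std::string> output_dtypes;
};

// Splits a comma-separated list and skips empty items (e.g. a trailing comma).
static std::vector<std::string> SplitCsv(const std::string &text) {
  std::vector<std::string> items;
  std::stringstream stream(text);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty()) {
      items.push_back(item);
    }
  }
  return items;
}

// Returns false for comment lines, empty lines, and lines that do not match.
bool ParseDtypeRule(const std::string &line, DtypeRule &rule) {
  const std::string kIn = "InputDtype:";
  const std::string kOut = "OutputDtype:";
  const std::string kTypePrefix = "OpType::";
  if (line.empty() || line[0] == '#') {
    return false;
  }
  const std::size_t in_pos = line.find(kIn);
  const std::size_t out_pos = line.find(kOut);
  if (in_pos == std::string::npos || out_pos == std::string::npos || out_pos < in_pos) {
    return false;
  }
  std::string head = line.substr(0, in_pos);  // "<name>::" or "OpType::<type>:"
  rule.by_type = head.compare(0, kTypePrefix.size(), kTypePrefix) == 0;
  if (rule.by_type) {
    head = head.substr(kTypePrefix.size());
    if (head.size() < 2 || head.back() != ':') {
      return false;
    }
    head.pop_back();  // drop the single ':' after the type name
  } else {
    if (head.size() < 3 || head.compare(head.size() - 2, 2, "::") != 0) {
      return false;
    }
    head.erase(head.size() - 2);  // drop the "::" after the operator name
  }
  rule.target = head;
  const std::size_t in_begin = in_pos + kIn.size();
  rule.input_dtypes = SplitCsv(line.substr(in_begin, out_pos - in_begin));
  rule.output_dtypes = SplitCsv(line.substr(out_pos + kOut.size()));
  return true;
}
```

Applied to the two example lines above, the sketch reports a name-based rule for the Relu node and a type-based rule for Relu, each with float16 and int8 as input and output types.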

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Precision Comparison

Parameter

Description

QUANT_DUMPABLE

Whether to collect the dump data of the quantization operator.

For details, see "Accuracy Improvement Suggestions for Model Inference" in Application Development Guide (C&C++). When you locate precision problems in a model quantized with AMCT, the inputs and outputs of the quantization operators may be optimized during graph build when the model is converted to an offline OM model, which affects the dump data export of those operators. For example, for two quantized convolution computations, the intermediate output is optimized to the quantized int8 output.

To solve this problem, the QUANT_DUMPABLE parameter is introduced. After this parameter is enabled, the input and output of the quantization operator are not fused. The transdata operator is inserted to restore the original model format. In this way, the dump data of the quantization operator can be collected.

Arguments:

  • 0 (default): The inputs and outputs of the quantization operators may be optimized during graph build. In this case, the dump data of the quantization operators cannot be obtained.
  • 1: The inputs and outputs of the quantization operators are not fused, so the dump data of the quantization operators can be collected.

Configuration example:

{ge::ir_option::QUANT_DUMPABLE, "1"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Performance Tuning

Parameter

Description

OP_PRECISION_MODE

Precision mode of one or more specified operators during internal processing. Use this parameter to pass in the customized precision mode configuration file op_precision.ini, which sets different precision modes for different operators.

The following precision modes can be set in the configuration file:

  • high_precision
  • high_performance
  • enable_float_32_execution: The FP32 data type is used for internal processing of operators. In this scenario, the FP32 data type is not automatically converted to the HF32 data type. If you are using the HF32 data type for computation and find that the accuracy drop exceeds your expectation, you can enable this configuration to specify the use of FP32 for internal computation of certain operators in order to maintain accuracy.

    This option supports only the following products:

    Atlas A2 training products / Atlas A2 inference products

    Atlas A3 training products / Atlas A3 inference products

  • enable_hi_float_32_execution: The HF32 data type is used for internal processing of operators. After it is enabled, the FP32 data type is automatically converted to the HF32 data type. This configuration reduces the space occupied by data and improves performance. It is not supported in the current version.
  • support_out_of_bound_index: Out-of-bounds verification is performed on the indices of the gather, scatter, and segment operators. This verification degrades operator execution performance.
  • keep_fp16: The FP16 data type is used for internal processing of operators. In this scenario, the FP16 data type is not automatically converted to the FP32 data type. If the performance of FP32 computation does not meet expectations and high precision is not required, you can select the keep_fp16 mode. This low-precision mode sacrifices precision to improve performance and is not recommended.
  • super_performance: Ultra-high performance. Compared with high_performance, the calculation formula of the algorithm is optimized.

You can view the precision or performance modes supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file under the CANN software installation path.

Sample: In the INI file, each row sets the precision mode for an operator type (low priority) or a node name (high priority).

[ByOpType]
optype1=high_precision
optype2=high_performance
optype3=enable_hi_float_32_execution
optype4=support_out_of_bound_index

[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename3=enable_hi_float_32_execution
nodename4=support_out_of_bound_index
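
The priority between the two sections of the sample above (node name over operator type) can be sketched as a lookup. This is a hedged illustration rather than the GE implementation: OpPrecisionConfig and ResolvePrecisionMode are hypothetical names, and the sketch assumes the INI file has already been loaded into two maps.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory form of op_precision.ini (illustration only).
struct OpPrecisionConfig {
  std::map<std::string, std::string> by_op_type;    // [ByOpType] section
  std::map<std::string, std::string> by_node_name;  // [ByNodeName] section
};

// Returns the precision mode for a node, or an empty string if neither
// section configures it. A [ByNodeName] entry wins over a [ByOpType] entry.
std::string ResolvePrecisionMode(const OpPrecisionConfig &config,
                                 const std::string &node_name,
                                 const std::string &op_type) {
  const auto node_it = config.by_node_name.find(node_name);
  if (node_it != config.by_node_name.end()) {
    return node_it->second;
  }
  const auto type_it = config.by_op_type.find(op_type);
  if (type_it != config.by_op_type.end()) {
    return type_it->second;
  }
  return "";
}
```

With optype1=high_performance and nodename1=high_precision configured, a node named nodename1 of type optype1 resolves to high_precision, while other nodes of type optype1 resolve to high_performance.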

Restrictions:

  • This parameter is mutually exclusive with OP_SELECT_IMPL_MODE and OPTYPELIST_FOR_IMPLMODE. If they are all specified, OP_PRECISION_MODE takes precedence.
  • For the same operator, if enable_hi_float_32_execution or enable_float_32_execution is configured using OP_PRECISION_MODE, you are advised not to use this parameter together with ALLOW_HF32. If they are used together, the priority is as follows:

    OP_PRECISION_MODE(ByNodeName) > ALLOW_HF32 > OP_PRECISION_MODE(ByOpType)

  • You are advised not to set this parameter. Use it only if the high-performance or high-precision mode does not deliver the expected network performance or accuracy and you need to adjust the precision mode of specific operators through the INI configuration file.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

TILING_SCHEDULE_OPTIMIZE

Whether to enable the optimization for tiling offload scheduling.

Because the internal storage of the AI Cores in the NPU cannot hold all the input and output data of an operator, the input data is split into parts. The first part is transferred in, computed, and transferred out, and then the next part is processed in the same way. This process is called tiling. A computation program, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of cycles) based on operator information such as the shape. Because the AI Core is not good at the scalar computation involved in the tiling implementation, the tiling implementation is generally executed on the host CPU. However, the tiling implementation is executed on the device when the following conditions are met:

  1. The model is static-shape.
  2. Operators in the model, such as the FusedInferAttentionScore and IncreFlashAttention fused operators, support tiling offload.
  3. The operators that support tiling offload have value dependencies, that is, the output value of the previous operator that they depend on is an execution result produced on the device. If the value to be depended on is a Const value, tiling offload is not required, and tiling is completed during build.
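
To make the notion of tiling parameters more concrete, the following is a minimal sketch of how a block size and a cycle count could be derived from a data size and a buffer capacity. It is not the tiling implementation of any CANN operator. TilingPlan and ComputeTiling are hypothetical names, and real tiling also depends on operator-specific information such as the shape.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiling parameters (illustration only).
struct TilingPlan {
  std::uint64_t block_size;  // elements transferred in each full cycle
  std::uint64_t loop_count;  // total number of cycles
  std::uint64_t tail_size;   // elements handled in the last cycle
};

// Splits total_elements into blocks of at most buffer_capacity elements.
// Assumption of this sketch: buffer_capacity is greater than 0.
TilingPlan ComputeTiling(std::uint64_t total_elements, std::uint64_t buffer_capacity) {
  TilingPlan plan = {0, 0, 0};
  if (total_elements == 0 || buffer_capacity == 0) {
    return plan;
  }
  plan.block_size = total_elements < buffer_capacity ? total_elements : buffer_capacity;
  // Ceiling division: a partially filled last block still needs one cycle.
  plan.loop_count = (total_elements + plan.block_size - 1) / plan.block_size;
  plan.tail_size = total_elements - (plan.loop_count - 1) * plan.block_size;
  return plan;
}
```

For example, 1000 elements with a capacity of 256 elements give four cycles: three full blocks of 256 elements and a tail of 232 elements.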

Arguments:

  • 0 (default): Tiling offload is disabled.
  • 1: Tiling offload is enabled.

Configuration example:

{ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}

Applicability:

Atlas inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas training products : not supported

Atlas 200I/500 A2 inference products : not supported

AOE

Parameter

Description

MDL_BANK_PATH

Path of the custom repository generated after subgraph tuning.

This parameter must be used together with BUFFER_OPTIMIZE in aclgrphBuildInitialize Configuration Parameters and takes effect only when buffer optimization is enabled. Buffer optimization improves performance by temporarily storing data in a high-speed buffer.

Argument: path of the custom repository generated after model tuning.

Format: The path can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.).

Default: $HOME/Ascend/latest/data/aoe/custom/graph/<soc_version>

Configuration example:

{ge::ir_option::MDL_BANK_PATH, "$HOME/custom_module_path"}

Restrictions:

Path (path of the custom repository generated after subgraph tuning) priority ranked from high to low: path specified by MDL_BANK_PATH > path specified by the TUNE_BANK_PATH environment variable > default path.

  1. If the TUNE_BANK_PATH environment variable is used to specify the custom repository path before model compilation and MDL_BANK_PATH is used to specify the custom repository path during model compilation, then the path specified by MDL_BANK_PATH takes effect and the path specified by the TUNE_BANK_PATH environment variable does not take effect.
  2. The default path takes effect if both the paths specified by MDL_BANK_PATH and the environment variable are invalid or the directories contain no custom repository.
  3. If no custom repository is available in the preceding directories, the built-in repository for subgraph tuning is searched in the ${INSTALL_DIR}/<arch>-linux/data/fusion_strategy/built-in path. <arch> indicates the OS architecture.
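
The path priority for the subgraph tuning repository described above can be sketched as follows. This is an illustration under stated assumptions, not the actual lookup code: ResolveMdlBankPath is a hypothetical name, the has_custom_repo callback stands in for the check that a path is valid and contains a custom repository, and the sketch assumes that an unusable MDL_BANK_PATH falls through to the TUNE_BANK_PATH path.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical resolver (illustration only).
// option_path: value of MDL_BANK_PATH, empty if not set.
// env_path:    value of the TUNE_BANK_PATH environment variable, empty if not set.
std::string ResolveMdlBankPath(
    const std::string &option_path, const std::string &env_path,
    const std::string &default_path,
    const std::function<bool(const std::string &)> &has_custom_repo) {
  if (!option_path.empty() && has_custom_repo(option_path)) {
    return option_path;  // highest priority: MDL_BANK_PATH
  }
  if (!env_path.empty() && has_custom_repo(env_path)) {
    return env_path;     // next: TUNE_BANK_PATH
  }
  return default_path;   // fallback: default custom repository path
}
```

The fallback to the built-in repository (item 3) is left out of the sketch. Note that OP_BANK_PATH ranks the environment variable above the parameter, so the same order does not apply to operator tuning.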

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OP_BANK_PATH

Path of the custom repository generated after operator tuning.

Format: The path can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.).

Default: ${HOME}/Ascend/latest/data/aoe/custom/op

Configuration example:

{ge::ir_option::OP_BANK_PATH, "$HOME/custom_tune_path"}

Restrictions:

Path (path of the custom repository generated after operator tuning) priority ranked from high to low: path specified by the TUNE_BANK_PATH environment variable > path specified by OP_BANK_PATH > default path of the custom repository generated after operator tuning.

  1. If the TUNE_BANK_PATH environment variable is used to specify the custom repository path before model conversion and OP_BANK_PATH is used to specify the custom repository path during model compilation, then the path specified by the TUNE_BANK_PATH environment variable takes effect and the path specified by OP_BANK_PATH does not take effect.
  2. The default path takes effect if both the paths specified by OP_BANK_PATH and the environment variable are invalid.
  3. If none of the preceding directories contain a custom repository, the system searches the built-in repository for operator tuning.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Experiment Parameters

Parameter

Description

ALLOW_HF32

This parameter is reserved and is not supported in the current version.

Whether to enable the function of automatically replacing the float32 data type with the HF32 data type. In the current version, this option takes effect only for Conv and Matmul operators.

HF32 is a single-precision floating-point type developed by Ascend for internal computation of operators. Compared with other common data types, HF32 has the same value range as float32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing float32 with HF32 reduces precision but can greatly reduce the space occupied by data and improve performance.
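
To give a feel for the precision loss, the following sketch emulates an 11-bit mantissa on the host by clearing the low 12 of the 23 float32 mantissa bits. It is only an approximation for illustration: the function name is hypothetical, and plain truncation is an assumption, because this document does not state how the hardware rounds when converting float32 to HF32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only): keeps the sign bit, the 8 exponent
// bits, and the top 11 mantissa bits of a float32 value.
// Assumption: truncation instead of whatever rounding the hardware applies.
float TruncateToHf32Precision(float value) {
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  bits &= 0xFFFFF000u;  // clear mantissa bits 0..11, keep bits 12..22
  float result = 0.0f;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

Under this assumption, 1 + 2^-11 survives unchanged, while 1 + 2^-12 collapses to 1.0, which shows why accuracy-sensitive operators may need FP32 for internal computation.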

Arguments:

  • true: Enable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is enabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_t_conv_t.ini under the CANN software installation path. This file cannot be modified by users.

  • false: Disable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is disabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_f_conv_f.ini under the CANN software installation path. This file cannot be modified by users.

Default: Enable FP32-to-HF32 conversion for Conv operators; disable FP32-to-HF32 conversion for Matmul operators.

Restrictions:

  • For the same operator, if enable_hi_float_32_execution or enable_float_32_execution is configured using OP_PRECISION_MODE, you are advised not to use ALLOW_HF32 together with OP_PRECISION_MODE. If they are used together, the priority is as follows:

    OP_PRECISION_MODE(ByNodeName) > ALLOW_HF32 > OP_PRECISION_MODE(ByOpType)

  • ALLOW_HF32 automatically replaces float32 with HF32. For this parameter to take effect, the input or output type of the target operator must be float32. The default value of PRECISION_MODE_V2 is fp16, which forcibly converts operators whose type is float32 in the original network model to float16. In this case, ALLOW_HF32 does not take effect, and you are advised to change the value of PRECISION_MODE_V2 to origin. Similarly, the default value of PRECISION_MODE is force_fp16, and you are advised to change it to must_keep_origin_dtype or force_fp32.

Configuration example:

{ge::ir_option::ALLOW_HF32, "true"}

Applicability:

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Atlas inference products : not supported

Atlas training products : not supported

Atlas 200I/500 A2 inference products : not supported

BUILD_INNER_MODEL

Not supported in the current version.

OO_LEVEL

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Multi-level optimization options for graph build include subgraph optimization, entire graph optimization, and static shape model offloading.

Static shape model offloading: In this approach, the input and output shapes of all operators in a static shape model can be determined at build time, allowing for model-level memory orchestration and operator tiling computation to be completed on the host. These computations are then batched and sent to the device stream when the model is loaded, but they are not executed immediately. Instead, the execution of all tasks within the model is triggered by the delivery of model execution tasks.

Arguments:

  • O1: Disables all graph fusion and UB fusion passes and performs only the optimizations related to static offloading, such as InferShape (output tensor shape inference), constant folding, and dead-edge elimination.
  • O3 (default): Enables all optimizations.

Restrictions:

If the value is O1, all graph fusion and UB fusion passes are disabled, and only passes related to static offloading are enabled. However, the following graph fusion passes remain enabled because functional problems may occur if they are disabled:

All graph fusion passes under the ExceptionalPassOfO1Level field in the ${INSTALL_DIR}/<arch>-linux/lib64/plugin/opskernel/fusion_pass/config/fusion_config.json file

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. <arch> indicates the OS architecture.

Configuration example:

{ge::ir_option::OO_LEVEL, "O3"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OO_CONSTANT_FOLDING

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Whether to enable constant folding optimization.

Constant folding is the process of replacing nodes that can be evaluated to a constant output value in a computational graph with that constant, and simplifying the structure of the computational graph accordingly.
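
The idea can be illustrated on a tiny expression graph. The following sketch is not the GE pass: the Node type and all function names are hypothetical, and only addition and multiplication are modeled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical expression node (illustration only).
struct Node {
  enum Kind { kConst, kInput, kAdd, kMul } kind;
  double value;  // meaningful only when kind == kConst
  std::shared_ptr<Node> lhs;
  std::shared_ptr<Node> rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr MakeConst(double value) {
  return std::make_shared<Node>(Node{Node::kConst, value, nullptr, nullptr});
}
NodePtr MakeInput() {
  return std::make_shared<Node>(Node{Node::kInput, 0.0, nullptr, nullptr});
}
NodePtr MakeBinary(Node::Kind kind, NodePtr lhs, NodePtr rhs) {
  return std::make_shared<Node>(Node{kind, 0.0, lhs, rhs});
}

// Replaces every subgraph whose operands are all constants with one constant.
NodePtr FoldConstants(const NodePtr &node) {
  if (node->kind == Node::kConst || node->kind == Node::kInput) {
    return node;
  }
  const NodePtr lhs = FoldConstants(node->lhs);
  const NodePtr rhs = FoldConstants(node->rhs);
  if (lhs->kind == Node::kConst && rhs->kind == Node::kConst) {
    const double folded = node->kind == Node::kAdd ? lhs->value + rhs->value
                                                   : lhs->value * rhs->value;
    return MakeConst(folded);
  }
  return MakeBinary(node->kind, lhs, rhs);  // keep the node, with folded operands
}
```

For the graph (2 + 3) * x, folding replaces the addition with the constant 5 and keeps the multiplication, because x is known only at run time.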

Arguments:

  • true (default): enabled
  • false: disabled

Configuration example:

{ge::ir_option::OO_CONSTANT_FOLDING, "true"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

OO_DEAD_CODE_ELIMINATION

Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions.

Whether to enable dead-edge elimination optimization.

Dead-edge elimination (switch dead-edge elimination): When pred (input 1) of a switch statement is a constant node, one of the branches can be eliminated based on the value of const. If const is true, the false branch is eliminated; if const is false, the true branch is eliminated.

Arguments:

  • true (default): enabled
  • false: disabled

Configuration example:

{ge::ir_option::OO_DEAD_CODE_ELIMINATION, "true"}

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported

Parameters That Will Be Deprecated in Later Versions

Parameter

Description

INPUT_SHAPE_RANGE

This parameter is deprecated. Avoid using it. To specify the shape range of the input data of a model, use INPUT_SHAPE.

Shape range of the input data of a model. This parameter is mutually exclusive with DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS.

  • To set the shape range based on node names, the format is "input_name1:[n1,c1,h1,w1];input_name2:[n2,c2,h2,w2]", for example, "input_name1:[8~20,3,5,-1];input_name2:[5,3~9,10,-1]". Enclose the specified nodes in double quotation marks (""), and separate them by semicolons (;). input_name must be the name of the original node before conversion, and the shape range values must be placed in []. As a best practice, you should set INPUT_SHAPE_RANGE based on data node names.
  • To set the shape range based on node indexes, the format is "[n1,c1,h1,w1],[n2,c2,h2,w2]", for example, "[8~20,3,5,-1],[5,3~9,10,-1]". If node names are not configured, the first pair of brackets ([]) denotes the first input node by default. Separate the nodes with commas (,). When INPUT_SHAPE_RANGE is specified based on the index, the index attribute must be set sequentially from 0 for data nodes.
  • The size of a static dimension is specified by a fixed value. The size range of a dynamic dimension is specified by using a tilde (~). A dynamic dimension without a specified size range is denoted by -1.
  • For a scalar input, enclose its shape range in square brackets ([]).
  • Static shapes must also be specified. For example, if your graph has three inputs and only the first input has a static shape, that static shape must still be specified.
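
The per-input range syntax described above can be illustrated with a small parser for one bracketed list such as [8~20,3,5,-1]. This is a sketch, not the parser used by the build: DimRange and ParseShapeRange are hypothetical names, and the sketch assumes that a scalar input is written as an empty pair of brackets.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// {min, max} of one dimension. A static dimension has min == max, and a
// dynamic dimension without a size range is stored as {-1, -1}.
using DimRange = std::pair<long long, long long>;

// Parses one bracketed shape range. Returns false on malformed input.
bool ParseShapeRange(const std::string &text, std::vector<DimRange> &dims) {
  dims.clear();
  if (text.size() < 2 || text.front() != '[' || text.back() != ']') {
    return false;
  }
  std::stringstream stream(text.substr(1, text.size() - 2));
  std::string item;
  while (std::getline(stream, item, ',')) {
    try {
      const std::size_t tilde = item.find('~');
      if (tilde == std::string::npos) {
        const long long size = std::stoll(item);  // static size or -1
        dims.emplace_back(size, size);
      } else {
        const long long low = std::stoll(item.substr(0, tilde));
        const long long high = std::stoll(item.substr(tilde + 1));
        if (low > high) {
          return false;
        }
        dims.emplace_back(low, high);
      }
    } catch (const std::exception &) {
      return false;  // not a number
    }
  }
  return true;
}
```

Parsing [8~20,3,5,-1] yields four dimensions: a range of 8 to 20, the static sizes 3 and 5, and one dimension without a size range.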

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas 200I/500 A2 inference products : not supported

SHAPE_GENERALIZED_BUILD_MODE

Shape build mode during graph build. This parameter will be deprecated in later versions. Do not use this parameter for new functions.

  • shape_generalized: fuzzy compilation. The system generalizes the runtime dimensions of dynamic-shape operators before compilation.

    Use this mode when you want to run multiple inferences based on one compilation.

  • shape_precise: precise compilation. The system directly performs compilation based on the specified shape without any escape operations.

Applicability:

Atlas inference products : supported

Atlas training products : supported

Atlas 200I/500 A2 inference products : supported

Atlas A2 training products / Atlas A2 inference products : supported

Atlas A3 training products / Atlas A3 inference products : supported