aclgrphBuildModel Configuration Parameters

Table 1 aclgrphBuildModel configuration parameters

Parameter

Description

INPUT_FORMAT

Input format.

Arguments:

Must be NCHW, NHWC, or ND.

Configuration example:

{ge::ir_option::INPUT_FORMAT, "NHWC"}

To enable AIPP during inference, the input data must be in NHWC format. In this scenario, the data format specified by INPUT_FORMAT does not take effect.

NOTE:

In the dynamic batch size, dynamic image size, and dynamic dimension scenarios, INPUT_FORMAT must be consistent with the format of each Data operator. Failure to do so may result in model build failures.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported
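
The parameters in this table are shown as {ge::ir_option::KEY, "value"} pairs. As a reference, the following is a minimal, hedged C++ sketch of how such key/value pairs are passed to aclgrphBuildModel; graph construction and error handling are omitted, and the header paths, option values, and output file name are assumptions rather than values taken from this document.

#include <map>
#include "ge/ge_api_types.h"   // assumed header providing the ge::ir_option keys
#include "ge/ge_ir_build.h"    // assumed header declaring aclgrphBuildModel / aclgrphSaveModel

using namespace ge;            // the build types (Graph, AscendString, ModelBufferData) live in the ge namespace

graphStatus BuildAndSave(const Graph &graph) {
    // Build options taken from this table (the values are placeholders).
    std::map<AscendString, AscendString> options = {
        {ge::ir_option::INPUT_FORMAT, "NHWC"},
        {ge::ir_option::INPUT_SHAPE, "data:8,224,224,3"}
    };
    ModelBufferData model;
    graphStatus ret = aclgrphBuildModel(graph, options, model);
    if (ret == GRAPH_SUCCESS) {
        // Serialize the built model to an .om file (the path is hypothetical).
        ret = aclgrphSaveModel("./built_model", model);
    }
    return ret;
}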

INPUT_SHAPE

Input shape.

Arguments:

  • Static shape.
    • If the model has a single input, the shape information is "input_name:n,c,h,w".
    • If the model has multiple inputs, the shape information is "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2". Different inputs are separated by semicolons (;). input_name must be the name of a node in the network model before conversion.
  • If dimension values of the input data in the original model are not fixed, the model can be converted by setting the shape profile or shape range.
    • Setting the shape profile: The shape profiles include the batch size, image size, and dynamic dimension profiles.

      When setting the INPUT_SHAPE parameter, set the corresponding dimension value to -1 and use DYNAMIC_BATCH_SIZE (setting batch size profiles), DYNAMIC_IMAGE_SIZE (setting image size profiles), or DYNAMIC_DIMS (setting dynamic dimension profiles). For details, see the parameter description of DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS.

    • Setting the shape range: The shape range cannot be set for the Atlas 200/300/500 Inference Product.

      When setting INPUT_SHAPE, you can define the corresponding dimension with a range of valid values, for example, 1~10.

      • To set the shape range based on node names, the format is "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2", for example, "input_name1:8~20,3,5,-1;input_name2:5,3~9,10,-1". Enclose the specified nodes in double quotation marks (""), and separate them by semicolons (;). input_name must be the node name in the network model before model conversion. As a best practice, you should set the parameter based on node names.
      • To set the shape range based on node indexes, the format is "n1,c1,h1,w1;n2,c2,h2,w2", for example, "8~20,3,5,-1;5,3~9,10,-1". If the node name is not specified, the nodes are sorted by the index and separated by semicolons (;). When the shape range is specified based on the index, the index attribute must be set sequentially from 0 for data nodes.

      If you do not want to specify a dimension's range or value, set it to -1, indicating that the dimension can be any value greater than or equal to 0. In this case, the theoretical upper limit is the int64 range; in practice, the value is limited by the physical memory available on the host and device, so larger shapes may require more memory.

  • Scalar shape.
    • Non-dynamic profile scenario:

      Configuring the shape of a scalar input is optional. For example, if the model has two inputs, where input_name1 is a scalar whose shape is "[]" and input_name2 has the shape [n2,c2,h2,w2], the shape information of the model is "input_name1:;input_name2:n2,c2,h2,w2". Different inputs are separated by semicolons (;). input_name must be the node name in the network model before conversion. If you do configure the scalar input, leave its shape empty.

    • Dynamic profile scenario:

      If the model input has both scalar shape and dynamic-profile shape, the scalar input must be configured. For example, if a model has three inputs: A:[-1,c1,h1,w1], B:[], and C:[n2,c2,h2,w2], the shape information is "A:-1,c1,h1,w1; B:;C:n2,c2,h2,w2". Scalar input B must be configured.

Configuration example:

  • Static shape. For example, if the input shape information of a network consists of two inputs (input_0_0 [16,32,208,208] and input_1_0 [16,64,208,208]), the configuration of INPUT_SHAPE is as follows:
    {ge::ir_option::INPUT_SHAPE, "input_0_0:16,32,208,208;input_1_0:16,64,208,208"}
  • For details about how to set the batch size, see DYNAMIC_BATCH_SIZE.
  • For details about how to set the image size, see DYNAMIC_IMAGE_SIZE.
  • For details about how to set profiles for a specified dimension, see DYNAMIC_DIMS.
  • The following is an example of setting the shape range:
    {ge::ir_option::INPUT_SHAPE, "input_0_0:1~10,32,208,208;input_1_0:16,64,100~208,100~208"}
  • Scalar shape.
    • Non-dynamic profile scenario:

      Configuring the shape of a scalar input is optional. For example, if the model has two inputs, where input_name1 is a scalar and input_name2 has the shape [16,32,208,208], the configuration example is as follows:

      {ge::ir_option::INPUT_SHAPE, "input_name1:;input_name2:16,32,208,208"}

      In the preceding example, input_name1 is optional.

    • Dynamic profile scenario:

      A scalar input must be configured in this scenario. For example, if the model has three inputs and the shape information is A:[-1,32,208,208], B:[], and C:[16,64,208,208], the configuration example is as follows (A is the dynamic profile input, and the batch size profile is used as an example):

      {ge::ir_option::INPUT_SHAPE, "A:-1,32,208,208;B:;C:16,64,208,208"}, 
      {ge::ir_option::DYNAMIC_BATCH_SIZE, "1,2,4"} 
NOTE:
  • INPUT_SHAPE is optional. If this parameter is not set, the shapes of the corresponding data nodes are used by default. If it is set, the specified shapes are applied to and update those of the corresponding data nodes.
  • If this parameter is used to set the shape range during model conversion:

    When using an application project for model inference, call aclmdlSetDatasetTensorDesc before aclmdlExecute to set the actual input tensor description (input shape range). After the model is executed, call aclmdlGetDatasetTensorDesc to obtain the tensor description of the dynamic output of the model. Then, call the APIs under aclTensorDesc to obtain the memory size occupied by the output tensor data, tensor format, and tensor dimensions.
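
The following is a minimal, hedged C++ sketch of the inference-side flow described in the note above, assuming the AscendCL APIs aclmdlSetDatasetTensorDesc, aclmdlExecute, aclmdlGetDatasetTensorDesc, and the aclTensorDesc query APIs behave as described; dataset creation, model loading, and error handling are omitted, and the shape values are hypothetical.

#include "acl/acl_mdl.h"

void RunWithShapeRange(uint32_t modelId, aclmdlDataset *input, aclmdlDataset *output) {
    // 1. Describe the actual shape of input 0 before execution (values are hypothetical).
    int64_t dims[4] = {4, 32, 208, 208};
    aclTensorDesc *inDesc = aclCreateTensorDesc(ACL_FLOAT, 4, dims, ACL_FORMAT_NCHW);
    aclmdlSetDatasetTensorDesc(input, inDesc, 0);

    // 2. Execute the model.
    aclmdlExecute(modelId, input, output);

    // 3. Query the tensor description of the dynamic output after execution.
    aclTensorDesc *outDesc = aclmdlGetDatasetTensorDesc(output, 0);
    size_t outSize = aclGetTensorDescSize(outDesc);        // memory occupied by the output data
    aclFormat outFormat = aclGetTensorDescFormat(outDesc); // tensor format
    size_t outDimNum = aclGetTensorDescNumDims(outDesc);   // number of dimensions
    (void)outSize; (void)outFormat; (void)outDimNum;

    aclDestroyTensorDesc(inDesc);
}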

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

INPUT_SHAPE_RANGE

This parameter is deprecated. Avoid using it. To specify the shape range of the input data of a model, use INPUT_SHAPE.

Shape range of the input data of a model. This parameter is mutually exclusive with DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS.

  • To set the shape range based on node names, the format is "input_name1:[n1,c1,h1,w1];input_name2:[n2,c2,h2,w2]", for example, "input_name1:[8~20,3,5,-1];input_name2:[5,3~9,10,-1]". Enclose the specified nodes in double quotation marks (""), and separate them by semicolons (;). input_name must be the name of the original node before conversion, and the shape range values must be placed in []. As a best practice, you should set INPUT_SHAPE_RANGE based on node names.
  • To set the shape range based on node indexes, the format is "[n1,c1,h1,w1],[n2,c2,h2,w2]", for example, "[8~20,3,5,-1],[5,3~9,10,-1]". If node names are not configured, the first pair of brackets ([]) denotes the first input node. Separate the nodes with commas (,). In this case, the index attribute must be set sequentially from 0 for data nodes.
  • The size of a static dimension is specified as a fixed value. The size range of a dynamic dimension is specified by using a tilde (~). A dynamic dimension without a specified size range is denoted by -1.
  • For a scalar input, enclose its shape range in square brackets ([]).
  • If some inputs of your graph have static shapes (for example, only the first of three inputs is static), those static shapes must still be specified.

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

OP_NAME_MAP

Path (including the file name) of the mapping configuration file for custom operators. Because the function of a custom operator can vary from network to network, this file specifies the mapping between a custom operator and the actual custom operator to be run on the network.

The directory (including the file name) can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Configuration example:

OpA:Network1OpA
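
The line above is an entry in the mapping file itself. A hedged example of passing the mapping file as a build option (the file path is hypothetical):

{ge::ir_option::OP_NAME_MAP, "/home/test/op_name_map.cfg"}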

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

DYNAMIC_BATCH_SIZE

Dynamic batch size profile. Applies to the scenario where the number of images processed per inference batch is unfixed.

This parameter must be used together with INPUT_SHAPE and is mutually exclusive with DYNAMIC_IMAGE_SIZE and DYNAMIC_DIMS. In addition, N must be the first dimension of the shape, and that dimension must be set to -1 in INPUT_SHAPE. If N is not the first dimension, use DYNAMIC_DIMS instead.

Argument: batch size profiles, for example, "1,2,4,8".

Format: Enclose the specified arguments in double quotation marks (""), and separate profiles with commas (,).

Restrictions: The number of batch size profiles is in the range (1, 100], that is, at least 2 and at most 100 profiles must be set. The recommended value range for each profile is [1, 2048].

Configuration example:

The value -1 in INPUT_SHAPE indicates that the dynamic batch size is enabled.

{ge::ir_option::INPUT_FORMAT, "NHWC"},
{ge::ir_option::INPUT_SHAPE, "data:-1,3,416,416"},
{ge::ir_option::DYNAMIC_BATCH_SIZE, "1,2,4,8"}

For details about the examples and precautions, see Special Topics > Dynamic BatchSize.
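
At inference time, the batch size for a given execution is selected from the configured profiles. The following is a minimal, hedged sketch assuming the AscendCL APIs aclmdlGetInputIndexByName and aclmdlSetDynamicBatchSize and the ACL_DYNAMIC_TENSOR_NAME constant; model loading and dataset setup are omitted.

#include "acl/acl_mdl.h"

void SetBatchForInference(uint32_t modelId, aclmdlDesc *modelDesc, aclmdlDataset *input) {
    size_t index = 0;
    // Index of the reserved input that carries the dynamic batch value.
    aclmdlGetInputIndexByName(modelDesc, ACL_DYNAMIC_TENSOR_NAME, &index);
    // Select one of the profiles configured above, for example, batch size 4.
    aclmdlSetDynamicBatchSize(modelId, input, index, 4);
}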

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

DYNAMIC_IMAGE_SIZE

Dynamic image size configuration. Applies to the scenario where the resolution of images input for inference is not fixed.

This parameter must be used in pair with INPUT_SHAPE and is mutually exclusive with DYNAMIC_BATCH_SIZE and DYNAMIC_DIMS.

Argument: "imagesize1_height,imagesize1_width;imagesize2_height,imagesize2_width"

Format: Enclose the whole argument in double quotation marks (""), and separate profiles by semicolons (;) and separate arguments within a profile by commas (,).

Restrictions: The number of profiles is in the range (1, 100], that is, at least 2 and at most 100 profiles must be set.

Configuration example:

The value -1 in INPUT_SHAPE indicates that the dynamic image size is enabled.

{ge::ir_option::INPUT_FORMAT, "NCHW"}, 
{ge::ir_option::INPUT_SHAPE, "data:8,3,-1,-1"}, 
{ge::ir_option::DYNAMIC_IMAGE_SIZE, "416,416;832,832"}

For details about the examples and precautions, see Special Topics > Dynamic Image Size.
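
At inference time, one of the configured image size profiles is selected per execution. A minimal, hedged sketch assuming the AscendCL API aclmdlSetDynamicHWSize (model and dataset setup omitted; the 832 x 832 profile matches the example above):

#include "acl/acl_mdl.h"

void SetImageSizeForInference(uint32_t modelId, aclmdlDesc *modelDesc, aclmdlDataset *input) {
    size_t index = 0;
    aclmdlGetInputIndexByName(modelDesc, ACL_DYNAMIC_TENSOR_NAME, &index);
    aclmdlSetDynamicHWSize(modelId, input, index, 832, 832);  // height, width
}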

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

DYNAMIC_DIMS

Dynamic dimension profile in ND format. Applies to the scenario where the dimension size for inference is uncertain.

This parameter must be used in pair with INPUT_SHAPE and is mutually exclusive with DYNAMIC_BATCH_SIZE and DYNAMIC_IMAGE_SIZE.

Argument: formatted as "dim1,dim2,dim3;dim4,dim5,dim6;dim7,dim8,dim9"

Format: Enclose all profiles in double quotation marks (""), and separate profiles by a semicolon (;). The dimension size values match the -1 placeholders in INPUT_SHAPE with ordering preserved, and the number of -1 placeholders equals the number of dimension sizes of each profile.

Restrictions: The profile range is (1, 100]. That is, at least two profiles must be set, and a maximum of 100 profiles are supported. Three to four profiles are recommended.

Configuration example:

{ge::ir_option::INPUT_FORMAT, "ND"},
{ge::ir_option::INPUT_SHAPE, "data:1,-1"}, 
{ge::ir_option::DYNAMIC_DIMS, "4;8;16;64"}  
// At model build time, the supported shape of the Data operator is 1,4; 1,8; 1,16;1,64.
{ge::ir_option::INPUT_FORMAT, "ND"},
{ge::ir_option::INPUT_SHAPE, "data:1,-1,-1"}, 
{ge::ir_option::DYNAMIC_DIMS, "1,2;3,4;5,6;7,8"}  
// At model build time, the supported shape of the Data operator is 1,1,2; 1,3,4; 1,5,6; 1,7,8.

For details about the examples and precautions, see Special Topics > Dynamic Dimension Size.
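
At inference time, one of the configured dimension profiles is selected per execution. A minimal, hedged sketch assuming the AscendCL API aclmdlSetInputDynamicDims and the aclmdlIODims structure (setup omitted; the values match the "data:1,-1,-1" example above):

#include "acl/acl_mdl.h"

void SetDimsForInference(uint32_t modelId, aclmdlDesc *modelDesc, aclmdlDataset *input) {
    size_t index = 0;
    aclmdlGetInputIndexByName(modelDesc, ACL_DYNAMIC_TENSOR_NAME, &index);
    aclmdlIODims dims = {};
    dims.dimCount = 3;
    dims.dims[0] = 1;  // static first dimension from INPUT_SHAPE
    dims.dims[1] = 3;  // one of the configured profiles: 3,4
    dims.dims[2] = 4;
    aclmdlSetInputDynamicDims(modelId, input, index, &dims);
}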

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

INSERT_OP_FILE

Path of the configuration file of the preprocessing operator, for example, Aipp operator. For details about how to use the parameter, see Special Topics > AIPP.

This parameter is mutually exclusive with INPUT_FP16_NODES.

The configuration file path allows only letters, digits, and underscores (_). The file name can contain letters, digits, underscores (_), and periods (.).

The following is an example of the configuration file.

aipp_op {
aipp_mode:static
input_format:YUV420SP_U8
csc_switch:true
var_reci_chn_0:0.00392157
var_reci_chn_1:0.00392157
var_reci_chn_2:0.00392157
}
NOTE:

For details about the configuration file, see ATC Instructions.
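
A hedged example of passing the AIPP configuration file as a build option (the file path is hypothetical):

{ge::ir_option::INSERT_OP_FILE, "/home/test/aipp.cfg"}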

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

PRECISION_MODE

Operator precision mode. This parameter cannot be used together with PRECISION_MODE_V2 in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • force_fp32/cube_fp16in_fp32out:
    force_fp32 has the same effect as cube_fp16in_fp32out: the system selects a processing mode depending on whether the operator is a cube or vector operator. cube_fp16in_fp32out is newly added in this version and has clearer semantics for cube operators.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators, float32 is forcibly selected for operators supporting both float16 and float32, even if the original precision is float16.

      This argument is invalid if your model contains operators not supporting float32, for example, an operator that supports only float16. In this case, float16 is retained. If the operator does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator is not supported, an error is reported.

  • force_fp16:

    Forces float16 for operators supporting both float16 and float32.

  • allow_fp32_to_fp16:
    • For cube operators, float16 is used.
    • For vector operators, the original precision is preserved for operators supporting float32; otherwise, float16 is forced.
  • must_keep_origin_dtype:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • allow_mix_precision/allow_mix_precision_fp16:

    allow_mix_precision has the same effect as allow_mix_precision_fp16: mixed float16 and float32 precision is used for neural network processing. allow_mix_precision_fp16 is newly added in this version and has clearer semantics.

    In this mode, float16 is automatically used for certain float32 operators based on the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal accuracy degradation.

    If this mode is used, you can view the value of the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json in the OPP installation directory.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to float16.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.

Default: force_fp16

Configuration example:

{ge::ir_option::PRECISION_MODE, "force_fp16"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

PRECISION_MODE_V2

Sets the precision mode of a model. This parameter cannot be used together with PRECISION_MODE in the same graph. You are advised to use PRECISION_MODE_V2.

Arguments:

  • fp16:

    Forces float16 for operators supporting both float16 and float32.

  • origin:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • cube_fp16in_fp32out:
    The system selects a processing mode based on the operator type for operators supporting both float16 and float32.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators, float32 is forcibly selected for operators supporting both float16 and float32, even if the original precision is float16.

      This argument is invalid if your model contains operators not supporting float32, for example, an operator that supports only float16. In this case, float16 is retained. If the operator does not support float32 and it is configured to the blocklist of precision reduction (by setting precision_reduce to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator is not supported, an error is reported.

  • mixed_float16:

    Mixed precision of float16 and float32 is used for neural network processing. Computations are done in float16 for float32 operators according to the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal accuracy degradation.

    If this mode is used, you can view the value of the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json in the OPP installation directory.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to float16.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_hif8: enables automatic mixed precision, indicating that hifloat8, float16, and float32 are used together to process the neural network. In this mode, hifloat8 is automatically used for certain float16 and float32 operators based on the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal precision degradation. The current version does not support this option.

    If this mode is used, you can view the value of precision_reduce in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.

    • true: The operator is on the mixed precision trustlist and its precision will be reduced from float16/float32 to hifloat8.
    • false: The operator is on the mixed precision blocklist and its precision will not be reduced from float16/float32 to hifloat8.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8: The hifloat8 data type is forcibly used if the Cube operator in the network model supports both hifloat8 and float16/float32. The current version does not support this option.

Default value: fp16

Configuration example:

{ge::ir_option::PRECISION_MODE_V2, "fp16"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

ALLOW_HF32

This parameter is reserved and is not supported in the current version.

Enables the function of automatically replacing the float32 data type with the HF32 data type. In the current version, this parameter takes effect only for Conv and Matmul operators.

HF32 is a single-precision floating-point type of Ascend for internal computation of operators. HF32 shares the same value range as float32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing the original float32 data type with HF32 through precision reduction can greatly reduce the space occupied by data and improve performance.

Arguments:

  • true: Enable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is enabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_t_conv_t.ini in the file storage path after the CANN software is installed. This file cannot be modified by users.

  • false: Disable the function of automatically converting the FP32 data type to the HF32 data type for Conv and Matmul operators.

    For details about the operators for which this function is disabled, see opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_f_conv_f.ini in the file storage path after the CANN software is installed. This file cannot be modified by users.

Default: Enable FP32-to-HF32 conversion for Conv operators; disable FP32-to-HF32 conversion for Matmul operators.

Restrictions:

  • For the same operator, if enable_hi_float_32_execution or enable_float_32_execution is configured using OP_PRECISION_MODE, you are advised not to use it together with ALLOW_HF32. If they are used together, the priority is as follows:

    OP_PRECISION_MODE(ByNodeName) > ALLOW_HF32 > OP_PRECISION_MODE(ByOpType)

  • ALLOW_HF32 automatically replaces float32 with HF32. To make this parameter take effect, ensure that the input or output type of the enabled operator is float32. The default value of PRECISION_MODE_V2 is fp16. If the operator type in the original network model is float32, the operator type is forcibly converted to float16. In this case, ALLOW_HF32 does not take effect. You are advised to change the value of PRECISION_MODE_V2 to origin. The default value of PRECISION_MODE is force_fp16, and you are advised to change the value to must_keep_origin_dtype or force_fp32.

Applicability:

Atlas 200/300/500 Inference Product : not supported

Atlas Training Series Product : not supported

EXEC_DISABLE_REUSED_MEMORY

Whether to enable memory reuse.

Arguments:

  • 1: disabled. If the network model is large and memory reuse is disabled, memory may be insufficient during model conversion, causing the model build to fail.
  • 0 (default): enabled.

Configuration example:

{ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OUTPUT_TYPE

Network output data type.

Arguments:

  • FP32: recommended for classification and object detection networks
  • UINT8: recommended for image super-resolution networks for better inference performance
  • FP16: recommended for classification and object detection networks. It is usually used when the output of one network is used as the input of another.
  • INT8

After the model compilation is complete, the preceding data types are displayed as DT_FLOAT, DT_UINT8, DT_INT8, or DT_FLOAT16 in the corresponding .om model file.

Configuration example:

{ge::ir_option::OUTPUT_TYPE, "PF32"}

Restrictions:

  • If no data type is specified, the data type of the operator output at the output layer of the original model applies.
  • If the data type is specified, the type specified by this parameter is used.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

INPUT_FP16_NODES

(Required) Name of the input node that is of the float16 type.

The format is "node_name1;node_name2". Enclose the specified nodes in double quotation marks ("") and separate the nodes with semicolons (;). This parameter is mutually exclusive with INSERT_OP_FILE.

Configuration examples:

{ge::ir_option::INPUT_FP16_NODES, "node_name1;node_name2"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

LOG_LEVEL

Log level.

Arguments:

  • debug: debug, info, warning, and error logs
  • info: info, warning, and error logs
  • warning: warning and error logs
  • error: error logs
  • null (default): no logs are output

Configuration example:

{ge::ir_option::LOG_LEVEL, "debug"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OP_COMPILER_CACHE_MODE

Disk cache mode for operator build.

Arguments:
  • enable: enabled. Operators with the same build configurations and operator configurations are not rebuilt, which accelerates the build.
  • force: enabled, with the cache forcibly refreshed. The existing cache is cleared before operators are recompiled and added to the cache. For example, after Python changes, dependency library changes, or repository changes caused by operator tuning, set this option to force to clear the existing cache, and then change it back to enable so that the cache is not forcibly refreshed on every build.
  • disable: disabled.

Default: enable

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}

Restrictions:

  • To specify the disk cache path for operator build, use this parameter together with OP_COMPILER_CACHE_DIR.
  • When you enable the operator build cache function, you can set the disk space of the cache folder with the configuration file (the op_cache.ini file automatically generated in the path specified by OP_COMPILER_CACHE_DIR after operator build) or environment variables.
    1. Using the op_cache.ini configuration file:

      If the op_cache.ini file does not exist, manually create it. Open the file and add the following information:

      # Configure the file format (required). The automatically generated file contains the following information by default. When manually creating a file, enter the following information:
      [op_compiler_cache]
      # Limit the drive space of the cache folder on a chip. The value must be an integer, in MB.
      max_op_cache_size=500
      # Set the ratio of the cache size to be reserved. The value range is [1,100], in percentage. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
      remain_cache_size_ratio=80    
      • The op_cache.ini file takes effect only when the values of max_op_cache_size and remain_cache_size_ratio in the preceding file are valid.
      • If the size of the build cache file exceeds the value of max_op_cache_size and the cache file is not accessed for more than half an hour, the cache file will be aged. (Operator build will not be interrupted due to the size of the build cache file exceeding the set limit. Therefore, if max_op_cache_size is set to a small value, the size of the actual build cache file may exceed the configured value.)
      • To disable the build cache aging function, set max_op_cache_size to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default drive space is 500 MB.
      • If multiple users use the same cache path, you are advised to use the configuration file to set the cache path. In this scenario, the op_cache.ini file affects all users.
    2. Using environment variables

      In this scenario, the environment variable ASCEND_MAX_OP_CACHE_SIZE is used to limit the storage space of the cache folder of a chip. When the build cache space reaches the specified value and the cache file is not accessed for more than half an hour, the cache file is aged. The environment variable ASCEND_REMAIN_CACHE_SIZE_RATIO is used to set the ratio of the cache space to be reserved.

      A configuration example is provided as follows:

      # The ASCEND_MAX_OP_CACHE_SIZE environment variable defaults to 500, in MB. The value must be an integer.
      export ASCEND_MAX_OP_CACHE_SIZE=500
      # ASCEND_REMAIN_CACHE_SIZE_RATIO environment variable value range is [1,100]. The default value is 50, in percentage. For example, 80 indicates that 80% of the cache space is reserved when the cache space is insufficient.
      export ASCEND_REMAIN_CACHE_SIZE_RATIO=50
      • The argument configured through environment variables takes effect only for the current user.
      • To disable the build cache aging function, set the environment variable ASCEND_MAX_OP_CACHE_SIZE to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default drive space is 500 MB.

    Caution: If both the op_cache.ini file and environment variable are configured, the configuration items in the op_cache.ini file are read first. If neither the op_cache.ini file nor the environment variable are configured, the system default values are read: 500 MB disk space and 50% reserved cache space.

  • If this parameter is set to force, the existing cache will be cleared. Therefore, it is not recommended for parallel program compilation. Otherwise, the cache used by other models may be cleared, causing compilation failures.
  • disable or force is recommended for publishing the final model.
  • If the repository changes after operator tuning, set this parameter to force to refresh the cache. Otherwise, the new tuning repository cannot be applied, and the tuning application fails to be executed.
  • When the debugging function is enabled:
    • If OP_DEBUG_LEVEL is set to a non-zero value, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator build cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is not configured, the OP_COMPILER_CACHE_MODE parameter configuration does not take effect, the operator build cache function is disabled, and all operators are recompiled.
    • If OP_DEBUG_CONFIG is not empty and OP_DEBUG_LIST is configured in the configuration file:
      • For operators in the list, ignore the configuration of OP_COMPILER_CACHE_MODE and continue to recompile them.
      • For operators out of the list, if OP_COMPILER_CACHE_MODE is set to enable or force, the cache function is enabled. If OP_COMPILER_CACHE_MODE is set to disable, the cache function is disabled and the operators are recompiled.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OP_COMPILER_CACHE_DIR

Disk cache directory for operator build.

Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Defaults to $HOME/atc_data.

Configuration example:

{ge::ir_option::OP_COMPILER_CACHE_MODE, "enable"}
{ge::ir_option::OP_COMPILER_CACHE_DIR, "/home/test/data/atc_data"}

Restrictions:

  • To specify the disk cache path for operator build, use this option together with OP_COMPILER_CACHE_MODE.
  • If the specified directory exists and is valid, a kernel_cache subdirectory is automatically created. If the specified directory does not exist but is valid, the system automatically creates this directory and the kernel_cache subdirectory.
  • Do not store other self-owned content in the default cache directory. The self-owned content will be deleted together with the default cache directory during software package installation or upgrade.
  • The non-default cache directory specified by this parameter cannot be deleted. The directory will not be deleted during software package installation or upgrade.
  • In addition to OP_COMPILER_CACHE_DIR, the environment variable ASCEND_CACHE_PATH can be used to set the disk cache directory for operator build. The priorities of the configuration methods are as follows: OP_COMPILER_CACHE_DIR > ASCEND_CACHE_PATH > default storage path.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

DEBUG_DIR

Directory of the debug-related process files generated during operator build, including the .o (operator binary file), .json (operator description file), and .cce files.

Defaults to the current directory.

Restrictions:

  • If you want to specify the path for storing the process file of operator build, use DEBUG_DIR and OP_DEBUG_LEVEL together. If OP_DEBUG_LEVEL is set to 0, DEBUG_DIR cannot be used.
  • In addition to DEBUG_DIR, the environment variable ASCEND_WORK_PATH can be used to set the path for storing the debugging file generated by operator build. The priorities of the configuration methods are as follows: DEBUG_DIR > ASCEND_WORK_PATH > default storage path.

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"}
{ge::ir_option::DEBUG_DIR, "/home/test/module/out_debug_info"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OP_DEBUG_LEVEL

Operator debug enable.

  • 0 (default): Disables operator debug. The operator build folder kernel_meta is not generated in the current execution path.
  • 1: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors.
  • 2: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors. Setting this option to 2 also disables build optimization and enables the CCE compiler debug function (the CCE compiler options are set to -O0 -g).
  • 3: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file) and .json file (operator description file) are generated in the folder. You can refer to these files when analyzing operator errors.
  • 4: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), TBE instruction mapping file (operator file *.cce), and UB fusion description file ({$kernel_name}_compute.json) are generated in the folder. These files can be used for problem reproduction and precision comparison during operator error analysis.
NOTICE:
  • If OP_DEBUG_LEVEL is set to 0 and OP_DEBUG_CONFIG is also set, the operator build directory kernel_meta is retained in the current execution path.
  • If OP_DEBUG_LEVEL is set to 0 and the NPU_COLLECT_PATH environment variable is set, the build directory kernel_meta is always retained. If the ASCEND_WORK_PATH environment variable is set, the build directory is retained in the path specified by the environment variable. If the ASCEND_WORK_PATH environment variable does not exist, the build directory is retained in the current execution path.
  • You are advised to set this parameter to 0 or 3 for training. To locate errors, set this parameter to 1 or 2, which might compromise the network performance.
  • If OP_DEBUG_LEVEL is set to 2 (that is, the CCEC debug options are enabled), the size of the operator kernel file (*.o file) increases. In the dynamic shape scenario, all possible shape scenarios are traversed during operator build, which may cause operator build failures due to large operator kernel files. In this case, do not enable the CCE compiler debug options.

    If a build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o
  • When the debug function is enabled, if the model contains the following MC2 operators, the *.o, *.json, and *.cce files of the operators are not generated in the kernel_meta directory.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll

Configuration example:

{ge::ir_option::OP_DEBUG_LEVEL, "1"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

MDL_BANK_PATH

Sets the directory of the custom repository generated after subgraph tuning.

This parameter must be used in pair with BUFFER_OPTIMIZE in aclgrphBuildInitialize Configuration Parameters and takes effect only when buffer optimization is enabled to improve performance by temporarily storing data in the buffer.

Argument: path of the custom repository after model tuning.

Format: The value can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Default: $HOME/Ascend/latest/data/aoe/custom/graph/<soc_version>

Configuration example:

{ge::ir_option::MDL_BANK_PATH, "$HOME/custom_module_path"}

Restrictions:

Priority ranked from high to low: the directory specified by MDL_BANK_PATH > the directory specified by TUNE_BANK_PATH > the default directory.

  1. If TUNE_BANK_PATH is used to specify a directory before model compilation and MDL_BANK_PATH is then used to specify a directory during model compilation, the custom repository directory specified by MDL_BANK_PATH takes effect and the one specified by TUNE_BANK_PATH does not.
  2. The default directory takes effect if both the directories specified by MDL_BANK_PATH and TUNE_BANK_PATH are invalid or contain no custom repository.
  3. If none of the preceding directories contains the custom repository, the system searches the built-in directory of the custom repository generated after subgraph tuning in ${INSTALL_DIR}/compiler/data/fusion_strategy/built-in.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OP_BANK_PATH

Path of the custom repository generated after operator tuning.

Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.).

Default: ${HOME}/Ascend/latest/data/aoe/custom/op

Configuration example:

{ge::ir_option::OP_BANK_PATH, "$HOME/custom_tune_path"}

Restrictions:

Priority ranked from high to low: the directory specified by TUNE_BANK_PATH > the directory specified by OP_BANK_PATH > the default directory of the custom repository generated after operator tuning.

  1. If TUNE_BANK_PATH is used to specify a directory before model conversion and OP_BANK_PATH is then used to specify a directory during model compilation, the custom repository directory specified by TUNE_BANK_PATH takes effect and the one specified by OP_BANK_PATH does not.
  2. The default directory takes effect if both the directories specified by OP_BANK_PATH and TUNE_BANK_PATH are invalid.
  3. If none of the preceding directories contains the custom repository, the system searches the built-in directory of the custom repository generated after operator tuning.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

MODIFY_MIXLIST

When mixed precision is enabled, you can use this parameter to specify the path and file name of a JSON configuration file that customizes the blocklist, trustlist, and graylist, that is, which operators allow precision reduction and which do not.

For the blocklist, trustlist, and graylist, you can view the value of flag in the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.

  • true (trustlist): Precision reduction is allowed in mixed precision mode.
  • false (blocklist): Precision reduction is not allowed in mixed precision mode.
  • Not specified (graylist): Operators on the graylist follow the same precision processing as their upstream operators.
Configuration example:
{ge::ir_option::MODIFY_MIXLIST, "/home/test/ops_info.json"}

You can specify the operator type (or types separated by commas) in ops_info.json as follows.

{
  "black-list": {                  // Blocklist
     "to-remove": [                // Move an operator from the blocklist to the graylist.
     "Xlog1py"
     ],
     "to-add": [                   // Move an operator from the trustlist or graylist to the blocklist.
     "Matmul",
     "Cast"
     ]
  },
  "white-list": {                  // Trustlist
     "to-remove": [                // Move an operator from the trustlist to the graylist.
     "Conv2D"
     ],
     "to-add": [                   // Move an operator from the blocklist or graylist to the trustlist.
     "Bias"
     ]
  }
}

The operators in the preceding example configuration file are for reference only. The configuration should be based on the actual hardware environment and the built-in tuning strategies of the operators. To query the blocklist, trustlist, and graylist:

"Conv2D":{
    "precision_reduce":{
        "flag":"true"
},

true: trustlist; false: blocklist; Not configured: graylist.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

OP_PRECISION_MODE

Sets the precision mode of one or more specified operators during internal processing. This parameter is used to transfer the customized precision mode configuration file op_precision.ini to set different precision modes for different operators.

The following precision modes can be set in the configuration file:

  • high_precision
  • high_performance
  • support_out_of_bound_index: indicates that the out-of-bounds verification is performed on the indices of the gather, scatter, and segment operators. The verification deteriorates the operator execution performance.
  • keep_fp16: The FP16 data type is used for internal processing of operators. In this scenario, the FP16 data type is not automatically converted to the FP32 data type. If the performance of FP32 computation does not meet the expectation and high precision is not required, you can select the keep_fp16 mode. This low-precision mode sacrifices the precision for improving the performance, which is not recommended.
  • super_performance: indicates ultra-high performance. Compared with high performance, the algorithm calculation formula is optimized.

You can view the precision or performance mode supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file in the file storage path with the CANN software installed.

Sample: Set the precision mode based on the operator type (low priority) or node name (high priority) in each row in the .ini file.

[ByOpType]
optype1=high_precision
optype2=high_performance
optype4=support_out_of_bound_index

[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename4=support_out_of_bound_index
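
A hedged example of passing the .ini file as a build option (the file path is hypothetical):

{ge::ir_option::OP_PRECISION_MODE, "/home/test/op_precision.ini"}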

Restrictions:

  • This parameter is mutually exclusive with OP_SELECT_IMPL_MODE and OPTYPELIST_FOR_IMPLMODE. If they are all specified, OP_PRECISION_MODE takes precedence.
  • You are advised not to set this parameter. Use it only to adjust the precision of specific operators through the .ini configuration file when the high-performance or high-precision mode does not deliver the expected network performance or accuracy.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

SHAPE_GENERALIZED_BUILD_MODE

Sets the shape build mode during graph build. This parameter will be deprecated in later versions. Do not use this parameter for new functions.

  • shape_generalized: fuzzy compilation. The system generalizes the runtime dimensions of dynamic-shape operators before compilation.

    This parameter is used when you want to run multiple inferences based on one compilation.

  • shape_precise: precise compilation. The system directly performs compilation based on the specified shape without any generalization.
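
Configuration example (a hedged sketch following the notation used for the other parameters in this table):

{ge::ir_option::SHAPE_GENERALIZED_BUILD_MODE, "shape_generalized"}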

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

CUSTOMIZE_DTYPES

Customizes the precision of specified operators during model build. The other operators in the model are built according to PRECISION_MODE or PRECISION_MODE_V2. Set this parameter to the path of the configuration file (including the file name), for example, /home/test/customize_dtypes.cfg.

Restrictions:

  • List the names or types of operators whose precision needs customization in the configuration file. Each operator occupies a line, and the operator type must be defined based on IR.
  • If both operator name and type are configured for an operator, the operator name applies during build.
  • The computing precision of an operator specified by this parameter does not take effect if the operator is fused during model compilation.

The structure of the configuration file is as follows:

# By operator name
Opname1::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
Opname2::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
# By operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
OpType::TypeName2:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…

Example:

# By operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# By operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8
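
A hedged example of passing the configuration file as a build option, using the file path mentioned above:

{ge::ir_option::CUSTOMIZE_DTYPES, "/home/test/customize_dtypes.cfg"}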
NOTE:
  • You can find the operator precision support in the operator information library, which is saved in opp/op_impl/custom/ai_core/tbe/config/${soc_version}/aic-${soc_version}-ops-info.json under the CANN component directory by default.
  • The data type specified by this parameter takes a higher priority, which may cause accuracy or performance degradation. If the specified data type is not supported, the build will fail.

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

BUILD_INNER_MODEL

Not supported in the current version.

OP_DEBUG_CONFIG

Enables operator debug options, such as the global memory check.

The value is the path of the .cfg configuration file. Multiple options in the configuration file are separated by commas (,).
  • oom: Checks whether memory overwriting occurs in the global memory during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
      inline __aicore__ void  CheckInvalidAccessOfDDR(xxx) {
          if (access_offset < 0 || access_offset + access_extent > ddr_size) {
              if (read_or_write == 1) {
                  trap(0X5A5A0001);
              } else {
                  trap(0X5A5A0002);
              }
          }
      }

      During actual execution, if memory overwriting occurs, the error code EZ9999 is reported.

  • dump_bin: Retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • dump_cce: Retains the operator CCE file (.cce), binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
  • dump_loc: Retains the Python-CCE mapping file (*_loc.json) in the kernel_meta folder under the current execution directory during operator build.
  • ccec_O0: Enables the CCEC option -O0 during operator build. This option does not optimize the debugging information for later analysis of AI Core errors.
  • ccec_g: Enables the CCEC option -g during operator build. This option optimizes the debugging information for later analysis of AI Core errors.
  • check_flag: Checks whether pipeline synchronization signals in operators match each other during operator execution.
    • Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
    • If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ....
        pipe_barrier(PIPE_MTE3);
        pipe_barrier(PIPE_MTE2);
        pipe_barrier(PIPE_M);
        pipe_barrier(PIPE_V);
        pipe_barrier(PIPE_MTE1);
        pipe_barrier(PIPE_ALL);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
        wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
        ...

      During actual inference, if the pipeline synchronization signals in operators do not match each other, a timeout error is reported at the faulty operator, and the program is terminated. The following is an example of the error message:

      Aicore kernel execute failed, ..., fault kernel_name=operator name,...
      rtStreamSynchronizeWithTimeout execute failed....

Configuration example: set the value to the path of the configuration file, for example, /root/test0.cfg.

{ge::ir_option::OP_DEBUG_CONFIG, "/root/test0.cfg"}

Restrictions:

During operator compilation, if you want to compile only some instead of all AI Core operators, you need to add the OP_DEBUG_LIST field to the test0.cfg configuration file. By doing so, only the operators specified in the list are compiled, based on the options configured in OP_DEBUG_CONFIG. The OP_DEBUG_LIST field has the following requirements:

  • The operator name or operator type can be specified.
  • Operators are separated by commas (,). The operator type is configured in OpType::typeName format. The operator type and operator name can be configured in a mixed manner.
  • The operator to be compiled must be stored in the configuration file specified by OP_DEBUG_CONFIG.

Configuration example: Add the following information to the configuration file (for example, test0.cfg) specified by OP_DEBUG_CONFIG:

{ge::ir_option::OP_DEBUG_CONFIG, "ccec_g,oom"}
{ge::ir_option::OP_DEBUG_LIST, "GatherV2,opType::ReduceSum"}

During model compilation, the GatherV2 and ReduceSum operators are compiled based on the ccec_g and oom options.

NOTE:
  • When ccec_O0 and ccec_g are enabled, the size of the operator kernel file (*.o file) increases. In dynamic shape scenarios, all possible scenarios are traversed during operator build, which may cause operator build failures due to large operator kernel files. In this case, do not enable the options of the CCE compiler.

    If the build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o:(xxxx)

  • The ccec_O0 and oom options of the CCEC cannot be both enabled. Otherwise, an AI Core error may be reported. The following is an example of the error message:
    ...there is an aivec error exception, core id is 49, error code = 0x4 ...
  • If the NPU_COLLECT_PATH environment variable is configured, the function of checking whether global memory overwriting occurs cannot be enabled (OP_DEBUG_CONFIG is set to oom). Otherwise, an error is reported when the compiled model file or operator kernel package is used.
  • When the build options oom, dump_bin, dump_cce, and dump_loc are configured, if the model contains the following MC2 operators, the *.o, *.json, and *.cce files of the operators are not generated in the kernel_meta directory.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

EXTERNAL_WEIGHT

Externalizes the weights of the Const/Constant nodes on the network and converts the weights to FileConstant when the OM model file is generated.

In the offline scenario, if the model weight is large and the environment has restrictions on the .om file size, you are advised to enable the external weight to save the weight separately to reduce the .om file size.

Arguments:

  • 0: Saves the weights in the .om model file. The default value is 0.
  • 1: externalizes the weights, saving the weights of all Const/Constant nodes on the network to separate files. The node type is converted to FileConstant. The weight files are named in the format weight_<hash value>.

Configuration example:

{ge::ir_option::EXTERNAL_WEIGHT, "1"}

Restrictions:

  • In the external weight scenario, when AscendCL APIs are used to develop inference applications and load models:
    • Use the aclgrphSaveModel API to save the OM model.
      • If the aclmdlLoadFromFile API is used to load a model, the weight file must be stored in the weight directory at the same level as the .om file.
      • If the aclmdlSetConfigOpt and aclmdlLoadWithConfig APIs are used to load a model, there is no requirement on the external weight directory. When the model is loaded later, use the aclmdlLoadWithConfig API to specify the external weight directory.
    • In the weight update scenario, use the aclgrphBundleSaveModel API to save the OM model.

      Only the aclmdlBundleLoadFromFile API can be used to load a model, and the weight file must be stored in the weight directory at the same level as the .om file.

    For details about the APIs, see "Model Loading and Unloading".
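
A minimal, hedged sketch of loading an externally weighted model saved with aclgrphSaveModel, assuming the AscendCL API aclmdlLoadFromFile and the directory layout described above (the paths are hypothetical):

// Expected layout (hypothetical paths):
//   /home/test/model.om
//   /home/test/weight/   <- external weight files (weight_<hash>) at the same level as the .om file
#include "acl/acl_mdl.h"

aclError LoadExternalWeightModel(uint32_t *modelId) {
    return aclmdlLoadFromFile("/home/test/model.om", modelId);
}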

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

EXCLUDE_ENGINES

Prevents the network model from using one or more acceleration engines. Use vertical bars (|) to separate multiple engines.

The NPU integrates multiple hardware accelerators (also called acceleration engines), such as AiCore, AiVec, and AiCpu (listed in descending order of priority). During graph compilation, an appropriate engine is selected for each operator based on this priority: when an operator is supported by multiple engines, the highest-priority engine is selected.

EXCLUDE_ENGINES excludes engines for operators. For example, during training, to prevent the data preprocessing graph and the main training graph from competing for AiCore, you can configure this parameter so that the data preprocessing graph does not use the AiCore engine.

Arguments:

AiCore: AI Core hardware acceleration engine

AiVec: Vector Core hardware acceleration engine

AiCpu: AI CPU hardware acceleration engine

Configuration example:

{ge::ir_option::EXCLUDE_ENGINES, "AiCore|AiVec"}
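
As an illustrative sketch only (the preprocess_graph variable and surrounding setup are placeholders), the option is passed in the build option map like any other ge::ir_option key:

// Sketch: build a data preprocessing graph without AiCore/AiVec so those engines
// remain available to the main training graph.
std::map<std::string, std::string> options = {
    {ge::ir_option::EXCLUDE_ENGINES, "AiCore|AiVec"}  // excluded engines, separated by |
};
ge::ModelBufferData model;
ge::graphStatus ret = ge::aclgrphBuildModel(preprocess_graph, options, model);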

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

DISTRIBUTED_CLUSTER_BUILD

Applicable to the distributed compilation and partition of foundation models.

Enables distributed compilation and partition of a foundation model. If this parameter is enabled, the generated offline model will be used for distributed deployment. 1: enabled; empty or other values: disabled.

Example:

{ge::ir_option::DISTRIBUTED_CLUSTER_BUILD, "1"}

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

ENABLE_GRAPH_PARALLEL

Applicable to the distributed compilation and partition of foundation models.

Indicates whether to automatically partition the original model. 1: enabled; empty or other values: disabled.

The automatic partition function can be enabled only after distributed build is enabled by DISTRIBUTED_CLUSTER_BUILD. The original model is automatically partitioned based on the policy in the file specified by GRAPH_PARALLEL_OPTION_PATH.

Example:

{ge::ir_option::ENABLE_GRAPH_PARALLEL, "1"}

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

GRAPH_PARALLEL_OPTION_PATH

Applicable to the distributed compilation and partition of foundation models.

Specifies the path and name of the algorithm-based partitioning policy configuration file when the original foundation model is partitioned.

The partitioning policy configuration file can be specified only after both DISTRIBUTED_CLUSTER_BUILD and ENABLE_GRAPH_PARALLEL are enabled.

Example:

{ge::ir_option::GRAPH_PARALLEL_OPTION_PATH, "./parallel_option.json"}

The specified configuration file must be in JSON format. The following is an example:

  • Semi-automatic partitioning
    {
        "graph_parallel_option": {
            "auto": false,
            "opt_level": "O1",
            "tensor_parallel_option": {
                "tensor_parallel_size": 2
            },
            "tensor_sharding": {
                "optimizer_state_sharding": true,
                "gradient_sharding": true,
                "model_weight_sharding": true,
                "model_weight_prefetch": true,
                "model_weight_prefetch_buffer_size": 50
            }
        }
    }
  • Automatic partitioning
    {
        "graph_parallel_option": {
            "auto": true
        }
    }

Argument description:

  • auto: true for automatic partition; false for semi-automatic partition.
  • opt_level: Tensor Parallel solution algorithm. The value can be O2 (ILP algorithm) or O1 (DP algorithm). If this parameter is not set, O2 is used by default.
  • tensor_parallel_option: enables TP partition.

    TP partition: Tensor Parallel, also called Intra-Op Parallel, partitions the tensor of each operator in a computational graph along one or more axes (batch/non-batch). The divided partitions are distributed to each device for computation.

  • tensor_parallel_size: TP size, that is, the number of device chips to be configured. The value of this parameter must be the same as that in the topology file specified by CLUSTER_CONFIG in aclgrphBuildInitialize Configuration Parameters.
  • optimizer_state_sharding: enables optimizer sharding. true: enabled; false: disabled.
  • gradient_sharding: enables gradient sharding. true: enabled; false: disabled.
  • model_weight_sharding: enables weight sharding. true: enabled; false: disabled.
  • model_weight_prefetch: enables weight prefetching. true: enabled; false: disabled.
  • model_weight_prefetch_buffer_size: specifies the cache size for weight prefetching.
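
For reference, the following is a minimal sketch of combining the distributed build options in one aclgrphBuildModel call, driven by the policy file shown above; the graph variable and the policy file path are placeholders:

// Sketch: distributed build with automatic partitioning.
std::map<std::string, std::string> options = {
    {ge::ir_option::DISTRIBUTED_CLUSTER_BUILD, "1"},                       // prerequisite for graph parallel
    {ge::ir_option::ENABLE_GRAPH_PARALLEL, "1"},                           // enable automatic partitioning
    {ge::ir_option::GRAPH_PARALLEL_OPTION_PATH, "./parallel_option.json"}  // partitioning policy file
};
ge::ModelBufferData model;
ge::graphStatus ret = ge::aclgrphBuildModel(graph, options, model);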

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

MODEL_RELATION_CONFIG

Applicable to the distributed compilation and partition of foundation models.

Specifies the path and name of the configuration file that describes the data associations and distributed communication group relationships between multiple sliced models. This parameter applies to scenarios where the original model is a sliced model that contains communication operators.

This parameter takes effect only after DISTRIBUTED_CLUSTER_BUILD is enabled.

Example:

{ge::ir_option::MODEL_RELATION_CONFIG, "./model_relation.json"}

The configuration file must be in JSON format. The following is an example:

{
  "deploy_config" :[                    // (Required) Mapping between the model deployment and the target deployment node.
    {
      "submodel_name":"submodel1.air",  // File name after partition at the frontend, which must be the same as the graph name.
      "deploy_device_id_list":"0:0:0"   // Target device to be deployed for the model: cluster: 0 node: 0 item: 0
    },
    {
      "submodel_name":"submodel2.air",
      "deploy_device_id_list":"0:0:1"
    }
  ],
  "model_name_to_instance_id":[          // Required
    {
      "submodel_name":"submodel1.air",   // Model ID, which is specified by users in the file. Different files correspond to different IDs.
      "model_instance_id":0
    },
    {
      "submodel_name":"submodel2.air",
      "model_instance_id":1
    }
  ],
  "comm_group":[{                      // Optional. If the model partitioned at the frontend contains a communication operator, this parameter indicates the communication domain information of the communication operator after the partition.
    "group_name":"tp_group_name_0",    // Sub-communication domain of the communication operator after model partition at the frontend.
    "group_rank_list":"[0,1]"          // Subrank list of the communication operator after model partition at the frontend.
  }],
  "rank_table":[
  {
    "rank_id":0,                      // Mapping between rank IDs and model IDs
    "model_instance_id":0
  },
  {
    "rank_id":1,
    "model_instance_id":1
  }
  ]
}
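
Because this option takes effect only together with DISTRIBUTED_CLUSTER_BUILD, a minimal sketch of the corresponding option map could look as follows (the file path is a placeholder):

std::map<std::string, std::string> options = {
    {ge::ir_option::DISTRIBUTED_CLUSTER_BUILD, "1"},                 // prerequisite
    {ge::ir_option::MODEL_RELATION_CONFIG, "./model_relation.json"}  // relationships between sliced models
};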

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

AC_PARALLEL_ENABLE

Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic-shape graph.

In a dynamic-shape graph, when this function is enabled, the system automatically identifies AI CPU operators that can be run in parallel with the AI Core operators in the graph. Operators of different engines are distributed to different streams to run in parallel, improving resource utilization and dynamic shape execution performance.

Arguments:

  • 1: AI CPU operators and AI Core operators are allowed to run in parallel.
  • 0 (default): AI CPU operators are not separately distributed.

Configuration example:

{ge::ir_option::AC_PARALLEL_ENABLE, "1"}

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : not supported

QUANT_DUMPABLE

Collects the dump data of the quantization operator.

For details, see Accuracy Improvement Suggestions for Model Inference in CANN AscendCL Application Software Development Guide (C&C++). During precision debugging, if the model has been quantized by AMCT, the inputs and outputs of quantization operators may be optimized away during graph build when the model is converted to an OM offline model, which prevents the dump data of those operators from being exported. For example, for two consecutive quantized convolutions, the intermediate output may be optimized into a quantized int8 output.

To solve this problem, the QUANT_DUMPABLE parameter is introduced. When this parameter is enabled, the inputs and outputs of quantization operators are not fused, and TransData operators are inserted to restore the original model format, so that the dump data of the quantization operators can be collected.

Arguments:

  • 0 (default): The inputs and outputs of quantization operators may be optimized during graph build. In this case, the dump data of the quantization operators cannot be obtained.
  • 1: The inputs and outputs of quantization operators are preserved so that their dump data can be collected.

Configuration example:

{ge::ir_option::QUANT_DUMPABLE, "1"}

Applicability:

Atlas 200/300/500 Inference Product : supported

Atlas Training Series Product : supported

TILING_SCHEDULE_OPTIMIZE

Tiling offload scheduling optimization.

The on-chip memory of the AI Core in the NPU cannot hold all the input and output data of an operator, so the input data is split into blocks: the first block is transferred in, computed, and transferred out, and then the next block is processed in the same way. This process is called tiling. A piece of computation logic, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of loops) based on operator information such as the shape. The AI Core is not efficient at the scalar computation involved in the tiling implementation, so the tiling implementation is generally executed on the host CPU. However, the tiling implementation is executed on the device when the following conditions are met (a conceptual sketch of this scalar computation follows the conditions):

  1. The model has a static shape.
  2. Operators in the model, such as the FusedInferAttentionScore and IncreFlashAttention fused operators, support tiling offload.
  3. The input value of an operator that supports tiling offload depends on the execution result of a preceding operator.
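
The following is a purely conceptual sketch, not the tiling implementation of any real operator, showing the kind of scalar computation a tiling implementation performs: deriving the per-loop block size, the loop count, and the tail block from the total data volume and the available on-chip buffer.

#include <cstdint>

// Conceptual illustration only: compute tiling parameters for moving
// total_elems elements through an on-chip buffer of buffer_elems elements.
struct TilingParams {
    uint64_t block_elems = 0;  // elements transferred per loop iteration
    uint64_t loop_count  = 0;  // total number of loop iterations
    uint64_t tail_elems  = 0;  // elements handled in the last iteration
};

TilingParams ComputeTiling(uint64_t total_elems, uint64_t buffer_elems) {
    TilingParams p;
    if (total_elems == 0 || buffer_elems == 0) {
        return p;  // nothing to tile
    }
    p.block_elems = (total_elems < buffer_elems) ? total_elems : buffer_elems;
    p.loop_count  = (total_elems + p.block_elems - 1) / p.block_elems;  // ceiling division
    p.tail_elems  = total_elems - (p.loop_count - 1) * p.block_elems;
    return p;
}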

Arguments:

  • 0 (default): Tiling offload is disabled.
  • 1: Tiling offload is enabled.

Configuration example:

{ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}

Applicability:

Atlas Training Series Product : not supported

Atlas 200/300/500 Inference Product : not supported

OPTION_EXPORT_COMPILE_STAT

Whether to generate fusion_result.json, the result file of operator fusion information (including graph fusion and UB fusion), during graph build. This parameter is reserved.

This file is used to record the fusion patterns used during graph build. In the file:

  • session_and_graph_id_xx_xx: Thread and graph ID to which the fusion result belongs.
  • graph_fusion: Graph fusion.
  • ub_fusion: UB fusion.
  • match_times: Number of times that a fusion pattern is hit during graph build.
  • effect_times: Number of times that a fusion pattern takes effect.
  • repository_hit_times: Number of times that the repository is hit during UB fusion.

Arguments:

  • 0: The result file of operator fusion information is not generated.
  • 1 (default): The result file of operator fusion information is generated when the program exits normally.
  • 2: The result file of operator fusion information is generated as soon as graph build completes. That is, even if the program is interrupted after graph build, the result file is still generated.
NOTE:
  • If the ASCEND_WORK_PATH environment variable is not set, the result file is generated by default in the current directory where the atc command is executed. If ASCEND_WORK_PATH is set, fusion_result.json is saved in $ASCEND_WORK_PATH/FE/${Process ID}.
  • The fusion patterns disabled using FUSION_SWITCH_FILE are not displayed in the fusion_result.json file.

Configuration example:

{ge::ir_option::OPTION_EXPORT_COMPILE_STAT, "1"}

Applicability:

Atlas Training Series Product : supported

Atlas 200/300/500 Inference Product : supported