aclCompileOpt

typedef enum {
    ACL_PRECISION_MODE,           // Operator precision mode of a network model.
    ACL_AICORE_NUM,               // Number of AI Cores used for model compilation.
    ACL_AUTO_TUNE_MODE,           // Operator auto tuning mode.
    ACL_OP_SELECT_IMPL_MODE,      // Operator implementation mode.
    ACL_OPTYPELIST_FOR_IMPLMODE,  // List of operator types implemented in the mode specified by ACL_OP_SELECT_IMPL_MODE.
    ACL_OP_DEBUG_LEVEL,           // TBE operator debug level during operator compilation.
    ACL_DEBUG_DIR,                // Debug directory for files generated during model conversion and network migration, including the .o, .json, and .cce files of operators.
    ACL_OP_COMPILER_CACHE_MODE,   // Disk cache mode for operator compilation.
    ACL_OP_COMPILER_CACHE_DIR,    // Disk cache directory for operator compilation.
    ACL_OP_PERFORMANCE_MODE,      // Whether to compile operators in high-performance mode.
    ACL_OP_JIT_COMPILE,           // Whether to compile an operator online or use a compiled operator binary file.
    ACL_OP_DETERMINISTIC,         // Whether to enable deterministic computing.
    ACL_CUSTOMIZE_DTYPES,         // Customized computation precision of one or more operators during model building.
    ACL_OP_PRECISION_MODE,        // Precision mode for internal operator processing; one or more operators can be specified.
    ACL_ALLOW_HF32,               // Whether to allow hf32, an Ascend precision type for internal operator computation. Not supported in the current version.
    ACL_PRECISION_MODE_V2,        // Operator precision mode of a network model; compared with ACL_PRECISION_MODE, it offers more precision modes with clearer semantics.
    ACL_OP_DEBUG_OPTION           // Currently only "oom", which enables global memory out-of-bounds access detection.
} aclCompileOpt;
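These options are applied through aclSetCompileopt (named later in this document) before model or operator compilation. Below is a minimal sketch, assuming the CANN toolkit's acl/acl.h header is available; the option values are examples drawn from the table that follows:

```c
#include "acl/acl.h"  /* CANN toolkit header (assumed install) */

/* Sketch: apply a few build options before compiling a model or operator.
 * Each call returns ACL_SUCCESS on success. */
static aclError applyBuildOptions(void)
{
    aclError ret = aclSetCompileopt(ACL_PRECISION_MODE, "allow_fp32_to_fp16");
    if (ret != ACL_SUCCESS) {
        return ret;
    }
    ret = aclSetCompileopt(ACL_OP_COMPILER_CACHE_MODE, "enable");
    if (ret != ACL_SUCCESS) {
        return ret;
    }
    /* Cache directory; used together with ACL_OP_COMPILER_CACHE_MODE. */
    return aclSetCompileopt(ACL_OP_COMPILER_CACHE_DIR, "/home/user/atc_data");
}
```

Options must be set before the compilation they are meant to affect; setting them afterwards has no effect on already-compiled operators.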
Table 1 Build options

ACL_PRECISION_MODE

Sets the operator precision mode of a network model. If it is not specified, allow_fp32_to_fp16 is used by default.

  • force_fp32/cube_fp16in_fp32out:
    force_fp32 has the same effect as cube_fp16in_fp32out; the system selects a processing mode depending on whether the operator is a cube or a vector operator. cube_fp16in_fp32out is new in this version and has clearer semantics for cube operators.
    • For cube operators, the system processes the computation based on the operator implementation:
      1. float16 input with float32 output is preferred.
      2. If float16 input with float32 output is not supported, both the input and output data types fall back to float32.
      3. If float32 input and output are not supported, both fall back to float16.
      4. If float16 input and output are not supported either, an error is reported.
    • For vector operators, float32 is forcibly selected for operators supporting both float16 and float32, even if the original precision is float16.

      This option has no effect on operators that do not support float32 (for example, float16-only operators); their float16 precision is retained. If an operator does not support float32 and is on the precision-reduction blocklist (precision_reduce set to false), the counterpart AI CPU operator supporting float32 is used; if no such AI CPU operator exists, an error is reported.

  • force_fp16:

    Forces float16 for operators supporting both float16 and float32.

  • allow_fp32_to_fp16:
    • For cube operators, float16 is used.
    • For vector operators, the original precision is preserved for operators supporting float32; otherwise, float16 is forced.
  • must_keep_origin_dtype:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • allow_mix_precision/allow_mix_precision_fp16:

    allow_mix_precision has the same effect as allow_mix_precision_fp16: mixed precision of float16 and float32 is used for neural network processing. allow_mix_precision_fp16 is new in this version and has clearer semantics.

    In this mode, float16 is automatically used for certain float32 operators based on the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal accuracy degradation.

    If this mode is used, you can view the value of the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json in the OPP installation directory.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to float16.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.

ACL_AICORE_NUM

Sets the number of AI Cores used for model compilation.

This setting has no effect in the current version.

ACL_AUTO_TUNE_MODE

Do not set this option; it will be deprecated, and setting it may cause compatibility issues in later versions. If tuning is required, see AOE Instructions.

Sets the operator auto tuning mode.

  • GA: genetic algorithm, for tuning Cube operators.
  • RL: reinforcement learning, for tuning Vector operators.

ACL_OP_SELECT_IMPL_MODE

Sets the operator implementation mode. If it is not specified, high_performance is used by default.

  • high_precision: High-precision implementation mode.

    This option sets the operator implementation mode by using the built-in configuration file stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_precision.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_performance (default): High-performance implementation mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_performance.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_precision_for_all: High-precision mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the performance of the earlier network model that uses the high_precision_for_all mode may deteriorate.

  • high_performance_for_all: High-performance mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the precision of the earlier network model that uses the high_performance_for_all mode may deteriorate.

ACL_OPTYPELIST_FOR_IMPLMODE

Sets the operator type list (multiple operators are separated by commas). This option is used in pair with ACL_OP_SELECT_IMPL_MODE.
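The pairing might look like the following sketch (assuming the CANN acl/acl.h header; the operator type names "Pooling" and "SoftmaxV2" are placeholder examples, not a recommendation):

```c
#include "acl/acl.h"  /* CANN toolkit header (assumed install) */

/* Sketch: select the high_precision implementation for specific operator
 * types. The type names below are placeholders for illustration. */
static aclError selectImplMode(void)
{
    aclError ret = aclSetCompileopt(ACL_OP_SELECT_IMPL_MODE, "high_precision");
    if (ret != ACL_SUCCESS) {
        return ret;
    }
    /* Comma-separated operator type list; the mode above applies to these. */
    return aclSetCompileopt(ACL_OPTYPELIST_FOR_IMPLMODE, "Pooling,SoftmaxV2");
}
```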

ACL_OP_DEBUG_LEVEL

Enables TBE operator debug during operator compilation.

The options are as follows:

  • 0 (default): Disables operator debug. The operator build folder kernel_meta is not generated in the current execution path.
  • 1: Enables operator debug. The kernel_meta folder is generated in the current execution path, containing the .o file (operator binary), .json file (operator description), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) for later analysis of AI Core errors.
  • 2: Same as 1, but additionally disables build optimization and enables the CCE compiler debug function (compiler options -O0 -g).
  • 3: Disables operator debug. The kernel_meta folder is generated in the current execution path, containing the .o file (operator binary) and .json file (operator description). You can refer to these files when analyzing operator errors.
  • 4: Disables operator debug. The kernel_meta folder is generated in the current execution path, containing the .o file (operator binary), .json file (operator description), TBE instruction mapping file (operator file *.cce), and UB fusion description file ({$kernel_name}_compute.json). These files can be used for problem reproduction and precision comparison during operator error analysis.

The configuration constraints are as follows:

  • If ACL_OP_DEBUG_LEVEL is set to 2 (that is, CCE compiler debug is enabled), the size of the operator kernel file (*.o) increases. In the dynamic shape scenario, all possible shape scenarios are traversed during operator build, which may cause build failures due to large operator kernel files. In this case, do not set this option to 2.

    If a build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o
  • When the debug function is enabled, if the model contains any of the following MC2 operators, the *.o, *.json, and *.cce files of those operators are not generated in the kernel_meta directory:
    • MatMulAllReduce
    • MatMulAllReduceAddRmsNorm
    • AllGatherMatMul
    • MatMulReduceScatter
    • AlltoAllAllGatherBatchMatMul
    • BatchMatMulReduceScatterAlltoAll

ACL_DEBUG_DIR

Sets the directory (default: kernel_meta under the current execution path of the application) for storing debug files generated by operator compilation during model conversion and network migration, including the .o, .json, and .cce files. Which files are generated depends on the value of ACL_OP_DEBUG_LEVEL.

The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.).

NOTE:

In addition to setting this build option, you can configure the directory for operator compilation debug files through the environment variable ASCEND_WORK_PATH. The priority is: build option > environment variable > default directory.

For details about how to set environment variables, see Environment Variables.

ACL_OP_COMPILER_CACHE_MODE

Sets the disk cache mode for operator compilation. This option must be used in conjunction with ACL_OP_COMPILER_CACHE_DIR.

  • enable: Enables the cache. Operators with identical build configurations and operator configurations are not rebuilt, which speeds up builds.
  • force: Enables the cache and forcibly refreshes it: the existing cache is cleared before operators are recompiled and re-added to the cache. Set this option to force after Python changes, dependency library changes, or repository changes following operator optimization; then change it back to enable so the cache is not cleared on every build.
  • disable (default): Disables the cache.

If debugging is also enabled (ACL_OP_DEBUG_LEVEL is set to a non-zero value), the system ignores the configuration of ACL_OP_COMPILER_CACHE_MODE and does not cache the build result of debugging.

When the operator build cache is enabled, you can limit the disk space of the cache folder either through the op_cache.ini configuration file (generated automatically in the directory specified by ACL_OP_COMPILER_CACHE_DIR during operator build) or through environment variables.

  1. Using the op_cache.ini configuration file:

    If the op_cache.ini file does not exist, manually create it. Open the file and add the following information:

    # Section header (required). The automatically generated file contains it by
    # default; include it when creating the file manually.
    [op_compiler_cache]
    # Maximum disk space of the cache folder per chip, in MB. Must be an integer.
    max_op_cache_size=500
    # Percentage of the cache to keep when space runs out; range [1,100].
    # For example, 80 keeps 80% of the cache and clears the rest.
    remain_cache_size_ratio=80
    • The op_cache.ini file takes effect only when the values of max_op_cache_size and remain_cache_size_ratio in the preceding file are valid.
    • If the size of the build cache file exceeds the value of max_op_cache_size and the cache file is not accessed for more than half an hour, the cache file will be aged. (Operator build will not be interrupted due to the size of the build cache file exceeding the set limit. Therefore, if max_op_cache_size is set to a small value, the size of the actual build cache file may exceed the configured value.)
    • To disable the build cache aging function, set max_op_cache_size to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default drive space is 500 MB.
    • If multiple users use the same cache path, you are advised to use the configuration file to set the cache path. In this scenario, the op_cache.ini file affects all users.
  2. Using environment variables

    In this scenario, the environment variable ASCEND_MAX_OP_CACHE_SIZE is used to limit the storage space of the cache folder of a chip. When the build cache space reaches the specified value and the cache file is not accessed for more than half an hour, the cache file is aged. The environment variable ASCEND_REMAIN_CACHE_SIZE_RATIO is used to set the ratio of the cache space to be reserved.

    A configuration example is provided as follows:

    # ASCEND_MAX_OP_CACHE_SIZE defaults to 500, in MB. The value must be an integer.
    export ASCEND_MAX_OP_CACHE_SIZE=500
    # ASCEND_REMAIN_CACHE_SIZE_RATIO ranges over [1,100], in percent; the default is 50.
    # For example, 80 keeps 80% of the cache space when space runs out.
    export ASCEND_REMAIN_CACHE_SIZE_RATIO=50
    • The argument configured through environment variables takes effect only for the current user.
    • To disable the build cache aging function, set the environment variable ASCEND_MAX_OP_CACHE_SIZE to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default drive space is 500 MB.

Caution: If both the op_cache.ini file and environment variable are configured, the configuration items in the op_cache.ini file are read first. If neither the op_cache.ini file nor the environment variable are configured, the system default values are read: 500 MB disk space and 50% reserved cache space.

ACL_OP_COMPILER_CACHE_DIR

Sets the cache directory of operator compilation files. The default directory is $HOME/atc_data. This option must be used in conjunction with ACL_OP_COMPILER_CACHE_MODE.

The path can contain only letters, digits, underscores (_), hyphens (-), and periods (.).

If the ACL_OP_DEBUG_LEVEL option is set, the compilation cache takes effect only when ACL_OP_DEBUG_LEVEL is 0.

NOTE:

In addition to setting this build option, you can configure the cache directory for operator compilation files through the environment variable ASCEND_CACHE_PATH. The priority is: build option > environment variable > default directory.

For details about how to set environment variables, see Environment Variables.

ACL_OP_PERFORMANCE_MODE

Do not set this option; it has been deprecated, and setting it may cause compatibility issues in later versions.

Sets the performance mode for operator compilation. Defaults to normal.

Options:

  • normal (default): Compiles operators with the best compilation speed.
  • high: Compiles operators for the highest runtime performance, using the generalization strategy.

ACL_OP_JIT_COMPILE

Determines whether to compile an operator online or use the binary file of a compiled operator.

  • enable: Operators are compiled online. The system performs tuning based on the obtained operator information to produce better-performing operators. In the static-shape network scenario, you are advised to set this option to enable.
  • disable: The system first searches for a compiled operator binary file; if one is found, the operator is not compiled again, which improves compilation performance. If none is found, the operator is compiled online. In the dynamic-shape network scenario, you are advised to set this option to disable. In this case, you need to install the operator binary file package. For details, see "Common Operations > Installing, Upgrading, and Uninstalling the Binary OPP" in CANN Software Installation Guide.

For the Atlas 200/300/500 Inference Product and the Atlas Training Series Product, the default value is enable.

ACL_OP_DETERMINISTIC

Enables or disables deterministic computing.

  • 0 (default): disables deterministic computing. In this case, the results of multiple executions of an operator with the same hardware and input may be different. This is generally caused by asynchronous multi-thread executions during operator implementation, which changes the accumulation sequence of floating point numbers.
  • 1: enables deterministic computing. In this case, the results of multiple executions of an operator with the same hardware and input will be the same. However, enabling deterministic computing often slows down operator execution.

You are advised not to enable deterministic computing because it slows down operator execution and affects performance. If the execution results of a model are different for multiple times or the model accuracy needs to be tuned, you can enable deterministic computing to assist model debugging and tuning.
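For debugging sessions, the option might be toggled as in this sketch (assuming the CANN acl/acl.h header; the string value "1" follows the option values listed above):

```c
#include "acl/acl.h"  /* CANN toolkit header (assumed install) */

/* Sketch: enable deterministic computing while debugging accuracy issues,
 * accepting the operator execution slowdown it causes. */
static aclError enableDeterminism(void)
{
    return aclSetCompileopt(ACL_OP_DETERMINISTIC, "1");
}
```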

ACL_CUSTOMIZE_DTYPES

Sets the path (including the file name) of the *.cfg configuration file, which lists the names or types of operators whose calculation precisions need to be specified. Each operator occupies a line. With this configuration, you can customize the calculation precision of one or more operators during model compilation.

Configuration constraints:

  • The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and colons (:).
  • To specify operator names in the configuration file, follow the Opname::InputDtype:dtype1,...,OutputDtype:dtype1,... format. Put each operator name on a separate line. The data types dtype1, dtype2, ... must correspond to the inputs and outputs of the operator.
  • To specify operator types in the configuration file, follow the OpType::TypeName:InputDtype:dtype1,...,OutputDtype:dtype1,... format. Put each operator type on a separate line. The data types dtype1, dtype2, ... must correspond to the inputs and outputs of the operator. The operator type must be the OpType of an Ascend IR–defined operator. For details about OpType, see Operator Acceleration Library API Reference.
  • For the same operator, if both Opname and OpType are configured, the Opname configuration is used during building.
  • The computing precision of an operator specified by this option does not take effect if the operator is fused during model conversion.
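Putting the formats above together, a hypothetical *.cfg file might look like this. The first line specifies a node named conv1 by name; the second specifies the Conv2D operator type. All names and dtype layouts here are placeholders for illustration only:

```
conv1::InputDtype:float16,float16,OutputDtype:float16
OpType::Conv2D:InputDtype:float16,float16,OutputDtype:float16
```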

ACL_OP_PRECISION_MODE

Path and file name of the configuration file (.ini format) for setting the operator precision mode. The path and file name can contain letters, digits, underscores (_), hyphens (-), and periods (.).

  • The following precision modes can be set in the configuration file:
    • high_precision
    • high_performance
    • support_out_of_bound_index: indicates that the out-of-bounds verification is performed on the indices of the gather, scatter, and segment operators. The verification deteriorates the operator execution performance.
  • Create the op_precision.ini configuration file to set operator precision modes. Set a precision mode by operator type (low priority) or by node name (high priority) in each line of the file.

    A configuration example is as follows.

    [ByOpType]
    optype1=high_precision
    optype2=high_performance
    optype3=support_out_of_bound_index

    [ByNodeName]
    nodename1=high_precision
    nodename2=high_performance
    nodename3=support_out_of_bound_index

ACL_ALLOW_HF32

Not supported in the current version.

Indicates whether to allow the HF32 type to replace the FP32 type during internal computation of operators. true indicates allowed, and false indicates not allowed. In the current version, this configuration takes effect only for Conv and Matmul operators. FP32-to-HF32 conversion is enabled for Conv operators and disabled for Matmul operators by default.

HF32 is a single-precision floating-point type of Ascend for internal computation of operators. The following figure shows the comparison of HF32 with other common data types. HF32 shares the same value range with FP32, but its mantissa precision (11 bits) is close to FP16 (10 bits). Replacing the original FP32 single-precision data type with the HF32 single-precision data type by precision reduction can greatly reduce the space occupied by data and achieve performance improvement.

NOTE:
  • When this parameter is set to true, you can view the operators that allow the HF32 type to replace the FP32 type during internal computation in the opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_t_conv_t.ini file under the CANN software installation path.
  • When this parameter is set to false, you can view the operators that do not allow the HF32 type to replace the FP32 type during internal computation in the opp/built-in/op_impl/ai_core/tbe/impl_mode/allow_hf32_matmul_f_conv_f.ini file under the CANN software installation path.

ACL_PRECISION_MODE_V2

Operator precision mode of a network model. If this compilation option is not configured, fp16 is used by default.

Compared with the ACL_PRECISION_MODE option, ACL_PRECISION_MODE_V2 is new in this version; it offers more precision modes with clearer semantics.

  • fp16:

    Forces float16 for operators supporting both float16 and float32.

  • origin:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • cube_fp16in_fp32out:
    The system selects a processing mode based on the operator type for operators supporting both float16 and float32.
    • For cube operators, the system processes the computation based on the operator implementation:
      1. float16 input with float32 output is preferred.
      2. If float16 input with float32 output is not supported, both the input and output data types fall back to float32.
      3. If float32 input and output are not supported, both fall back to float16.
      4. If float16 input and output are not supported either, an error is reported.
    • For vector operators, float32 is forcibly selected for operators supporting both float16 and float32, even if the original precision is float16.

      This option has no effect on operators that do not support float32 (for example, float16-only operators); their float16 precision is retained. If an operator does not support float32 and is on the precision-reduction blocklist (precision_reduce set to false), the counterpart AI CPU operator supporting float32 is used; if no such AI CPU operator exists, an error is reported.

  • mixed_float16:

    Mixed precision of float16 and float32 is used for neural network processing. Computations are done in float16 for float32 operators according to the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal accuracy degradation.

    If this mode is used, you can view the value of the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json in the OPP installation directory.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to float16.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_hif8: enables automatic mixed precision, indicating that hifloat8 (for details about this data type, see Link), float16, and float32 are used together to process the neural network. In this mode, hifloat8 is automatically used for certain float16 and float32 operators based on the built-in tuning policies. This will improve system performance and reduce memory footprint with minimal precision degradation. The current version does not support this option.

    If this mode is used, you can view the value of precision_reduce in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.

    • true: The operator is on the mixed precision trustlist and its precision will be reduced from float16/float32 to hifloat8.
    • false: The operator is on the mixed precision blocklist and its precision will not be reduced from float16/float32 to hifloat8.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8: The hifloat8 data type is forcibly used if the Cube operator in the network model supports both hifloat8 and float16/float32. The current version does not support this option.

ACL_OP_DEBUG_OPTION

Currently, this parameter can only be set to oom, indicating that detection for out-of-bounds global memory access is enabled during operator compilation.

Before compiling an operator, call aclSetCompileopt to set ACL_OP_DEBUG_OPTION to oom, and call aclrtCtxSetSysParamOpt (context scope) or aclrtSetSysParamOpt (process scope) to set ACL_OPT_ENABLE_DEBUG_KERNEL to 1, enabling detection of out-of-bounds global memory access. During operator execution, if out-of-bounds access occurs when reading or writing global memory (such as an operator reading its input data or writing its output data), AscendCL returns the error code "EZ9999", indicating an AI Core error.
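The two calls above can be sketched as follows (assuming the CANN acl/acl.h header; this is an illustrative sketch, not a definitive sequence):

```c
#include "acl/acl.h"  /* CANN toolkit header (assumed install) */

/* Sketch: enable out-of-bounds global memory access detection ("oom")
 * before compiling and running an operator. */
static aclError enableOomDetection(void)
{
    /* 1. Ask the compiler to instrument operators for out-of-bounds
     *    detection. */
    aclError ret = aclSetCompileopt(ACL_OP_DEBUG_OPTION, "oom");
    if (ret != ACL_SUCCESS) {
        return ret;
    }
    /* 2. Turn on the debug kernel at process scope (context scope would
     *    use aclrtCtxSetSysParamOpt instead). */
    return aclrtSetSysParamOpt(ACL_OPT_ENABLE_DEBUG_KERNEL, 1);
}
```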