aclCompileOpt

typedef enum {
    ACL_PRECISION_MODE,           // Sets the operator precision mode of a network model.
    ACL_AICORE_NUM,               // Sets the number of AI Cores used for model compilation.
    ACL_AUTO_TUNE_MODE,           // Sets the operator auto tuning mode.
    ACL_OP_SELECT_IMPL_MODE,      // Sets the operator implementation mode.
    ACL_OPTYPELIST_FOR_IMPLMODE,  // Lists operator types. Operators in the list are implemented in the mode specified by ACL_OP_SELECT_IMPL_MODE.
    ACL_OP_DEBUG_LEVEL,           // Enables or disables TBE operator debug during operator compilation.
    ACL_DEBUG_DIR,                // Sets the debug directory, for saving the files generated during model conversion and network migration, including the .o, .json, and .cce files of operators.
    ACL_OP_COMPILER_CACHE_MODE,   // Sets the disk cache mode for operator compilation.
    ACL_OP_COMPILER_CACHE_DIR,    // Sets the disk cache directory for operator compilation.
    ACL_OP_PERFORMANCE_MODE,      // Determines whether to compile operators in high-performance mode.
    ACL_OP_JIT_COMPILE,           // Determines whether to compile an operator online or use the binary file of a compiled operator.
    ACL_OP_DETERMINISTIC,         // Determines whether to enable deterministic computing.
    ACL_CUSTOMIZE_DTYPES,         // Customizes the computation precision of one or more operators during model compilation.
    ACL_OP_PRECISION_MODE,        // Sets the precision mode for internal operator processing. One or more operators can be specified.
    ACL_ALLOW_HF32,               // HF32 is a precision type of the Ascend AI Processor for the internal computation of operators. It is not supported in the current version.
    ACL_PRECISION_MODE_V2,        // Sets the operator precision mode of a network model. Compared with ACL_PRECISION_MODE, ACL_PRECISION_MODE_V2 is added in the new version, offering more precision modes and clearer semantics for the existing precision mode options.
    ACL_OP_DEBUG_OPTION           // Currently, this option can only be set to oom, indicating that global memory out-of-bounds access detection is enabled.
} aclCompileOpt;
Table 1 Compilation options

Compilation Option

Description

ACL_PRECISION_MODE

Sets the operator precision mode of a network model. If it is not specified, force_fp16 is used by default.

  • force_fp32/cube_fp16in_fp32out:
    force_fp32 and cube_fp16in_fp32out have the same effect: when an operator in the AI Core supports both the float32 and float16 data types, the system selects a processing mode based on the operator type. cube_fp16in_fp32out was added in the new version and has clearer semantics for cube operators.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators whose precision in the original graph is float16 or bfloat16, float32 is forcibly selected.

      This option does not take effect for operators in the original graph that do not support float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and is on the precision reduction blocklist (precision_reduce is set to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • force_fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • allow_fp32_to_fp16:
    • For matrix operators:
      • If the operator precision in the original graph is float32, the precision is preferably reduced to float16. If the operator in the AI Core does not support float16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
    • For vector operators, the precision of the original graph is retained preferably.
      • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
      • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
  • must_keep_origin_dtype:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • allow_mix_precision/allow_mix_precision_fp16:

    allow_mix_precision and allow_mix_precision_fp16 have the same effect: mixed precision of float16, bfloat16, and float32 is used for neural network processing. allow_mix_precision_fp16 was added in the new version and has clearer semantics.

    Based on the built-in tuning policy, float16 is automatically used for certain float32 and bfloat16 operators in the original model. This improves system performance and reduces memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_mix_precision_bf16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original model based on the built-in tuning policy. This improves system performance and reduces memory usage with minimal precision degradation. If the operator in the AI Core does not support bfloat16 and float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16 and float32, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • allow_fp32_to_bf16:
    • If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is reduced to bfloat16. If the operator in the AI Core does not support bfloat16, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16, an error is reported during execution.
    • If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
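The allow_fp32_to_bf16 fallback order described above can be sketched as a small decision function. This is only an illustration of the documented order: the type names, the OpSupport structure, and the function name are hypothetical and are not part of the AscendCL API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not AscendCL types. */
typedef enum { DT_FLOAT32, DT_BFLOAT16 } DType;
typedef enum { RUN_AICORE, RUN_AICPU, RUN_ERROR } Engine;

typedef struct {
    bool aicore_fp32, aicore_bf16;  /* dtypes supported by the AI Core operator */
    bool aicpu_fp32, aicpu_bf16;    /* dtypes supported by the AI CPU operator */
} OpSupport;

typedef struct {
    Engine engine;  /* where the operator runs, or RUN_ERROR */
    DType dtype;    /* precision selected for the operator */
} Choice;

/* Walks the allow_fp32_to_bf16 fallback order for one operator. */
static Choice choose_allow_fp32_to_bf16(DType original, OpSupport s)
{
    assert(original == DT_FLOAT32 || original == DT_BFLOAT16);
    Choice c = { RUN_ERROR, original };

    if (original == DT_FLOAT32) {
        /* float32 -> bfloat16 on AI Core -> bfloat16 on AI CPU -> error */
        if (s.aicore_fp32) {
            c.engine = RUN_AICORE; c.dtype = DT_FLOAT32;
        } else if (s.aicore_bf16) {
            c.engine = RUN_AICORE; c.dtype = DT_BFLOAT16;
        } else if (s.aicpu_bf16) {
            c.engine = RUN_AICPU; c.dtype = DT_BFLOAT16;
        }
    } else {
        /* bfloat16 -> float32 on AI Core -> float32 on AI CPU -> error */
        if (s.aicore_bf16) {
            c.engine = RUN_AICORE; c.dtype = DT_BFLOAT16;
        } else if (s.aicore_fp32) {
            c.engine = RUN_AICORE; c.dtype = DT_FLOAT32;
        } else if (s.aicpu_fp32) {
            c.engine = RUN_AICPU; c.dtype = DT_FLOAT32;
        }
    }
    return c;
}
```

For instance, a float32 operator whose AI Core implementation supports only bfloat16 resolves to bfloat16 on the AI Core, and an operator with no supported implementation resolves to RUN_ERROR.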

ACL_AICORE_NUM

Sets the number of AI Cores used for model compilation.

The setting is invalid in the current version.

ACL_AUTO_TUNE_MODE

Do not set this parameter because it will be deprecated. Otherwise, compatibility issues may occur in later versions. If tuning is involved, see AOE Instructions.

Sets the operator auto tuning mode.

  • GA: genetic algorithm, for tuning Cube operators.
  • RL: reinforcement learning, for tuning Vector operators.

ACL_OP_SELECT_IMPL_MODE

Sets the operator implementation mode. If it is not specified, high_performance is used by default.

  • high_precision: high-precision mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_precision.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_performance (default): high-performance mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance.ini.

    To ensure compatibility, this argument takes effect only for the operator list in the high_performance.ini file. This list can be used to control the effective scope of operators and ensure that the network models of earlier versions are not affected.

  • high_precision_for_all: high-precision mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_precision_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the performance of the earlier network model that uses the high_precision_for_all mode may deteriorate.

  • high_performance_for_all: high-performance mode.

    This option sets the operator implementation mode by using the built-in configuration file, which is stored in ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/impl_mode/high_performance_for_all.ini. The list in this file may be updated with the version.

    This implementation mode may cause incompatibility. If an operator in the new software package sets the implementation mode (that is, an implementation mode is added for a certain operator in the configuration file), the precision of the earlier network model that uses the high_performance_for_all mode may deteriorate.

ACL_OPTYPELIST_FOR_IMPLMODE

Sets the list of operator types (multiple operators are separated by commas). This option is used together with ACL_OP_SELECT_IMPL_MODE to specify whether the operators in the list use the high-precision or high-performance implementation mode.
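Because the value is a comma-separated list, it is often assembled from an array of operator type names. The helper below is a minimal sketch of that step; the function name is hypothetical, and the operator types in the usage note are placeholders rather than a recommended list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Joins operator type names with commas into out.
 * Returns 0 on success, or -1 if out is too small to hold the result.
 */
static int join_op_types(const char *const *types, size_t count,
                         char *out, size_t out_size)
{
    assert(types != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';

    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(types[i]);
        size_t need = len + (i > 0 ? 1 : 0);  /* name plus a separating comma */
        if (used + need + 1 > out_size) {     /* +1 for the terminating NUL */
            return -1;
        }
        if (i > 0) {
            out[used++] = ',';
        }
        memcpy(out + used, types[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Joining the placeholder types "Pooling" and "SoftmaxV2" yields the string "Pooling,SoftmaxV2", which would then be supplied as the value of ACL_OPTYPELIST_FOR_IMPLMODE alongside the mode chosen for ACL_OP_SELECT_IMPL_MODE.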

ACL_OP_DEBUG_LEVEL

Enables or disables TBE operator debug during operator compilation.

The options are as follows:

  • 0 (default): Disables operator debug. The operator build folder kernel_meta is not generated in the current execution path.
  • 1: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors.
  • 2: Enables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), and TBE instruction mapping files (operator file *.cce and python-CCE mapping file *_loc.json) are generated in the folder for later analysis of AI Core errors. Setting this option to 2 also disables build optimization and enables the CCE compiler debug function (the CCE compiler options are set to -O0 -g).
  • 3: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file) and .json file (operator description file) are generated in the folder. You can refer to these files when analyzing operator errors.
  • 4: Disables operator debug. The kernel_meta folder is generated in the current execution path, and the .o file (operator binary file), .json file (operator description file), TBE instruction mapping file (operator file *.cce), and UB fusion description file ({$kernel_name}_compute.json) are generated in the folder. These files can be used for problem reproduction and accuracy comparison during operator error analysis.
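The five levels above differ mainly in which files appear under kernel_meta. The following sketch condenses the list into a lookup function so the differences are easy to compare; the structure and function names are hypothetical and only restate the descriptions above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one ACL_OP_DEBUG_LEVEL value. */
typedef struct {
    bool op_debug;         /* operator debug enabled */
    bool kernel_meta;      /* kernel_meta folder generated */
    bool obj_and_json;     /* .o and .json files generated */
    bool cce_file;         /* operator file *.cce generated */
    bool loc_json;         /* python-CCE mapping file *_loc.json generated */
    bool compute_json;     /* UB fusion description file generated */
    bool cce_debug_build;  /* build optimization off, CCE compiler debug on */
} DebugLevelInfo;

/* Returns what each documented level (0 to 4) enables. */
static DebugLevelInfo describe_debug_level(int level)
{
    assert(level >= 0 && level <= 4);
    DebugLevelInfo info = { 0 };

    if (level == 0) {
        return info;  /* nothing is generated */
    }
    info.kernel_meta = true;
    info.obj_and_json = true;
    info.op_debug = (level == 1 || level == 2);
    info.cce_file = (level == 1 || level == 2 || level == 4);
    info.loc_json = (level == 1 || level == 2);
    info.compute_json = (level == 4);
    info.cce_debug_build = (level == 2);
    return info;
}
```

Note that levels 3 and 4 still generate files even though operator debug itself stays disabled.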

The configuration constraints are as follows:

  • If this option is set to 2, the CCE compiler debug function is enabled, and the size of the operator kernel file (*.o file) increases. In the dynamic shape scenario, all possible shape scenarios are traversed during operator build, which may cause operator build failures due to large operator kernel files. In this case, you are advised not to enable the CCE compiler debug options.

    If a build failure is caused by the large operator kernel file, the following log is displayed:

    message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o
  • When the debug function is enabled, if the model contains the following merged compute and communication (MC2) operators, the *.o, *.json, and *.cce files of the operators are not generated in the operator build folder kernel_meta.

    MatMulAllReduce

    MatMulAllReduceAddRmsNorm

    AllGatherMatMul

    MatMulReduceScatter

    AlltoAllAllGatherBatchMatMul

    BatchMatMulReduceScatterAlltoAll

ACL_DEBUG_DIR

Sets the path for storing debugging information files generated after operator compilation during model conversion and network migration, including the .o, .json, and .cce files. The default path is kernel_meta under the current path of the executed application. The generated files depend on the value of ACL_OP_DEBUG_LEVEL.

The directory name can contain letters, digits, underscores (_), hyphens (-), and periods (.).

In addition to setting enumerated values, you can also configure the directory for operator debugging files by setting the environment variable ASCEND_WORK_PATH. The priorities of these methods are as follows: Setting enumerated values > Setting environment variables > Default directory. For details about how to set environment variables, see Environment Variables.

ACL_OP_COMPILER_CACHE_MODE

Sets the disk cache mode for operator compilation. This option must be used in conjunction with ACL_OP_COMPILER_CACHE_DIR.

  • enable: Enabled. Operators with the same build configurations and operator configurations are not built repeatedly, which speeds up the build.
  • force: Enabled with the cache forcibly refreshed. That is, the existing cache is cleared before the operator is recompiled and added to the cache. For example, after Python changes, dependency library changes, or repository changes following operator optimization, set this option to force to clear the existing cache, and then change it back to enable so that the cache is not forcibly refreshed during each build.
  • disable (default): Disabled. The operator is rebuilt.

If debugging is also enabled (ACL_OP_DEBUG_LEVEL is set to a non-zero value), the system ignores the configuration of ACL_OP_COMPILER_CACHE_MODE and does not cache the compilation result in the debugging scenario.
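The interaction between the cache mode and the debug level can be summarized in a few lines. This is a sketch of the rule as stated above, not of how the compiler implements it; the enum and function names are hypothetical, and treating an unrecognized mode string like disable is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of the resulting cache behavior. */
typedef enum {
    CACHE_OFF,      /* operators are rebuilt, nothing is cached */
    CACHE_REUSE,    /* "enable": matching cached builds are reused */
    CACHE_REFRESH   /* "force": cache is cleared, then repopulated */
} CacheBehavior;

/* Combines ACL_OP_COMPILER_CACHE_MODE with ACL_OP_DEBUG_LEVEL. */
static CacheBehavior effective_cache_behavior(const char *mode, int debug_level)
{
    assert(mode != NULL);
    if (debug_level != 0) {
        return CACHE_OFF;  /* debugging overrides the cache mode */
    }
    if (strcmp(mode, "enable") == 0) {
        return CACHE_REUSE;
    }
    if (strcmp(mode, "force") == 0) {
        return CACHE_REFRESH;
    }
    return CACHE_OFF;      /* "disable" (default); other strings: assumption */
}
```

In other words, enable or force only matters while ACL_OP_DEBUG_LEVEL is 0.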

When you enable the operator compilation cache function, you can set the disk space of the cache folder with the configuration file (the op_cache.ini configuration file is automatically generated in the path specified by ACL_OP_COMPILER_CACHE_DIR during operator compilation) or environment variables.

  1. Using the op_cache.ini configuration file:

    If the op_cache.ini file does not exist, manually create it. Open the file and add the following information:

    # Configure the file format (required). The automatically generated file contains the following information by default. When manually creating a file, enter the following information:
    [op_compiler_cache]
    # Limit the disk space of the cache folder on a chip, in MB. The default value is 500. The value must be an integer.
    max_op_cache_size=500
    # Set the ratio of the cache size to be reserved, in percentage. The value range is [1, 100]. The default value is 50. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
    remain_cache_size_ratio=50    
    • The op_cache.ini file takes effect only when the values of max_op_cache_size and remain_cache_size_ratio in the preceding file are valid.
    • If the size of the build cache file exceeds the value of max_op_cache_size and the cache file is not accessed for more than half an hour, the cache file will be aged. (Operator build will not be interrupted due to the size of the build cache file exceeding the set limit. Therefore, if max_op_cache_size is set to a small value, the size of the actual build cache file may exceed the configured value.)
    • To disable the build cache aging function, set max_op_cache_size to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.
    • If multiple users use the same cache path, you are advised to use the configuration file for these settings. In this scenario, the op_cache.ini file affects all users.
  2. Using environment variables

    In this scenario, the environment variable ASCEND_MAX_OP_CACHE_SIZE is used to limit the storage space of the cache folder of a chip. When the build cache space reaches the specified value and the cache file is not accessed for more than half an hour, the cache file is aged. The environment variable ASCEND_REMAIN_CACHE_SIZE_RATIO is used to set the ratio of the cache space to be reserved.

    A configuration example is as follows:

    # The ASCEND_MAX_OP_CACHE_SIZE environment variable defaults to 500, in MB. The value must be an integer.
    export ASCEND_MAX_OP_CACHE_SIZE=500
    # The value range of the ASCEND_REMAIN_CACHE_SIZE_RATIO environment variable is [1, 100]. The default value is 50, in percentage. For example, 80 indicates that when the cache space is insufficient, 80% of the cache space is reserved and the rest is cleared up.
    export ASCEND_REMAIN_CACHE_SIZE_RATIO=50
    • The argument configured through environment variables takes effect only for the current user.
    • To disable the build cache aging function, set the environment variable ASCEND_MAX_OP_CACHE_SIZE to -1. In this case, the access time is not updated when the operator cache is accessed, the operator build cache is not aged, and the default disk space of 500 MB is used.

If both the op_cache.ini file and environment variable are configured, the configuration items in the op_cache.ini file are read first. If neither the op_cache.ini file nor the environment variable is configured, the system default values are read: 500 MB disk space and 50% reserved cache space.
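The precedence between the two configuration methods and the defaults can be sketched as follows. The structure and function names are hypothetical. Two points are assumptions: a size is treated as valid when it is positive or -1, and an invalid op_cache.ini falls through to the environment variables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for one source of cache limits. */
typedef struct {
    bool present;       /* whether this source was configured at all */
    long max_size_mb;   /* max_op_cache_size / ASCEND_MAX_OP_CACHE_SIZE */
    long remain_ratio;  /* remain_cache_size_ratio / ASCEND_REMAIN_CACHE_SIZE_RATIO */
} CacheLimits;

/* Assumption: size must be positive or -1; ratio must be in [1, 100]. */
static bool limits_valid(CacheLimits l)
{
    return l.present &&
           (l.max_size_mb > 0 || l.max_size_mb == -1) &&
           l.remain_ratio >= 1 && l.remain_ratio <= 100;
}

/* op_cache.ini first, then environment variables, then defaults (500 MB, 50%). */
static CacheLimits resolve_cache_limits(CacheLimits ini, CacheLimits env)
{
    if (limits_valid(ini)) {
        return ini;
    }
    if (limits_valid(env)) {
        return env;
    }
    CacheLimits defaults = { true, 500, 50 };
    return defaults;
}
```

Parsing the file and reading the environment are left out on purpose, so the function only captures the ordering.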

ACL_OP_COMPILER_CACHE_DIR

Sets the cache directory of operator compilation files. The default directory is $HOME/atc_data. This option must be used in conjunction with ACL_OP_COMPILER_CACHE_MODE.

The directory name can contain letters, digits, underscores (_), hyphens (-), and periods (.).

If the ACL_OP_DEBUG_LEVEL option is set, the compilation cache function takes effect only when ACL_OP_DEBUG_LEVEL is set to 0.

In addition to setting enumerated values, you can also configure the cache directory for operator compilation files by setting the environment variable ASCEND_CACHE_PATH. The priorities of these methods are as follows: Setting enumerated values > Setting environment variables > Default directory. For details about how to set environment variables, see Environment Variables.
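The priority order can be illustrated with a small selection helper. This is a sketch only: the function name is hypothetical, and it assumes the environment variable value can be used directly as the directory, which this section does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Picks a directory by priority: option value, then environment value,
 * then the default. NULL or empty strings count as "not set".
 */
static const char *resolve_by_priority(const char *option_value,
                                       const char *env_value,
                                       const char *default_value)
{
    assert(default_value != NULL);
    if (option_value != NULL && option_value[0] != '\0') {
        return option_value;
    }
    if (env_value != NULL && env_value[0] != '\0') {
        return env_value;
    }
    return default_value;
}
```

A caller might pass the result of getenv("ASCEND_CACHE_PATH") as env_value and the documented default directory as default_value.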

ACL_OP_PERFORMANCE_MODE

Do not set this parameter because it has been deprecated. Otherwise, compatibility issues may occur in later versions.

Determines whether to compile operators in high-performance mode. The default value is normal.

The options are as follows:

  • normal: The operator is compiled with the optimal compilation performance.
  • high: The operator is compiled with the highest runtime performance using the generalization strategy.

ACL_OP_JIT_COMPILE

Determines whether to compile an operator online or use the binary file of a compiled operator.

  • enable: Operators are compiled online. The system performs tuning based on the obtained operator information to generate operators with improved runtime performance. In the static-shape network scenario, you are advised to set this parameter to enable.
  • disable: The compiled operator binary file in the system is preferentially searched. If the file can be found, operators are not compiled again, which produces better compilation performance. If the file cannot be found, operators will be compiled. In the dynamic-shape network scenario, you are advised to set this parameter to disable. If this parameter is set to disable, you need to install the operator binary file package. For details, see section "Installing CANN" in CANN Software Installation Guide.

For the Atlas training products, the default value is enable.

For the Atlas inference products, the default value is enable.

For the Atlas 200I/500 A2 inference products, the default value is enable.

For the Atlas A2 training products / Atlas A2 inference products, the default value is disable.

For the Atlas A3 training products / Atlas A3 inference products, the default value is disable.

ACL_OP_DETERMINISTIC

Determines whether to enable deterministic computing.

  • 0 (default): disables deterministic computing. In this case, the results of multiple executions of an operator with the same hardware and input may be different. This is generally caused by asynchronous multi-thread executions during operator implementation, which changes the accumulation sequence of floating-point numbers.
  • 1: enables deterministic computing. In this case, the results of multiple executions of an operator with the same hardware and input will be the same. However, enabling deterministic computing often slows down operator execution.

You are advised not to enable deterministic computing because it slows down operator execution and affects performance. If a model produces different results across multiple executions, or if precision needs to be tuned, you can enable deterministic computing to assist model debugging and optimization.
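The effect of accumulation order mentioned above can be reproduced on any host with plain single-precision arithmetic. The sketch below is unrelated to the AscendCL API; it only shows that adding the same three floats in a different order gives a different sum. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Adds values[order[0]], values[order[1]], ... in that order.
 * The volatile accumulator forces every partial sum to be rounded to float.
 */
static float sum_in_order(const float *values, const size_t *order, size_t count)
{
    assert(values != NULL && order != NULL);
    volatile float acc = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        acc = acc + values[order[i]];
    }
    return acc;
}
```

With the values 1.0e8f, -1.0e8f, and 1.0f, summing in the order (0, 1, 2) gives 1.0, while the order (1, 2, 0) gives 0.0 under IEEE 754 single precision, because -1.0e8f + 1.0f rounds back to -1.0e8f. Asynchronous multi-thread execution can change the order in the same way, which is why results may differ between runs when deterministic computing is disabled.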

ACL_CUSTOMIZE_DTYPES

Sets the path (including the file name) of the *.cfg configuration file, which lists the names or types of operators whose calculation precisions need to be specified. Each operator occupies a line. With this configuration, you can customize the computation precision of one or more operators during model compilation.

Configuration constraints:

  • The path and file name can contain letters, digits, underscores (_), hyphens (-), periods (.), and colons (:).
  • To specify operator names in the configuration file, follow the Opname::InputDtype:dtype1,...,OutputDtype:dtype1,... format. Put each operator name in a separate line. The data types (such as dtype1 and dtype2) must correspond one-to-one to the inputs and outputs of the operators whose precision can be configured.
  • To specify operator types in the configuration file, follow the OpType::TypeName:InputDtype:dtype1,...,OutputDtype:dtype1,... format. Put each operator type in a separate line. The data types (such as dtype1 and dtype2) must correspond one-to-one to the inputs and outputs of the operators whose precision can be configured. The operator type must be OpType of the Ascend IR–defined operator. For details about OpType, see Operator Library API Reference.
  • For the same operator, if both Opname and OpType are configured, the Opname configuration is used during compilation.
  • The computation precision of an operator specified by this option does not take effect if the operator is fused during model conversion.
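The two line formats above can be generated programmatically when the configuration file is produced by a script or tool. The helper below is a sketch based on a literal reading of those formats: the function name is hypothetical, and the operator names and data types in the usage note are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats one configuration line.
 *   by_type == false:  <name>::InputDtype:<in>,OutputDtype:<out>
 *   by_type == true:   OpType::<name>:InputDtype:<in>,OutputDtype:<out>
 * input_dtypes and output_dtypes are comma-separated dtype lists.
 * Returns 0 on success, or -1 if out is too small.
 */
static int format_dtype_line(char *out, size_t out_size, bool by_type,
                             const char *name,
                             const char *input_dtypes,
                             const char *output_dtypes)
{
    assert(out != NULL && name != NULL);
    assert(input_dtypes != NULL && output_dtypes != NULL);
    assert(strlen(name) > 0);

    int n = snprintf(out, out_size, "%s%s%sInputDtype:%s,OutputDtype:%s",
                     by_type ? "OpType::" : "", name,
                     by_type ? ":" : "::",
                     input_dtypes, output_dtypes);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For a placeholder operator named conv1 with inputs float16,int8 and output float16, the by-name form produces conv1::InputDtype:float16,int8,OutputDtype:float16. For the operator type Relu, the by-type form produces OpType::Relu:InputDtype:float16,OutputDtype:float16.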

ACL_OP_PRECISION_MODE

Sets the path and file name of the configuration file (.ini format) for setting the operator precision mode. The path and file name can contain letters, digits, underscores (_), hyphens (-), and periods (.).

  • The following precision modes can be set in the configuration file:
    • high_precision
    • high_performance
    • enable_float_32_execution: uses the FP32 type for internal computation of operators. When HF32 is used for computation and the accuracy drop exceeds your expectation, you can enable this configuration to use FP32 for internal computation of certain operators to maintain accuracy.

      The Atlas A2 training products / Atlas A2 inference products support this configuration.

      The Atlas A3 training products / Atlas A3 inference products support this configuration.

    • enable_hi_float_32_execution: uses the HF32 type for internal computation of operators. If this configuration is enabled, the FP32 type is automatically converted to the HF32 type. This configuration can reduce the space occupied by data and improve performance. It is not supported in the current version.
    • support_out_of_bound_index: performs out-of-bounds verification on the indices of the gather, scatter, and segment operators. The verification degrades operator execution performance.
  • Construct the op_precision.ini configuration file to set operator precision modes. Specify one precision mode per line, either by operator type (lower priority) or by node name (higher priority).

    A configuration example is as follows:

    [ByOpType]
    optype1=high_precision
    optype2=high_performance
    optype3=support_out_of_bound_index

    [ByNodeName]
    nodename1=high_precision
    nodename2=high_performance
    nodename3=support_out_of_bound_index
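The priority between the two sections can be sketched as a lookup in which a node name match overrides an operator type match. The structure and function names are hypothetical, and the tables are assumed to have been parsed from the .ini file already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One key=value pair from a [ByOpType] or [ByNodeName] section. */
typedef struct {
    const char *key;
    const char *mode;
} ModeEntry;

/* Linear search; returns the mode for key, or NULL if it is not listed. */
static const char *find_mode(const ModeEntry *entries, size_t count,
                             const char *key)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(entries[i].key, key) == 0) {
            return entries[i].mode;
        }
    }
    return NULL;
}

/* Node-name entries win over op-type entries; NULL means nothing configured. */
static const char *resolve_precision_mode(const ModeEntry *by_type, size_t n_type,
                                          const ModeEntry *by_name, size_t n_name,
                                          const char *op_type,
                                          const char *node_name)
{
    assert(op_type != NULL && node_name != NULL);
    const char *mode = find_mode(by_name, n_name, node_name);
    if (mode != NULL) {
        return mode;
    }
    return find_mode(by_type, n_type, op_type);
}
```

An operator of type optype1 therefore uses the mode configured for its type unless its node name also appears under [ByNodeName], in which case that entry applies.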

ACL_ALLOW_HF32

The current version does not support this option.

Indicates whether to allow the HF32 type to replace the FP32 type during internal computation of operators. true indicates allowed, and false indicates not allowed. In the current version, this configuration takes effect only for Conv and Matmul operators. FP32-to-HF32 conversion is enabled for Conv operators and disabled for Matmul operators by default.

HF32 is a single-precision floating-point type of the Ascend AI Processor for internal computation of operators. The following figure shows the comparison of HF32 with other common data types. HF32 shares the same value range with FP32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing FP32 with HF32 reduces precision, but can greatly reduce the space occupied by data and improve performance.

ACL_PRECISION_MODE_V2

Sets the operator precision mode of a network model. If this compilation option is not configured, fp16 is used by default.

Compared with ACL_PRECISION_MODE, ACL_PRECISION_MODE_V2 is newly added, and its precision mode options have clearer semantics, making them easier to understand.

  • fp16 (default):

    Indicates that float16 is forcibly selected if the operator precision in the original graph is float16, bfloat16, or float32.

  • origin:

    Retain the original precision.

    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only float32 and bfloat16, the system automatically uses high-precision float32.
    • If the precision of an operator in the original graph is float16, and the implementation of the operator in the AI Core does not support float16 but supports only bfloat16, the AI CPU operator of float16 is used. If the AI CPU operator is not supported, an error is reported.
    • If the precision of an operator in the original graph is float32, and the implementation of the operator in the AI Core does not support float32 but supports only float16, the AI CPU operator of float32 is used. If the AI CPU operator is not supported, an error is reported.
  • cube_fp16in_fp32out:
    The system selects a processing mode based on the operator type for AI Core operators supporting both float32 and float16.
    • For cube operators, the system processes the computation based on the operator implementation.
      1. The preferred input data type is float16 and the output data type is float32.
      2. If the float16 input data and float32 output data types are not supported, set both the input and output data types to float32.
      3. If the float32 input and output data types are not supported, set both the input and output data types to float16.
      4. If the float16 input and output data types are not supported, an error is reported.
    • For vector operators whose precision in the original graph is float16 or bfloat16, float32 is forcibly selected.

      This option does not take effect for operators in the original graph that do not support float32 in the AI Core, for example, an operator that supports only float16. In this case, float16 is retained. If the operator in the AI Core does not support float32 and is on the precision reduction blocklist (precision_reduce is set to false), the counterpart AI CPU operator supporting float32 is used. If the AI CPU operator does not support float32, an error is reported.

  • mixed_float16:

    Mixed precision of float16, bfloat16, and float32 is used for neural network processing. In this mode, float16 is automatically used for certain float32 and bfloat16 operators in the original graph based on the built-in tuning policy. This improves system performance and reduces memory usage with minimal precision degradation.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 or bfloat16 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 or bfloat16 to float16. In this case, the operator keeps its float32 or bfloat16 precision.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_bfloat16:

    Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original graph based on the built-in tuning policy. This improves system performance and reduces memory usage with minimal precision degradation. If an operator supports neither bfloat16 nor float32 in the AI Core, the AI CPU operator is used for computation. If the AI CPU operator does not support these data types either, an error is reported during execution.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
    • If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • mixed_hif8:

    Automatic mixed precision of hifloat8, float16, bfloat16, and float32 is used for neural network processing. In this mode, hifloat8 is automatically used for certain float16, bfloat16, and float32 operators in the original graph based on the built-in tuning policy. This improves system performance and reduces memory usage with minimal precision degradation. The current version does not support this argument.

    If this mode is configured, you can view the value of precision_reduce in the built-in tuning policy file of ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float16, bfloat16, or float32 to hifloat8.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float16, bfloat16, or float32 to hifloat8. In this case, the operator keeps its float16, bfloat16, or float32 precision.
    • If an operator in the original graph does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8:

    The hifloat8 data type is forcibly used if the Cube operator in the original graph supports both hifloat8 and float16, bfloat16, or float32. The current version does not support this argument.
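
As an illustration of the cube_fp16in_fp32out option described earlier in this list, the four-step fallback order for cube operators can be sketched as follows. This is a minimal model rather than the compiler's actual implementation: the CubeSupport flags, the CubeDtypes struct, and the select_cube_dtypes function are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DT_FLOAT16, DT_FLOAT32 } DType;

/* Hypothetical capability flags of one cube operator implementation. */
typedef struct {
    bool fp16_in_fp32_out; /* float16 input, float32 output */
    bool fp32_in_fp32_out; /* float32 input and output */
    bool fp16_in_fp16_out; /* float16 input and output */
} CubeSupport;

typedef struct {
    DType input;
    DType output;
} CubeDtypes;

/* Walks steps 1 to 4 in order. Returns false when no combination is
 * supported, which corresponds to the error case in step 4. */
static bool select_cube_dtypes(const CubeSupport *s, CubeDtypes *chosen) {
    assert(s != NULL && chosen != NULL);
    if (s->fp16_in_fp32_out) {          /* step 1: preferred combination */
        chosen->input = DT_FLOAT16;
        chosen->output = DT_FLOAT32;
        return true;
    }
    if (s->fp32_in_fp32_out) {          /* step 2: fall back to float32 */
        chosen->input = DT_FLOAT32;
        chosen->output = DT_FLOAT32;
        return true;
    }
    if (s->fp16_in_fp16_out) {          /* step 3: fall back to float16 */
        chosen->input = DT_FLOAT16;
        chosen->output = DT_FLOAT16;
        return true;
    }
    return false;                       /* step 4: report an error */
}
```

For example, an implementation that supports only float32 input and output resolves to float32 for both, and an implementation that supports none of the three combinations yields the error case.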

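The trustlist, blocklist, and graylist handling of precision_reduce described for the mixed precision modes can be sketched as follows, using mixed_float16 as the example. This is a simplified model under stated assumptions: the Op struct and resolve_mixed_float16 function are hypothetical, the graph is treated as a linear chain in which each operator has a single upstream operator, and a graylist operator with no upstream operator is assumed to keep its original precision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DT_FLOAT16, DT_BFLOAT16, DT_FLOAT32 } DType;

/* precision_reduce is tri-state: true, false, or not configured. */
typedef enum { PR_NOT_CONFIGURED, PR_TRUE, PR_FALSE } PrecisionReduce;

/* Hypothetical operator record in a linear chain. */
typedef struct {
    DType original;         /* precision in the original graph */
    PrecisionReduce reduce; /* value read from the tuning policy file */
} Op;

/* Resolves the precision of ops[0..n-1] under mixed_float16 and writes
 * the result for each operator to out[0..n-1]. */
static void resolve_mixed_float16(const Op *ops, size_t n, DType *out) {
    assert(ops != NULL && out != NULL);
    bool upstream_reduced = false; /* assumption: no upstream, no reduction */
    for (size_t i = 0; i < n; ++i) {
        bool reduce;
        switch (ops[i].reduce) {
        case PR_TRUE:  reduce = true;  break;            /* trustlist */
        case PR_FALSE: reduce = false; break;            /* blocklist */
        default:       reduce = upstream_reduced; break; /* graylist */
        }
        /* Only float32 and bfloat16 operators are reduced to float16. */
        bool reducible = ops[i].original != DT_FLOAT16;
        out[i] = (reduce && reducible) ? DT_FLOAT16 : ops[i].original;
        upstream_reduced = reduce;
    }
}
```

In this model, a graylist operator placed after a trustlist operator is reduced to float16, while the same operator placed after a blocklist operator keeps its original precision. The mixed_bfloat16 and mixed_hif8 modes follow the same three-way pattern with different source and target data types.
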
ACL_OP_DEBUG_OPTION

Currently, this option can only be set to oom, indicating that detection for out-of-bounds global memory access is enabled during operator compilation.

Before compiling an operator, set ACL_OP_DEBUG_OPTION to oom through the compile option API, and set ACL_OPT_ENABLE_DEBUG_KERNEL to 1 through the corresponding context-scope or process-scope API. Both settings are required to enable detection for out-of-bounds global memory access. During operator execution, if out-of-bounds access occurs when data is read from or written to the global memory (for example, when reading the operator input data or writing the operator output data), the error code "EZ9999" is returned, indicating that the operator has an AI Core error.