aclgrphBuildInitialize Configuration Parameters
Basic Functions
| Parameter | Description |
|---|---|
|
SOC_VERSION |
Ascend AI Processor used during graph build.
To query <soc_version>:
Configuration example: {ge::ir_option::SOC_VERSION, "<soc_version>"}
Applicability: |
|
ENABLE_SINGLE_STREAM |
Whether to enable single-stream serial execution of model inference in the static shape scenario. A stream preserves the execution order of the asynchronous operations queued on the device. Arguments:
Restrictions: If the model contains the Cmo operator and the following control operators, the single-stream feature cannot be used. In this case, use the default value false.
Configuration example: {ge::ir_option::ENABLE_SINGLE_STREAM, "true"}
Applicability: |
|
DETERMINISTIC |
Whether to enable deterministic computing. Deterministic computing is disabled by default, so repeated executions of an operator on the same hardware with the same input may produce different results. The difference is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the order in which floating-point numbers are accumulated. When deterministic computing is enabled, an operator executed multiple times on the same hardware with the same input produces the same output. Enabling it is not recommended in general because it slows down operator execution. If repeated executions of a model produce different results or the precision needs to be optimized, you can enable deterministic computing to assist model debugging and optimization. Arguments:
Configuration example: {ge::ir_option::DETERMINISTIC, "1"}
Applicability: |
|
OPTION_HOST_ENV_OS |
Set this parameter when the OS and architecture of the model build environment differ from those of the model operating environment. Its value is the OS type of the model operating environment. If it is not set, the OS type of the model build environment is used by default. Use this option together with OPTION_HOST_ENV_CPU: OPTION_HOST_ENV_OS sets the OS type, and OPTION_HOST_ENV_CPU sets the OS architecture. Argument: OS type of the operator .so file packaged in the ${INSTALL_DIR}/opp/built-in/op_graph/lib/ directory. Default value: value in the ${INSTALL_DIR}/opp/scene.info file. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. Configuration example: {ge::OPTION_HOST_ENV_OS, "linux"}
{ge::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability: |
|
OPTION_HOST_ENV_CPU |
Set this parameter when the OS and architecture of the model build environment differ from those of the model operating environment. Its value is the OS architecture of the model operating environment. If it is not set, the OS architecture of the model build environment is used by default. Use this option together with OPTION_HOST_ENV_OS: OPTION_HOST_ENV_OS sets the OS type, and OPTION_HOST_ENV_CPU sets the OS architecture. Argument: OS or CPU type of the operator .so file packaged in the ${INSTALL_DIR}/opp/built-in/op_graph/lib/ directory. Default value: value in the ${INSTALL_DIR}/opp/scene.info file. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. Configuration example: {ge::OPTION_HOST_ENV_OS, "linux"}
{ge::OPTION_HOST_ENV_CPU, "x86_64"}
Applicability: |
|
VIRTUAL_TYPE |
Whether an offline model can run on a virtual device generated by the Ascend virtual instance feature. When a chip provides more computing power than a cloud user or small enterprise needs, the Ascend virtual instance feature can allocate just the amount of computing power that their services require. A virtual device is a virtual acceleration resource that a chip allocates based on the specified computing power. Arguments:
Configuration example: {ge::ir_option::VIRTUAL_TYPE, "1"}
Restrictions:
Applicability: |
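The configuration examples in this table are key/value pairs, so a set of options can be assembled as a map before it is handed to aclgrphBuildInitialize. The following is a minimal sketch of that assembly, not the library's API: the key strings are placeholders standing in for the constants ge::ir_option::SOC_VERSION, ge::OPTION_HOST_ENV_OS, and ge::OPTION_HOST_ENV_CPU, and the HostEnv type and MakeBasicOptions helper are hypothetical names introduced for illustration. It shows the pairing rule described above: the two host environment options are added together, and only when the operating environment differs from the build environment.

```cpp
#include <map>
#include <string>

namespace sketch {
// Placeholder key strings (assumption). Real code would use the constants
// ge::ir_option::SOC_VERSION, ge::OPTION_HOST_ENV_OS and ge::OPTION_HOST_ENV_CPU.
const std::string kSocVersion = "SOC_VERSION";
const std::string kHostEnvOs = "OPTION_HOST_ENV_OS";
const std::string kHostEnvCpu = "OPTION_HOST_ENV_CPU";

// Hypothetical description of an environment: OS type plus OS architecture.
struct HostEnv {
  std::string os;   // for example "linux"
  std::string cpu;  // for example "x86_64"
};

inline bool SameEnv(const HostEnv& a, const HostEnv& b) {
  return a.os == b.os && a.cpu == b.cpu;
}

// Builds the option map. The host OS and CPU options are always set as a pair,
// and only when the operating environment differs from the build environment.
std::map<std::string, std::string> MakeBasicOptions(const std::string& soc_version,
                                                    const HostEnv& build_env,
                                                    const HostEnv& run_env) {
  std::map<std::string, std::string> options;
  options[kSocVersion] = soc_version;
  if (!SameEnv(build_env, run_env)) {
    options[kHostEnvOs] = run_env.os;
    options[kHostEnvCpu] = run_env.cpu;
  }
  return options;
}
}  // namespace sketch
```

Calling sketch::MakeBasicOptions("<soc_version>", env, env) with identical environments yields a map containing only the SoC version entry, which matches the documented default of falling back to the build environment.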
Memory Management
| Parameter | Description |
|---|---|
|
EXEC_DISABLE_REUSED_MEMORY |
Memory reuse switch. Memory reuse refers to the practice of repeatedly utilizing non-conflicting memory based on its lifecycle and size, thereby reducing network memory consumption. Arguments:
Configuration example: {ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}
Applicability: |
|
EXTERNAL_WEIGHT |
Whether to externalize the weights of the Const/Constant nodes on the original network and convert the node type to FileConstant when the OM model file is generated. In the offline scenario, if the model weights are large and the environment restricts the size of the OM offline model file, you are advised to enable external weights so that the weights are saved separately and the OM file is smaller. Arguments:
Configuration example: {ge::ir_option::EXTERNAL_WEIGHT, "1"}
Restrictions:
Applicability: |
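To make the idea of lifecycle-based memory reuse concrete, the sketch below compares the memory needed when every tensor gets its own buffer with the memory needed when a buffer is handed to a later tensor once its previous owner is no longer live. It uses a simple first-fit policy chosen for illustration. It is not the allocator used during graph build, and the Tensor type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A tensor is live from step first_use through step last_use (inclusive).
struct Tensor {
  int first_use;
  int last_use;
  std::size_t bytes;
};

// Total bytes when every tensor gets its own buffer (no reuse).
std::size_t BytesWithoutReuse(const std::vector<Tensor>& tensors) {
  std::size_t total = 0;
  for (const Tensor& t : tensors) total += t.bytes;
  return total;
}

// Total bytes with first-fit reuse: a tensor takes over an existing block if
// the block's previous owner is no longer live and the block is large enough.
std::size_t BytesWithReuse(std::vector<Tensor> tensors) {
  std::sort(tensors.begin(), tensors.end(),
            [](const Tensor& a, const Tensor& b) { return a.first_use < b.first_use; });
  struct Block {
    std::size_t bytes;
    int busy_until;  // last step at which the current owner is live
  };
  std::vector<Block> blocks;
  for (const Tensor& t : tensors) {
    bool placed = false;
    for (Block& b : blocks) {
      if (b.busy_until < t.first_use && b.bytes >= t.bytes) {
        b.busy_until = t.last_use;  // lifetimes do not conflict, so reuse the block
        placed = true;
        break;
      }
    }
    if (!placed) blocks.push_back({t.bytes, t.last_use});
  }
  std::size_t total = 0;
  for (const Block& b : blocks) total += b.bytes;
  return total;
}
```

For three tensors with lifetimes [0,1], [1,2], and [2,3] and sizes 100, 50, and 80 bytes, the third tensor can take over the first tensor's block, so the reuse policy needs 150 bytes instead of 230.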
Dynamic Shape
| Parameter | Description |
|---|---|
|
AC_PARALLEL_ENABLE |
Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic shape graph. In a dynamic shape graph, when this function is enabled, the system automatically identifies AI CPU operators that can be run in parallel with the AI Core operators in the graph. Operators of different engines are distributed to different streams to run in parallel, improving resource utilization and dynamic shape execution performance. Arguments:
Configuration example: {ge::ir_option::AC_PARALLEL_ENABLE, "1"}
Applicability: |
Operator and Graph Build
Debugging
Precision Tuning
Precision Comparison
Performance Tuning
| Parameter | Description |
|---|---|
|
ENABLE_SMALL_CHANNEL |
Whether to enable small channel tuning to yield performance benefits at convolutional layers with channel size ≤ 4. You are advised to enable this function in inference scenarios. Arguments:
Configuration example: {ge::ir_option::ENABLE_SMALL_CHANNEL, "1"}
Applicability: |
|
OPTYPELIST_FOR_IMPLMODE |
Operator implementation mode in the optype list. Restrictions:
Configuration example: {ge::ir_option::OPTYPELIST_FOR_IMPLMODE, "Pooling,SoftmaxV2"}
Applicability: |
|
TILING_SCHEDULE_OPTIMIZE |
Whether to enable the optimization for tiling offload scheduling. Because the internal storage of the AI Cores in the NPU cannot hold all the input and output data of an operator, the input data is split into parts. Each part in turn is transferred in, computed, and transferred out. This process is called tiling. A computation program, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of cycles) based on operator information such as shape. The AI Core is not well suited to the scalar computation in the tiling implementation, so the tiling implementation is generally executed on the host CPU. However, the tiling implementation is executed on the device when the following conditions are met:
Arguments:
Configuration example: {ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}
Applicability: |
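The tiling parameters mentioned above (block size per transfer and total number of cycles) can be illustrated with a small calculation. This is a sketch of the general idea only, assuming a one-dimensional operator and a buffer capacity expressed in elements. The TilingParams type and ComputeTiling function are hypothetical and do not correspond to any real tiling implementation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical tiling result for a 1-D workload.
struct TilingParams {
  std::int64_t block_elems;  // elements transferred in each full iteration
  std::int64_t loop_count;   // total number of iterations
  std::int64_t tail_elems;   // elements processed in the last iteration
};

// Splits total_elems into blocks that fit into a buffer of buffer_elems.
// Returns all zeros for non-positive inputs.
TilingParams ComputeTiling(std::int64_t total_elems, std::int64_t buffer_elems) {
  TilingParams p{0, 0, 0};
  if (total_elems <= 0 || buffer_elems <= 0) return p;
  p.block_elems = std::min(total_elems, buffer_elems);
  p.loop_count = (total_elems + p.block_elems - 1) / p.block_elems;  // ceiling division
  p.tail_elems = total_elems - (p.loop_count - 1) * p.block_elems;
  return p;
}
```

With 1000 elements and room for 256 elements per transfer, the sketch yields four iterations: three full blocks of 256 and a tail of 232. The computation is pure integer (scalar) arithmetic driven by the shape, which is the kind of work the text above describes as normally running on the host CPU.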
Quantization and Compression
| Parameter | Description |
|---|---|
|
ENABLE_COMPRESS_WEIGHT |
Whether to enable global weight compression. AI Core supports weight compression: when this parameter is enabled, the weight data is compressed and then decompressed during operator computation, which reduces the bandwidth load and improves performance. This parameter is mutually exclusive with COMPRESS_WEIGHT_CONF. Arguments:
Configuration example: {ge::ir_option::ENABLE_COMPRESS_WEIGHT, "true"}
Applicability: |
|
COMPRESS_WEIGHT_CONF |
Path and name of the configuration file of the nodes to be compressed. The nodes mainly include the conv and fc operators. This parameter is mutually exclusive with ENABLE_COMPRESS_WEIGHT. Format: The path including the file name allows only letters, digits, and underscores (_). The file name can contain letters, digits, underscores (_), and periods (.). Restrictions: The weight compression configuration file is generated by AMCT. It is a list of node names separated with semicolons (;). For example, the content of the compress_weight_nodes.cfg file is conv1; fc1; conv2_2/x1; fc2; conv5_32/x2;fc6. Configuration example: {ge::ir_option::COMPRESS_WEIGHT_CONF, "$HOME/module/compress_weight_nodes.cfg"}
Applicability: |
|
SPARSITY |
Whether to enable global sparsity. In a model output by AMCT (Ascend Model Compression Toolkit) after 2:4 structured sparsity, at least two of every four contiguous weight elements in the Cin dimension may be forced to zero. Enabling global sparsity during model conversion filters out two elements from each such group, which reduces the computation required for inference and improves inference performance. Due to hardware restrictions, this parameter cannot be used together with ENABLE_COMPRESS_WEIGHT or COMPRESS_WEIGHT_CONF. Arguments:
Configuration example: {ge::ir_option::SPARSITY, "1"}
Restrictions: When using this parameter, ensure that a sparse model is used. You are advised to use the compression combination function of AMCT (TensorFlow) or AMCT (PyTorch). The compression combination requires 2:4 structured sparsity and quantization aware training. Applicability: |
|
COMPRESSION_OPTIMIZE_CONF |
Path (including the name) of the compression optimization configuration file. This parameter enables the compression optimization functions specified in the configuration file to improve network performance. Example path: /home/test/compression_optimize.cfg. Example file content: enable_first_layer_quantization:true
Applicability: |
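The exclusivity rules in this table (ENABLE_COMPRESS_WEIGHT against COMPRESS_WEIGHT_CONF, and SPARSITY against both) can be checked before the options are passed on. The sketch below is one hedged way to do that. The key strings are placeholders for the ge::ir_option constants, the helper names are hypothetical, and treating "true" and "1" as the enabling values is an assumption taken from the configuration examples above, because the argument lists are not reproduced in this section.

```cpp
#include <map>
#include <string>
#include <vector>

namespace sketch {
// Placeholder key strings (assumption). Real code would use the constants
// ge::ir_option::ENABLE_COMPRESS_WEIGHT, COMPRESS_WEIGHT_CONF and SPARSITY.
const std::string kEnableCompressWeight = "ENABLE_COMPRESS_WEIGHT";
const std::string kCompressWeightConf = "COMPRESS_WEIGHT_CONF";
const std::string kSparsity = "SPARSITY";

using Options = std::map<std::string, std::string>;

// True if key is present and holds exactly enabled_value.
bool IsSetTo(const Options& options, const std::string& key,
             const std::string& enabled_value) {
  Options::const_iterator it = options.find(key);
  return it != options.end() && it->second == enabled_value;
}

// Returns one message per violated rule. An empty result means no conflict.
std::vector<std::string> CheckCompressionOptions(const Options& options) {
  // Assumption: "true" and "1" enable the features, as in the examples above.
  const bool global = IsSetTo(options, kEnableCompressWeight, "true");
  const bool per_node = options.count(kCompressWeightConf) > 0;
  const bool sparsity = IsSetTo(options, kSparsity, "1");

  std::vector<std::string> problems;
  if (global && per_node) {
    problems.push_back(
        "ENABLE_COMPRESS_WEIGHT and COMPRESS_WEIGHT_CONF are mutually exclusive");
  }
  if (sparsity && (global || per_node)) {
    problems.push_back(
        "SPARSITY cannot be used with ENABLE_COMPRESS_WEIGHT or COMPRESS_WEIGHT_CONF");
  }
  return problems;
}
}  // namespace sketch
```

Returning a list of messages rather than a single boolean lets a caller report every conflict at once instead of stopping at the first one.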
Experiment Parameters
| Parameter | Description |
|---|---|
|
ALLOW_HF32 |
This parameter is reserved and is not supported in the current version. Whether to automatically replace the float32 data type with the HF32 data type. In the current version, this option takes effect only for Conv and Matmul operators. HF32 is a single-precision floating-point type developed by Ascend for internal computation of operators. The following shows the comparison with other common data types. HF32 has the same value range as float32, but its mantissa precision (11 bits) is close to that of FP16 (10 bits). Replacing float32 with the lower-precision HF32 can greatly reduce the space occupied by data and improve performance. Arguments:
Default: Enable FP32-to-HF32 conversion for Conv operators; disable FP32-to-HF32 conversion for Matmul operators. Restrictions:
Configuration example: {ge::ir_option::ALLOW_HF32, "true"}
Applicability: |
|
OO_LEVEL |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Multi-level optimization options for graph build include subgraph optimization, entire graph optimization, and static shape model offloading. Static shape model offloading: In this approach, the input and output shapes of all operators in a static shape model can be determined at build time, allowing for model-level memory orchestration and operator tiling computation to be completed on the host. These computations are then batched and sent to the device stream when the model is loaded, but they are not executed immediately. Instead, the execution of all tasks within the model is triggered by the delivery of model execution tasks. Arguments:
Restrictions: If the value is O1, all graph fusion and UB fusion passes are disabled, and only passes related to static offloading are enabled. The exception is the graph fusion passes listed under the ExceptionalPassOfO1Level field in the ${INSTALL_DIR}/<arch>-linux/lib64/plugin/opskernel/fusion_pass/config/fusion_config.json file, which remain enabled by default because disabling them may cause functional problems. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. <arch> indicates the OS architecture. Configuration example: {ge::ir_option::OO_LEVEL, "O3"}
Applicability: |
|
OO_CONSTANT_FOLDING |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Whether to enable constant folding optimization. Constant folding is the process of replacing nodes that can be evaluated to a constant output value in a computational graph with that constant, and simplifying the structure of the computational graph accordingly. Arguments:
Configuration example: {ge::ir_option::OO_CONSTANT_FOLDING, "true"}
Applicability: |
|
OO_DEAD_CODE_ELIMINATION |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Whether to enable dead-edge elimination optimization. Dead-edge elimination (switch dead-edge elimination): When pred (input 1) of a switch statement is a constant node, one of the branches can be eliminated based on the value of const. If const is true, the false branch is eliminated; if const is false, the true branch is eliminated. Arguments:
Configuration example: {ge::ir_option::OO_DEAD_CODE_ELIMINATION, "true"}
Applicability: |
|
TUNE_DEVICE_IDS |
Not supported in the current version. |
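Constant folding, as described for OO_CONSTANT_FOLDING, can be illustrated on a tiny expression graph. The following is a generic sketch of the technique, not the graph engine's pass: the Node type, its factory helpers, and the Fold function are hypothetical, and only addition and multiplication are modeled.

```cpp
#include <memory>
#include <utility>

// Minimal expression node: a constant, a runtime input, or a binary operation.
struct Node {
  enum class Kind { kConst, kInput, kAdd, kMul };
  Kind kind = Kind::kInput;
  double value = 0.0;  // meaningful only when kind == Kind::kConst
  std::unique_ptr<Node> lhs;
  std::unique_ptr<Node> rhs;
};

std::unique_ptr<Node> Const(double v) {
  std::unique_ptr<Node> n(new Node());
  n->kind = Node::Kind::kConst;
  n->value = v;
  return n;
}

std::unique_ptr<Node> Input() { return std::unique_ptr<Node>(new Node()); }

std::unique_ptr<Node> Binary(Node::Kind kind, std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
  std::unique_ptr<Node> n(new Node());
  n->kind = kind;
  n->lhs = std::move(lhs);
  n->rhs = std::move(rhs);
  return n;
}

// Folds bottom-up: an operation whose operands are both constants is replaced
// by a single constant node holding the computed value.
std::unique_ptr<Node> Fold(std::unique_ptr<Node> n) {
  if (n->kind != Node::Kind::kAdd && n->kind != Node::Kind::kMul) return n;
  n->lhs = Fold(std::move(n->lhs));
  n->rhs = Fold(std::move(n->rhs));
  if (n->lhs->kind == Node::Kind::kConst && n->rhs->kind == Node::Kind::kConst) {
    const double l = n->lhs->value;
    const double r = n->rhs->value;
    return Const(n->kind == Node::Kind::kAdd ? l + r : l * r);
  }
  return n;
}

// Counts the nodes in the graph rooted at n.
int CountNodes(const Node& n) {
  int count = 1;
  if (n.lhs) count += CountNodes(*n.lhs);
  if (n.rhs) count += CountNodes(*n.rhs);
  return count;
}
```

Folding (2 + 3) * x replaces the addition with the constant 5 and shrinks the graph from five nodes to three, while the multiplication stays because one operand is only known at run time.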
Parameters That Will Be Deprecated in Later Versions
| Parameter | Description |
|---|---|
|
OP_SELECT_IMPL_MODE |
Operator implementation mode. Certain operators built into the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time. In high-precision mode, Taylor's theorem or Newton's method is used to improve operator precision with float16 input. In high-performance mode, optimal performance is achieved without affecting the network precision (float16). Arguments:
The preceding implementation modes are distinguished based on dtype of the operator. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. Default: high_performance Configuration example: {ge::ir_option::OP_SELECT_IMPL_MODE, "high_performance"}
Applicability: |
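As a rough illustration of how Newton's method can trade extra computation for precision, the sketch below refines an approximate reciprocal. It is a textbook example chosen for illustration, not the implementation of any built-in operator. The function names are hypothetical, and double precision is used for readability rather than float16.

```cpp
#include <cmath>

// One Newton step for f(y) = 1/y - x, whose root is y = 1/x:
//   y_next = y * (2 - x * y)
// Each step roughly doubles the number of correct digits when y is close to 1/x.
double RefineReciprocal(double x, double y) { return y * (2.0 - x * y); }

// Absolute error of y as an approximation of 1/x.
double ReciprocalError(double x, double y) { return std::fabs(1.0 / x - y); }
```

Starting from the estimate 0.3 for 1/3, one step gives 0.33 and a second gives 0.3333, so each extra step costs two multiplications and one subtraction and buys a more precise result. The two implementation modes make the same kind of choice between precision and speed.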


