aclgrphBuildModel Configuration Parameters
Basic Functions
Memory Management
|
Parameter |
Description |
|---|---|
|
EXEC_DISABLE_REUSED_MEMORY |
Memory reuse switch. Memory reuse refers to the practice of repeatedly utilizing non-conflicting memory based on its lifecycle and size, thereby reducing network memory consumption. Arguments:
Configuration example: {ge::ir_option::EXEC_DISABLE_REUSED_MEMORY, "0"}
Applicability: |
|
EXTERNAL_WEIGHT |
Whether to externalize the weights of the Const/Constant nodes on the original network and convert the node type to FileConstant when the OM model file is generated. In the offline scenario, if the model weight is large and the environment has restrictions on the OM offline model file size, you are advised to enable the external weight and save the weight separately to reduce the OM file size. Arguments:
Configuration example: {ge::ir_option::EXTERNAL_WEIGHT, "1"}
Restrictions:
Applicability: |
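The memory reuse idea described for EXEC_DISABLE_REUSED_MEMORY can be illustrated with a small, self-contained sketch. It is not the Graph Engine allocator: the `Tensor` and `Buffer` structs, the function names, and the greedy first-fit policy are all assumptions made for illustration. Tensors whose lifetimes do not overlap share one buffer, so the total footprint drops below the sum of all tensor sizes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one tensor: its size and the first/last
// execution step (inclusive) at which it must stay in memory.
struct Tensor {
    std::size_t bytes;
    int first_step;
    int last_step;
};

// Hypothetical buffer record used by the sketch.
struct Buffer {
    std::size_t bytes;  // capacity: the largest tensor ever placed here
    int busy_until;     // last step at which the current occupant is alive
};

// Greedy first-fit reuse: visit tensors by start step and place each one in
// the first buffer whose occupant is already dead; otherwise open a new buffer.
// Returns the total bytes of all buffers.
std::size_t TotalBytesWithReuse(std::vector<Tensor> tensors) {
    std::sort(tensors.begin(), tensors.end(),
              [](const Tensor &a, const Tensor &b) { return a.first_step < b.first_step; });
    std::vector<Buffer> buffers;
    for (const Tensor &t : tensors) {
        bool placed = false;
        for (Buffer &b : buffers) {
            if (b.busy_until < t.first_step) {  // lifetimes do not conflict
                b.bytes = std::max(b.bytes, t.bytes);
                b.busy_until = t.last_step;
                placed = true;
                break;
            }
        }
        if (!placed) {
            buffers.push_back(Buffer{t.bytes, t.last_step});
        }
    }
    std::size_t total = 0;
    for (const Buffer &b : buffers) total += b.bytes;
    return total;
}

// Baseline with reuse disabled: every tensor gets its own memory.
std::size_t TotalBytesWithoutReuse(const std::vector<Tensor> &tensors) {
    std::size_t total = 0;
    for (const Tensor &t : tensors) total += t.bytes;
    return total;
}
```

For four tensors of 100, 200, 50, and 80 bytes that are alive during steps 0–1, 1–2, 2–3, and 3–4 respectively, the sketch needs two buffers (300 bytes in total) instead of 430 bytes.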
Dynamic Shape
Operator and Graph Build
Debugging
Precision Tuning
Precision Comparison
Performance Tuning
|
Parameter |
Description |
|---|---|
|
OP_PRECISION_MODE |
Precision mode of one or more specified operators during internal processing. This parameter is used to transfer the customized precision mode configuration file op_precision.ini to set different precision modes for different operators. The following precision modes can be set in the configuration file:
You can view the precision or performance modes supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file under the CANN software installation path. Sample: In each row of the INI file, set the precision mode by operator type (low priority) or by node name (high priority).
[ByOpType]
optype1=high_precision
optype2=high_performance
optype3=enable_hi_float_32_execution
optype4=support_out_of_bound_index
[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename3=enable_hi_float_32_execution
nodename4=support_out_of_bound_index
Restrictions:
Applicability: |
|
TILING_SCHEDULE_OPTIMIZE |
Whether to enable the optimization for tiling offload scheduling. Because the internal storage of the AI Cores in the NPU cannot hold all the input and output data of an operator, the input data is split into parts. The first part is transferred in, computed, and transferred out, and the same is then done for each following part. This process is called tiling. A computation program, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of cycles) based on operator information such as shape. The AI Core is not good at the scalar computation that the tiling implementation requires. Therefore, the tiling implementation is generally executed on the host CPU. However, it is executed on the device when the following conditions are met:
Arguments:
Configuration example: {ge::ir_option::TILING_SCHEDULE_OPTIMIZE, "1"}
Applicability: |
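The tiling parameters mentioned for TILING_SCHEDULE_OPTIMIZE (block size per transfer and total number of cycles) can be sketched for a simple one-dimensional case. This is an illustrative assumption, not the tiling implementation of any real operator: the `TilingParams` struct, the `ComputeTiling` function, and the rule of splitting local storage evenly between the input and output tiles are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tiling parameters for a 1-D element-wise operator.
struct TilingParams {
    std::size_t block_elems;  // elements transferred in each full cycle
    std::size_t loop_count;   // total number of transfer-compute-transfer cycles
    std::size_t tail_elems;   // elements handled by the last cycle
};

// Derives tiling parameters from the operator shape (total_elems), the element
// size, and the amount of local storage available. Assumes half of the local
// storage holds the input tile and the other half holds the output tile.
TilingParams ComputeTiling(std::size_t total_elems, std::size_t elem_bytes,
                           std::size_t local_bytes) {
    assert(total_elems > 0 && elem_bytes > 0);
    std::size_t block = (local_bytes / 2) / elem_bytes;
    assert(block > 0);  // local storage must hold at least one element per tile
    if (block > total_elems) {
        block = total_elems;  // the whole tensor fits in a single cycle
    }
    std::size_t loops = (total_elems + block - 1) / block;  // ceiling division
    std::size_t tail = total_elems - (loops - 1) * block;
    return TilingParams{block, loops, tail};
}
```

With 1000 four-byte elements and 2048 bytes of local storage, the sketch yields a block of 256 elements, 4 cycles, and a 232-element tail. The computation is scalar integer arithmetic only, which is the kind of work the text notes is usually better suited to a CPU than to the AI Core.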
AOE
|
Parameter |
Description |
|---|---|
|
MDL_BANK_PATH |
Path of the custom repository generated after subgraph tuning. This parameter must be used together with BUFFER_OPTIMIZE in aclgrphBuildInitialize Configuration Parameters and takes effect only when buffer optimization is enabled to improve performance by temporarily storing data in the buffer at a high speed. Argument: path of the custom repository generated after model tuning. Format: The path can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.). Default: $HOME/Ascend/latest/data/aoe/custom/graph/<soc_version> Configuration example: {ge::ir_option::MDL_BANK_PATH, "$HOME/custom_module_path"}
Restrictions: Path (path of the custom repository generated after subgraph tuning) priority ranked from high to low: path specified by MDL_BANK_PATH > path specified by the TUNE_BANK_PATH environment variable > default path.
Applicability: |
|
OP_BANK_PATH |
Path of the custom repository generated after operator tuning. Format: The path can contain letters (a–z, A–Z), digits (0–9), underscores (_), hyphens (-), and periods (.). Default: ${HOME}/Ascend/latest/data/aoe/custom/op Configuration example: {ge::ir_option::OP_BANK_PATH, "$HOME/custom_tune_path"}
Restrictions: Path (path of the custom repository generated after operator tuning) priority ranked from high to low: path specified by the TUNE_BANK_PATH environment variable > path specified by OP_BANK_PATH > default path of the custom repository generated after operator tuning.
Applicability: |
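The two priority orders stated in the Restrictions for MDL_BANK_PATH and OP_BANK_PATH differ, which is easy to overlook. The sketch below restates them as code. The `BankPathSources` struct and both function names are hypothetical and exist only for illustration; an empty string stands for "not set".

```cpp
#include <string>

// Hypothetical holder for the three possible sources of a repository path.
struct BankPathSources {
    std::string option_value;  // value passed through MDL_BANK_PATH or OP_BANK_PATH
    std::string env_value;     // value of the TUNE_BANK_PATH environment variable
    std::string default_path;  // documented default location
};

// Subgraph tuning repository:
// MDL_BANK_PATH > TUNE_BANK_PATH environment variable > default path.
std::string ResolveModelBankPath(const BankPathSources &s) {
    if (!s.option_value.empty()) return s.option_value;
    if (!s.env_value.empty()) return s.env_value;
    return s.default_path;
}

// Operator tuning repository:
// TUNE_BANK_PATH environment variable > OP_BANK_PATH > default path.
std::string ResolveOpBankPath(const BankPathSources &s) {
    if (!s.env_value.empty()) return s.env_value;
    if (!s.option_value.empty()) return s.option_value;
    return s.default_path;
}
```

When both the option and the environment variable are set, the subgraph repository follows the option while the operator repository follows the environment variable.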
Experiment Parameters
|
Parameter |
Description |
|---|---|
|
ALLOW_HF32 |
This parameter is reserved and is not supported in the current version. Whether to enable the function of automatically replacing the float32 data type with the HF32 data type. In the current version, this option takes effect only for Conv and Matmul operators. HF32 is a single-precision floating-point type developed by Ascend for internal computation of operators. The following shows the comparison with other common data types. HF32 shares the value range with float32, but its mantissa precision (11 bits) is close to FP16 (10 bits). Replacing the original float32 single-precision data type with the HF32 single-precision data type by precision reduction can greatly reduce the space occupied by data and improve performance. Arguments:
Default: Enable FP32-to-HF32 conversion for Conv operators; disable FP32-to-HF32 conversion for Matmul operators. Restrictions:
Configuration example: {ge::ir_option::ALLOW_HF32, "true"}
Applicability: |
|
BUILD_INNER_MODEL |
Not supported in the current version. |
|
OO_LEVEL |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Multi-level optimization options for graph build include subgraph optimization, entire graph optimization, and static shape model offloading. Static shape model offloading: In this approach, the input and output shapes of all operators in a static shape model can be determined at build time, allowing for model-level memory orchestration and operator tiling computation to be completed on the host. These computations are then batched and sent to the device stream when the model is loaded, but they are not executed immediately. Instead, the execution of all tasks within the model is triggered by the delivery of model execution tasks. Arguments:
Restrictions: If the value is O1, all graph fusion and UB fusion passes are disabled, and only passes related to static offloading are enabled. However, the following graph fusion passes remain enabled by default because disabling them may cause functional problems: all graph fusion passes under the ExceptionalPassOfO1Level field in the ${INSTALL_DIR}/<arch>-linux/lib64/plugin/opskernel/fusion_pass/config/fusion_config.json file. Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann. <arch> indicates the OS architecture. Configuration example: {ge::ir_option::OO_LEVEL, "O3"}
Applicability: |
|
OO_CONSTANT_FOLDING |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Whether to enable constant folding optimization. Constant folding is the process of replacing nodes that can be evaluated to a constant output value in a computational graph with that constant, and simplifying the structure of the computational graph accordingly. Arguments:
Configuration example: {ge::ir_option::OO_CONSTANT_FOLDING, "true"}
Applicability: |
|
OO_DEAD_CODE_ELIMINATION |
Extended option for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Whether to enable dead-edge elimination optimization. Dead-edge elimination (switch dead-edge elimination): When pred (input 1) of a switch statement is a constant node, one of the branches can be eliminated based on the value of const. If const is true, the false branch is eliminated; if const is false, the true branch is eliminated. Arguments:
Configuration example: {ge::ir_option::OO_DEAD_CODE_ELIMINATION, "true"}
Applicability: |
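The two optimizations described for OO_CONSTANT_FOLDING and OO_DEAD_CODE_ELIMINATION can be shown on a toy graph. This is a minimal sketch under stated assumptions: the `Node` type, the `Fold` and `LiveBranches` functions, and the integer-only arithmetic are hypothetical and unrelated to the actual Graph Engine passes.

```cpp
#include <memory>
#include <utility>

// Hypothetical node kinds of a toy computational graph.
enum class Kind { Const, Input, Add, Mul };

struct Node {
    Kind kind;
    int value;  // meaningful only for Kind::Const
    std::shared_ptr<Node> lhs;
    std::shared_ptr<Node> rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr MakeConst(int v) {
    return std::make_shared<Node>(Node{Kind::Const, v, nullptr, nullptr});
}
NodePtr MakeInput() {
    return std::make_shared<Node>(Node{Kind::Input, 0, nullptr, nullptr});
}
NodePtr MakeOp(Kind k, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{k, 0, std::move(l), std::move(r)});
}

// Constant folding: an Add/Mul node whose operands both fold to constants
// is replaced by a single Const node holding the computed value.
NodePtr Fold(const NodePtr &n) {
    if (n->kind == Kind::Const || n->kind == Kind::Input) return n;
    NodePtr l = Fold(n->lhs);
    NodePtr r = Fold(n->rhs);
    if (l->kind == Kind::Const && r->kind == Kind::Const) {
        int v = (n->kind == Kind::Add) ? l->value + r->value : l->value * r->value;
        return MakeConst(v);
    }
    return MakeOp(n->kind, l, r);
}

// Dead-edge elimination for a switch: const_pred is nullptr when the predicate
// is not a constant node; otherwise it points to the constant value.
enum class Live { TrueOnly, FalseOnly, Both };

Live LiveBranches(const bool *const_pred) {
    if (const_pred == nullptr) return Live::Both;  // nothing can be eliminated
    return *const_pred ? Live::TrueOnly    // false branch is eliminated
                       : Live::FalseOnly;  // true branch is eliminated
}
```

For a graph `x + (2 * 3)`, `Fold` leaves the addition in place because `x` is a runtime input, but replaces the multiplication with the constant 6.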
Parameters That Will Be Deprecated in Later Versions
|
Parameter |
Description |
|---|---|
|
INPUT_SHAPE_RANGE |
This parameter is deprecated. Avoid using it. To specify the shape range of the input data of a model, use INPUT_SHAPE instead. INPUT_SHAPE_RANGE sets the shape range of the input data of a model. It is mutually exclusive with DYNAMIC_BATCH_SIZE, DYNAMIC_IMAGE_SIZE, and DYNAMIC_DIMS.
Applicability: |
|
SHAPE_GENERALIZED_BUILD_MODE |
Shape build mode during graph build. This parameter will be deprecated in later versions. Do not use this parameter for new functions.
Applicability: |