--op_debug_config
Description
Specifies the path (including the file name) of the configuration file used to enable global memory (DDR) detection.
See Also
None
Argument
Argument: Path of the configuration file, including the file name.
Format: The path (including the file name) can contain letters, digits, underscores (_), hyphens (-), periods (.), and Chinese characters.
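The character rule above can be checked before the option is passed to ATC. The following is a minimal sketch, not part of ATC: the helper name is_valid_config_path is hypothetical, "/" is assumed to be accepted as a path separator, and "Chinese characters" is approximated by the CJK Unified Ideographs block.

```python
import re

# Assumption: "/" is accepted as a path separator, and "Chinese characters"
# is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF).
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_\-./\u4e00-\u9fff]+")


def is_valid_config_path(path: str) -> bool:
    """Return True if every character of path is in the documented set."""
    return _ALLOWED_PATH.fullmatch(path) is not None
```

Shell variables such as $HOME are expanded before ATC receives the argument, so the check is meant for the expanded path.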
Restrictions:
The configuration file supports the following options. Separate multiple options with commas (,).
- oom: Checks whether memory overwriting occurs in the global memory during operator execution.
- Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
- If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
    inline __aicore__ void CheckInvalidAccessOfDDR(xxx) {
        if (access_offset < 0 || access_offset + access_extent > ddr_size) {
            if (read_or_write == 1) {
                trap(0X5A5A0001);
            } else {
                trap(0X5A5A0002);
            }
        }
    }
During actual execution, if memory overwriting occurs, the error code EZ9999 is reported.
- dump_bin: Retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
- dump_cce: Retains the operator CCE file (.cce), binary operator file (.o), and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
- dump_loc: Retains the Python-CCE mapping file (*_loc.json) in the kernel_meta folder under the current execution directory during operator build.
- ccec_O0: Enables the CCEC option -O0 during operator build. With this option, the debugging information is not optimized, which facilitates later analysis of AI Core errors.
- ccec_g: Enables the CCEC option -g during operator build. With this option, the debugging information is optimized, which facilitates later analysis of AI Core errors.
- check_flag: Checks whether pipeline synchronization signals in operators match each other during operator execution.
- Configuring this option retains the binary operator file (.o) and operator description file (.json) in the kernel_meta folder under the current execution directory during operator build.
- If this option is used, the following detection logic is added during operator build. You can use the dump_cce option to view the following code in the generated .cce file:
    set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
    set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
    set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
    set_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
    ....
    pipe_barrier(PIPE_MTE3);
    pipe_barrier(PIPE_MTE2);
    pipe_barrier(PIPE_M);
    pipe_barrier(PIPE_V);
    pipe_barrier(PIPE_MTE1);
    pipe_barrier(PIPE_ALL);
    wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID0);
    wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID1);
    wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID2);
    wait_flag(PIPE_MTE3, PIPE_MTE2, EVENT_ID3);
    ...
During actual inference, if the pipeline synchronization signals in operators do not match each other, a timeout error is reported at the faulty operator, and the program is terminated. The following is an example of the error message:
Aicore kernel execute failed, ..., fault kernel_name=operator name,... rtStreamSynchronizeWithTimeout execute failed....
- When ccec_O0 and ccec_g are enabled, the size of the operator kernel file (*.o file) increases. In the dynamic shape scenario, all possible shape scenarios are traversed during operator build, so the larger kernel files may cause operator build failures. In this case, do not enable these CCEC options.
If a build failure is caused by the large operator kernel file, the following log is displayed:
message:link error ld.lld: error: InputSection too large for range extension thunk ./kernel_meta_xxxxx.o:
- The ccec_O0 and oom options cannot both be enabled. Otherwise, an AI Core error may be reported. The following is an example of the error message:
...there is an aivec error exception, core id is 49, error code = 0x4 ...
- If the NPU_COLLECT_PATH environment variable is configured, global memory overwriting detection (the oom option of --op_debug_config) cannot be enabled. Otherwise, an error is reported when the compiled model file or operator kernel package is used.
- When the build options oom, dump_bin, dump_cce, and dump_loc are configured, if the model contains the following MC2 operators, the *.o, *.json, and *.cce files of the operators are not generated in the kernel_meta directory.
MatMulAllReduce
MatMulAllReduceAddRmsNorm
AllGatherMatMul
MatMulReduceScatter
AlltoAllAllGatherBatchMatMul
BatchMatMulReduceScatterAlltoAll
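The option names and the two combination restrictions above can be checked before a build is started. The following is a minimal sketch, not part of ATC: the function name check_op_debug_options and the wording of its messages are hypothetical, and only the rules stated in this section are encoded.

```python
import os

# Option names as listed in this section.
KNOWN_OPTIONS = {
    "oom", "dump_bin", "dump_cce", "dump_loc",
    "ccec_O0", "ccec_g", "check_flag",
}


def check_op_debug_options(value, env=None):
    """Return a list of problems found in a comma-separated option string."""
    env = os.environ if env is None else env
    options = [opt.strip() for opt in value.split(",") if opt.strip()]
    problems = []
    for opt in options:
        if opt not in KNOWN_OPTIONS:
            problems.append(f"unknown option: {opt}")
    # Restriction: ccec_O0 and oom cannot both be enabled.
    if "ccec_O0" in options and "oom" in options:
        problems.append("ccec_O0 and oom cannot both be enabled")
    # Restriction: oom cannot be enabled while NPU_COLLECT_PATH is configured.
    if "oom" in options and env.get("NPU_COLLECT_PATH"):
        problems.append("oom cannot be used while NPU_COLLECT_PATH is set")
    return problems
```

An empty list means that none of the restrictions encoded here is violated. It does not guarantee that the build succeeds.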
Suggestions and Benefits
None
Example
Assume that the configuration file for enabling global memory detection is gm_debug.cfg, with the following content:
op_debug_config=ccec_g,oom
Upload the file to any directory (for example, $HOME/module) on the server where ATC is located, and specify its path with the --op_debug_config option:
--op_debug_config=$HOME/module/gm_debug.cfg
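If the configuration file is generated by a script, the two steps above can be combined. The following is a minimal sketch under stated assumptions: the helper name write_debug_config is hypothetical, the file is assumed to consist of a single op_debug_config line as in the example, and the returned string is only the option shown above, not a complete ATC command.

```python
from pathlib import Path


def write_debug_config(directory, options, filename="gm_debug.cfg"):
    """Write op_debug_config=<options> to a file and return the ATC argument."""
    cfg_path = Path(directory) / filename
    # Options are joined with commas, as required by the configuration file format.
    cfg_path.write_text("op_debug_config=" + ",".join(options) + "\n", encoding="utf-8")
    return f"--op_debug_config={cfg_path}"
```

For example, write_debug_config(Path.home() / "module", ["ccec_g", "oom"]) writes the file from the example into an existing module directory under the home directory and returns the matching option string.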
Restrictions
During operator compilation, to compile only some of the AI Core operators rather than all of them, add the op_debug_list field to the gm_debug.cfg configuration file. Only the operators specified in the list are then compiled based on the options configured in op_debug_config. The op_debug_list field has the following requirements:
- The operator name or operator type can be specified.
- Operators are separated by commas (,). The operator type is configured in OpType::typeName format. The operator type and operator name can be configured in a mixed manner.
- The operators to be compiled must be listed in the configuration file specified by op_debug_config. The operator type must be an operator type defined in Ascend IR. For details about how to view the operator type, see "How Do I Determine the Mapping Between Operators in the Original Network Model and Operators Supported by Ascend AI Processors?"
A configuration example is provided as follows:
Add the following information to the configuration file (for example, gm_debug.cfg) specified by op_debug_config:
op_debug_config=ccec_g,oom
op_debug_list=GatherV2,opType::ReduceSum
Upload the file to any directory (for example, $HOME/module) on the server where ATC is located, and specify its path with the --op_debug_config option:
--op_debug_config=$HOME/module/gm_debug.cfg
During actual model conversion, the GatherV2 and ReduceSum operators are compiled based on the ccec_g and oom options.
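The way the two fields combine can be illustrated by reading such a file back. The following is a minimal sketch, not the parser used by ATC: the function name parse_debug_config is hypothetical, one key=value pair per line is assumed, and any entry containing "::" is treated as an operator type regardless of how the prefix is capitalized.

```python
def parse_debug_config(text):
    """Parse key=value lines; split op_debug_list into names and types."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]

    op_names, op_types = [], []
    for entry in settings.get("op_debug_list", []):
        _prefix, sep, type_name = entry.partition("::")
        if sep:
            # Assumption: an entry of the form <prefix>::typeName names an operator type.
            op_types.append(type_name)
        else:
            op_names.append(entry)
    return settings.get("op_debug_config", []), op_names, op_types
```

For the configuration example above, the sketch returns the options ccec_g and oom, the operator name GatherV2, and the operator type ReduceSum.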