DumpAccChkPoint
Function Usage
Dumps the content of a specified tensor in operators developed with operator projects. It also prints user-defined additional information of type uint32_t, such as the current line number. Unlike DumpTensor, this API can dump a tensor starting from a specified element offset.
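For example, the following call dumps dataLen elements of srcLocal, starting 32 elements from its start, and prints 5 as the user-defined information:
```cpp
AscendC::DumpAccChkPoint(srcLocal, 5, 32, dataLen);
```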
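The following host-side C++ sketch shows which elements a call selects. It assumes the dump covers elements countOff through countOff + dumpSize - 1 of the tensor. SelectDumpRange is a hypothetical helper for illustration only. It is not part of the Ascend C API and does not run on the NPU.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the element range selected by
// DumpAccChkPoint(tensor, index, countOff, dumpSize). Not an Ascend C API.
template <typename T>
std::vector<T> SelectDumpRange(const std::vector<T>& tensor,
                               uint32_t countOff, uint32_t dumpSize) {
    // Assumption: the dump starts countOff elements into the tensor
    // and covers dumpSize consecutive elements.
    if (static_cast<uint64_t>(countOff) + dumpSize > tensor.size()) {
        throw std::out_of_range("dump range exceeds tensor length");
    }
    return std::vector<T>(tensor.begin() + countOff,
                          tensor.begin() + countOff + dumpSize);
}
```

Under this model, a call with countOff = 32 never prints the first 32 elements of the tensor. This is the difference from DumpTensor, which always starts at the beginning of the tensor.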
To disable the dump function, configure the project as follows:
- Custom operator project
Modify the CMakeLists.txt file in the op_kernel directory of the operator project. Add the compilation option -DASCENDC_DUMP=0 to the first line to disable ASCENDC_DUMP. The following is an example.
```cmake
# Disable the dump and printf functions of all operators.
add_ops_compile_options(ALL OPTIONS -DASCENDC_DUMP=0)
```
- Kernel launch project
Modify the npu_lib.cmake file in the cmake directory. Add the -DASCENDC_DUMP=0 macro definition to the ascendc_compile_definitions command to disable the ASCENDC_DUMP function. The following is an example.
```cmake
# Disable the dump and printf functions of all operators.
ascendc_compile_definitions(ascendc_kernels_${RUN_MODE} PRIVATE
    -DASCENDC_DUMP=0
)
```
During a dump, a 32-byte information header, DumpHead, is written before the dump information of each core. It records the core ID and resource usage. A 32-byte header, DumpTensorHead, is also written before the data of each dumped tensor to record tensor information. The figure below illustrates this layout in the multi-core printing scenario.
[Figure: structure of the dump information in the multi-core printing scenario]
DumpHead contains the following fields:
- block_id: ID of the running core.
- total_block_num: number of cores to be dumped.
- block_remain_len: remaining dump space on the current core, in bytes.
- block_initial_space: initial dump space allocated on the current core, in bytes.
- magic: magic number used for memory verification.
DumpTensorHead contains the following fields:
- desc: user-defined additional information (the index argument).
- addr: tensor address.
- data_type: tensor data type.
- position: physical storage position of the tensor. It can only be Unified Buffer, L1 Buffer, L0C Buffer, or Global Memory.
The following is an example of the printed output:
```
DumpHead: block_id=0, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
[40, 82, 60, 11, 24, 55, 52, 60, 31, 86, 53, 61, 47, 54, 34, 62, 84, 29, 48, 95, 16, 0, 20, 77, 3, 55, 69, 73, 75, 40, 35, 13]
DumpHead: block_id=1, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
[58, 84, 22, 54, 41, 93, 1, 45, 50, 9, 72, 81, 23, 96, 86, 45, 36, 9, 36, 34, 78, 7, 2, 29, 47, 26, 13, 24, 27, 55, 90, 5]
...
DumpHead: block_id=7, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
[28, 27, 79, 39, 86, 5, 23, 97, 89, 5, 65, 69, 59, 13, 49, 2, 34, 6, 52, 38, 4, 90, 11, 11, 61, 50, 71, 98, 19, 54, 54, 99]
```
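The header sizes described above explain the block_remain_len value in this output. The sketch below reproduces the arithmetic under the following assumptions:
- Each core performs exactly one dump of 32 DT_FLOAT16 elements, at 2 bytes each.
- The only overhead is one 32-byte DumpHead per core plus one 32-byte DumpTensorHead per dump.

RemainingDumpSpace is a hypothetical helper, not an Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kDumpHeadSize = 32;        // DumpHead, written once per core
constexpr uint32_t kDumpTensorHeadSize = 32;  // DumpTensorHead, written once per dump

// Hypothetical helper: dump space left on a core after `dumps` dumps of
// `elems` elements of `elemSize` bytes each (assumes no other dump users).
constexpr uint32_t RemainingDumpSpace(uint32_t initial, uint32_t dumps,
                                      uint32_t elems, uint32_t elemSize) {
    return initial - kDumpHeadSize - dumps * (kDumpTensorHeadSize + elems * elemSize);
}

// 1 MB initial space, one dump of 32 float16 elements:
// 1048576 - 32 - (32 + 32 * 2) = 1048448, matching block_remain_len above.
static_assert(RemainingDumpSpace(1048576, 1, 32, 2) == 1048448, "matches sample output");
```

In this model, each additional dump of the same size consumes another 96 bytes: a 32-byte header plus 64 bytes of data.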
Prototype
```cpp
void DumpAccChkPoint(const GlobalTensor<T>& tensor, uint32_t index, uint32_t countOff, uint32_t dumpSize)
void DumpAccChkPoint(const LocalTensor<T>& tensor, uint32_t index, uint32_t countOff, uint32_t dumpSize)
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| tensor | Input | Tensor to be dumped. If the tensor is stored in Unified Buffer, L1 Buffer, or L0C Buffer, pass it as a LocalTensor. If it is stored in Global Memory, pass it as a GlobalTensor. |
| index | Input | User-defined additional information, such as a line number or another user-defined number. It is printed as the desc field of DumpTensorHead. |
| countOff | Input | Offset, in elements, from the start of the tensor at which the dump begins. |
| dumpSize | Input | Number of elements to be dumped. |
Returns
None
Availability
Constraints
- This function is used only for NPU on-board debugging and is supported only in the following scenarios:
- Currently, only tensors stored in Unified Buffer, L1 Buffer, L0C Buffer, or Global Memory can be dumped.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
- The total size of the elements to be dumped (dumpSize * sizeof(T)) must be 32-byte aligned.
- The offset must be 32-byte aligned. That is, countOff * sizeof(T) must be a multiple of 32 bytes.
- On each core, the combined space used by printf, assert, DumpTensor, DumpAccChkPoint, and the framework dump function cannot exceed 1 MB. Developers must control the amount of data printed. If the limit is exceeded, no content is printed.
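The following host-side C++ sketch checks the alignment and space rules above before a call is added to a kernel. It makes three assumptions:
- The 1 MB budget is 1,048,576 bytes, which is the block_initial_space value shown in the sample output.
- Each dump costs a 32-byte DumpTensorHead plus the data itself.
- The caller supplies an estimate of the space already used on the core.

CheckDumpArgs is a hypothetical helper, not part of Ascend C. It covers only the rules listed here, not the general address-offset restrictions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pre-flight check for DumpAccChkPoint arguments.
// elemSize is sizeof(T) for the tensor's element type.
// usedBytes is an estimate of the space already consumed on this core
// by other printf/assert/DumpTensor/DumpAccChkPoint calls.
bool CheckDumpArgs(uint32_t countOff, uint32_t dumpSize, uint32_t elemSize,
                   uint64_t usedBytes) {
    constexpr uint64_t kAlign = 32;
    constexpr uint64_t kCoreBudget = 1048576;  // assumed 1 MB per core
    constexpr uint64_t kTensorHeadSize = 32;   // DumpTensorHead

    const uint64_t offsetBytes = static_cast<uint64_t>(countOff) * elemSize;
    const uint64_t dataBytes = static_cast<uint64_t>(dumpSize) * elemSize;

    if (offsetBytes % kAlign != 0) return false;  // offset must be 32-byte aligned
    if (dataBytes % kAlign != 0) return false;    // dumped length must be 32-byte aligned
    // The running total must stay within the per-core budget.
    return usedBytes + kTensorHeadSize + dataBytes <= kCoreBudget;
}
```

Suppose srcLocal in the example call holds half (2-byte) elements. Then countOff = 32 gives a 64-byte offset and dumpSize = 128 gives 256 bytes of data, so both alignment rules hold.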
Example
```cpp
// Dump 128 elements of srcLocal, starting 32 elements from its start; 7 is printed as desc.
AscendC::DumpAccChkPoint(srcLocal, 7, 32, 128);
```