DumpTensor
Function Usage
Dumps the contents of a specified tensor in an operator developed in an operator project. User-defined additional information, limited to the uint32_t data type (for example, the current line number), can be printed with the dump.
    AscendC::DumpTensor(srcLocal, 5, dataLen);
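To disable the dump function, define ASCENDC_DUMP=0 at compile time. The procedure depends on the project type:
- Custom operator project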
Modify the CMakeLists.txt file in the op_kernel directory of the operator project. Add the compilation option -DASCENDC_DUMP=0 to the first line to disable ASCENDC_DUMP. The following is an example.
    # Disable the printf printing function of all operators.
    add_ops_compile_options(ALL OPTIONS -DASCENDC_DUMP=0)
- Kernel launch project
Modify the npu_lib.cmake file in the cmake directory. Add the -DASCENDC_DUMP=0 macro definition to the ascendc_compile_definitions command to disable the ASCENDC_DUMP function. The following is an example.
    # Disable the printf printing function of all operators.
    ascendc_compile_definitions(ascendc_kernels_${RUN_MODE} PRIVATE
        -DASCENDC_DUMP=0
    )
During a dump, a 32-byte information header, DumpHead, is added before the dump information of each core to record the core ID and resource usage. A second 32-byte information header, DumpTensorHead, is added before the tensor data each time a tensor is dumped, to record tensor information. The information structure in the multi-core printing scenario is illustrated in the figure below.

The specific DumpHead information is as follows:
- block_id: ID of the running core.
- total_block_num: number of cores to be dumped.
- block_remain_len: available dump space in the current core.
- block_initial_space: initial dump space allocated in the current core.
- magic: magic number for memory verification.
The specific DumpTensorHead information is as follows:
- desc: user-defined additional information.
- addr: tensor address.
- data_type: tensor data type.
- position: physical storage position of the tensor, which can only be Unified Buffer/L1 Buffer/L0C Buffer/Global Memory.
The values of CANN_VERSION_STR and CANN_TIMESTAMP are automatically printed at the beginning of the DumpTensor result. Both are macro definitions: CANN_VERSION_STR is the version number of the CANN package as a string, and CANN_TIMESTAMP is the release timestamp of the CANN package as a uint64_t value. You can use the two macros directly in your code.
The following is an example:
    CANN Version: XXX.XX, TimeStamp: 20240807XXXXXXXXX
    DumpHead: block_id=0, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
    DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
    [40, 82, 60, 11, 24, 55, 52, 60, 31, 86, 53, 61, 47, 54, 34, 62, 84, 29, 48, 95, 16, 0, 20, 77, 3, 55, 69, 73, 75, 40, 35, 13]
    CANN Version: XXX.XX, TimeStamp: 20240807XXXXXXXXX
    DumpHead: block_id=1, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
    DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
    [58, 84, 22, 54, 41, 93, 1, 45, 50, 9, 72, 81, 23, 96, 86, 45, 36, 9, 36, 34, 78, 7, 2, 29, 47, 26, 13, 24, 27, 55, 90, 5]
    ...
    CANN Version: XXX.XX, TimeStamp: 20240807XXXXXXXXX
    DumpHead: block_id=7, total_block_num=16, block_remain_len=1048448, block_initial_space=1048576, magic=5aa5bccd
    DumpTensor: desc=5, addr=0, data_type=DT_FLOAT16, position=UB
    [28, 27, 79, 39, 86, 5, 23, 97, 89, 5, 65, 69, 59, 13, 49, 2, 34, 6, 52, 38, 4, 90, 11, 11, 61, 50, 71, 98, 19, 54, 54, 99]
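When post-processing dump logs on the host, it can help to read a DumpHead line back into its fields. The following is a minimal sketch, assuming each DumpHead line is printed exactly in the format of the sample output. DumpHeadInfo, ParseDumpHeadLine, and BytesUsed are hypothetical names chosen for illustration and are not part of the Ascend C API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Fields shown on a printed "DumpHead:" line (hypothetical host-side type).
struct DumpHeadInfo {
    uint32_t blockId;
    uint32_t totalBlockNum;
    uint32_t blockRemainLen;
    uint32_t blockInitialSpace;
    uint32_t magic;
};

// Parses one printed DumpHead line. Returns false if the line does not
// match the assumed format.
inline bool ParseDumpHeadLine(const std::string& line, DumpHeadInfo& out) {
    unsigned id = 0, total = 0, remain = 0, initial = 0, magic = 0;
    const int matched = std::sscanf(
        line.c_str(),
        "DumpHead: block_id=%u, total_block_num=%u, block_remain_len=%u, "
        "block_initial_space=%u, magic=%x",
        &id, &total, &remain, &initial, &magic);
    if (matched != 5) {
        return false;
    }
    out.blockId = id;
    out.totalBlockNum = total;
    out.blockRemainLen = remain;
    out.blockInitialSpace = initial;
    out.magic = magic;
    return true;
}

// Dump space already consumed on this core, in bytes.
inline uint32_t BytesUsed(const DumpHeadInfo& head) {
    return head.blockInitialSpace - head.blockRemainLen;
}
```

For the sample lines above, BytesUsed evaluates to 1048576 - 1048448 = 128 bytes per core.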
Prototype
- Printing without tensor shape
    void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
    void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
- Printing with tensor shape
    void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpNum, const ShapeInfo& shapeInfo)
    void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpNum, const ShapeInfo& shapeInfo)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| tensor | Input | Tensor to be dumped. |
| desc | Input | User-defined additional information (a line number or another user-defined number). |
| dumpSize (dumpNum in the prototypes with tensor shape) | Input | Number of elements to be dumped. The total length of the dumped elements must be 32-byte aligned. |
| shapeInfo | Input | Shape information of the tensor. The dumped elements are arranged according to this shape when printed. |
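The 32-byte alignment requirement on the dumped length can be checked on the host before choosing an element count. The following is a minimal sketch: IsDumpSizeAligned and AlignUpDumpCount are hypothetical helper names, not Ascend C APIs, and AlignUpDumpCount assumes an element size that divides 32 evenly (1, 2, 4, or 8 bytes).

```cpp
#include <cstdint>

// Alignment required for the total dumped length, in bytes.
constexpr uint32_t kDumpAlignBytes = 32;

// True if `count` elements of `elemSize` bytes each occupy a multiple of 32 bytes.
constexpr bool IsDumpSizeAligned(uint32_t count, uint32_t elemSize) {
    return (static_cast<uint64_t>(count) * elemSize) % kDumpAlignBytes == 0;
}

// Rounds `count` up to the next element count whose byte length is 32-byte aligned.
// Assumption: elemSize is 1, 2, 4, or 8, so it divides 32 evenly.
constexpr uint32_t AlignUpDumpCount(uint32_t count, uint32_t elemSize) {
    const uint32_t step = kDumpAlignBytes / elemSize;  // elements per 32 bytes
    return (count + step - 1) / step * step;
}
```

For example, 32 half-precision elements occupy 64 bytes and satisfy the requirement, whereas 24 occupy 48 bytes and do not. A rounded-up count is only usable if the tensor actually holds that many elements.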
Returns
None
Availability
Constraints
- This function is used only for NPU on-board debugging and is supported only in the following scenarios:
- Currently, only information about tensors stored in Unified Buffer/L1 Buffer/L0C Buffer/Global Memory can be printed.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
- The total space used by printf, assert, DumpAccChkPoint, DumpTensor, and the framework dump function cannot exceed 1 MB on each core. Developers need to control the amount of data to be printed. If the limit is exceeded, no content will be printed.
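To stay within the 1 MB limit, the space consumed by repeated DumpTensor calls can be estimated in advance. The following is a rough sketch under stated assumptions: each core pays one 32-byte DumpHead, each call pays one 32-byte DumpTensorHead plus the raw element bytes, and nothing else (printf, assert, shape information) uses the space. This model matches the sample output, where block_initial_space - block_remain_len = 128 = 32 + 32 + 32 × 2 for 32 half-precision elements, but it is not confirmed as the exact accounting. DumpTensorCost and MaxDumpTensorCalls are hypothetical names.

```cpp
#include <cstdint>

constexpr uint64_t kDumpSpacePerCore = 1024 * 1024;  // 1 MB limit per core
constexpr uint64_t kDumpHeadBytes = 32;              // one DumpHead per core
constexpr uint64_t kDumpTensorHeadBytes = 32;        // one DumpTensorHead per call

// Estimated bytes consumed by a single DumpTensor call (assumed cost model).
constexpr uint64_t DumpTensorCost(uint64_t count, uint64_t elemSize) {
    return kDumpTensorHeadBytes + count * elemSize;
}

// Estimated number of identical DumpTensor calls that fit on one core,
// assuming no other printing function shares the space.
constexpr uint64_t MaxDumpTensorCalls(uint64_t count, uint64_t elemSize) {
    return (kDumpSpacePerCore - kDumpHeadBytes) / DumpTensorCost(count, elemSize);
}

// Cross-check against the sample output: 128 bytes used after one dump.
static_assert(kDumpHeadBytes + DumpTensorCost(32, 2) == 128, "cost model mismatch");
```

Under these assumptions, a loop that dumps 32 half-precision elements per iteration would fit roughly 10922 iterations per core before the limit is reached.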
Example
- Printing without tensor shape
    AscendC::DumpTensor(srcLocal, 5, dataLen);
- Printing with tensor shape
    uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)};
    AscendC::ShapeInfo shapeInfo(2, array); // Set dim to 2 and shape to (8,8).
    AscendC::DumpTensor(x, 2, 64, shapeInfo); // Dump 64 elements of x, which are parsed and arranged based on (8,8) of shapeInfo.
Information similar to the following is displayed:
    [[150.000000,83.000000,109.000000,166.000000,129.000000,50.000000,150.000000,74.000000],
    [135.000000,79.000000,98.000000,134.000000,146.000000,166.000000,112.000000,70.000000],
    [122.000000,51.000000,116.000000,68.000000,172.000000,72.000000,102.000000,69.000000],
    [136.000000,83.000000,88.000000,88.000000,112.000000,148.000000,79.000000,136.000000],
    [133.000000,104.000000,83.000000,71.000000,83.000000,99.000000,103.000000,151.000000],
    [98.000000,118.000000,128.000000,83.000000,25.000000,105.000000,179.000000,34.000000],
    [104.000000,169.000000,115.000000,113.000000,134.000000,121.000000,88.000000,96.000000],
    [29.000000,139.000000,70.000000,40.000000,158.000000,138.000000,72.000000,171.000000]]
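To make the arrangement by shape concrete, the following host-side sketch lays a flat buffer out as rows and columns in the same textual form as the output above. It only imitates the printed layout and is not the actual implementation of DumpTensor. FormatAs2D is a hypothetical name, and row-major element order is an assumption.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats the first rows*cols values of `data` as a bracketed 2D listing,
// one row per line, assuming row-major order. Returns an empty string if
// `data` holds fewer than rows*cols values.
inline std::string FormatAs2D(const std::vector<float>& data,
                              std::size_t rows, std::size_t cols) {
    if (data.size() < rows * cols) {
        return std::string();
    }
    std::string out = "[";
    for (std::size_t r = 0; r < rows; ++r) {
        out += "[";
        for (std::size_t c = 0; c < cols; ++c) {
            char buf[64];
            // "%f" prints six decimal places, as in the sample output.
            std::snprintf(buf, sizeof(buf), "%f",
                          static_cast<double>(data[r * cols + c]));
            out += buf;
            if (c + 1 < cols) {
                out += ",";
            }
        }
        out += "]";
        if (r + 1 < rows) {
            out += ",\n";
        }
    }
    out += "]";
    return out;
}
```

With a shape of (8,8), the 64 dumped elements would be split into eight rows of eight values each, which is the grouping visible in the output above.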