DumpTensor
Supported Products
Product |
Supported/Unsupported |
|---|---|
√ |
|
√ |
|
√ |
|
√ |
|
x |
|
x |
Functions
Dumps the content of specified tensors in operators developed with an operator project. It also supports printing user-defined additional information (limited to the uint32_t data type), for example, the current line number.

    AscendC::DumpTensor(srcLocal, 5, dataLen);
Printing with DumpTensor degrades the runtime performance of the operator, so this function is typically used only in the debugging phase. To disable printing, set ASCENDC_DUMP to 0.
The following is an example of the printed output:

    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000]
    DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
    [6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000]
    ...
    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000]
Prototype
- Printing without tensor shape

      template <typename T>
      __aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
      template <typename T>
      __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
- Printing with tensor shape

      template <typename T>
      __aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
      template <typename T>
      __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
Parameters
| Parameter | Description |
|---|---|
| T | Data type of the tensor to be dumped. |

| Parameter | Input/Output | Description |
|---|---|---|
| tensor | Input | Tensor to be dumped. |
| desc | Input | User-defined additional information (line numbers or other user-defined numbers). When using the DumpTensor function, you can use the desc parameter to add user-defined information to distinguish the source of the dump content in different calling scenarios. This function helps accurately locate the output of DumpTensor, improving debugging and analysis efficiency. |
| dumpSize | Input | Number of elements to be dumped. |
| shapeInfo | Input | Shape information of the tensor, which can be printed. |
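Because desc is a single uint32_t, one way to carry both a call-site label and a line number is to pack them into one value. The following is a minimal sketch of such a convention. MakeDesc, DescStage, and DescLine are hypothetical helper names, and the 16/16-bit split is an assumption made for illustration; DumpTensor itself attaches no meaning to the bits of desc.

```cpp
#include <cstdint>

// Hypothetical convention (not part of Ascend C): the upper 16 bits hold a
// user-chosen stage ID and the lower 16 bits hold the source line number.
constexpr uint32_t MakeDesc(uint32_t stageId, uint32_t line)
{
    return ((stageId & 0xFFFFu) << 16) | (line & 0xFFFFu);
}

// Recovers the stage ID from a desc value read back from the dump output.
constexpr uint32_t DescStage(uint32_t desc)
{
    return desc >> 16;
}

// Recovers the line number from a desc value read back from the dump output.
constexpr uint32_t DescLine(uint32_t desc)
{
    return desc & 0xFFFFu;
}
```

With this convention, a call could pass `MakeDesc(1, __LINE__)` as desc, and the printed desc value could be decoded with DescStage and DescLine when reading the log. Line numbers above 65535 are truncated by the mask. In kernel code, the helper may additionally need the `__aicore__ inline` qualifiers shown in the prototypes.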
Returns
None
Constraints
- This function is used only for debugging on the NPU board.
- Currently, printing in the operator graph input scenario is not supported.
- Currently, only the information about the tensors whose storage location is Unified Buffer, L1 Buffer, L0C Buffer, or Global Memory can be printed.
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- A single DumpTensor call can print at most 1 MB of data. This total includes a small amount of header and trailer information required by the framework, which is negligible. If the limit is exceeded, the data is not printed.
- When calculating the data volume, account for padding. If the byte length of the dumped elements is not 32-byte aligned, the system automatically appends padding after the elements to meet the alignment requirement. For example, if the elements to be dumped from Tensor1 occupy 30 bytes, the system appends 2 bytes of padding to align the total length to 32 bytes. Only the original 30 bytes are parsed; the padding is not used.
- When a custom operator project is used for operator development, the API output is different from the preceding description.
During a dump, an information header, DumpHead, is added before the dump information of each core to record the core ID and resource usage. A second information header, DumpTensorHead, is added before the data of each dumped tensor to record tensor information. The information structure in the multi-core printing scenario is illustrated in the figure below.

The specific DumpHead information is as follows:
- opType: type of the running operator.
- CoreType: type of the running core.
- block dim: number of operator execution cores set by the developer.
- total_block_num: number of cores involved in the dump.
- block_remain_len: available dump space in the current core.
- block_initial_space: initial dump space allocated in the current core.
- rsv: reserved field.
- magic: magic number for memory verification.
During DumpHead printing, the type of the running core and the corresponding core index (for example, AIV-0) are automatically printed in addition to the preceding information.
The specific DumpTensorHead information is as follows:
- desc: user-defined additional information.
- addr: tensor address.
- data_type: tensor data type.
- position: physical storage position of a tensor. Currently, only Unified Buffer, L1 Buffer, L0C Buffer, and Global Memory are supported.
- dump_size: number of elements to be dumped.
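Because each DumpTensorHead line lists these fields in a fixed order, dump logs can be post-processed on the host. The sketch below parses one such line. It assumes the textual layout matches the sample output in this document; DumpTensorHeadInfo and ParseDumpTensorHead are hypothetical names, not part of Ascend C.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Fields of one DumpTensorHead line, as they appear in the printed output.
struct DumpTensorHeadInfo {
    uint32_t desc = 0;
    std::string addr;      // kept as text; the format of this field is not assumed
    std::string dataType;
    std::string position;
    uint32_t dumpSize = 0;
};

// Parses a line such as
// "DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32".
// Returns true only if all five fields were found.
bool ParseDumpTensorHead(const std::string& line, DumpTensorHeadInfo& out)
{
    char addr[32] = {0};
    char dataType[32] = {0};
    char position[32] = {0};
    unsigned int desc = 0;
    unsigned int dumpSize = 0;
    // %31[^,] reads up to 31 characters that are not a comma.
    const int matched = std::sscanf(
        line.c_str(),
        "DumpTensor: desc=%u, addr=%31[^,], data_type=%31[^,], position=%31[^,], dump_size=%u",
        &desc, addr, dataType, position, &dumpSize);
    if (matched != 5) {
        return false;
    }
    out.desc = static_cast<uint32_t>(desc);
    out.addr = addr;
    out.dataType = dataType;
    out.position = position;
    out.dumpSize = static_cast<uint32_t>(dumpSize);
    return true;
}
```

Filtering parsed headers by desc is one way to separate the output of different DumpTensor call sites in a long log.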
The values of CANN_VERSION_STR and CANN_TIMESTAMP are automatically printed at the beginning of the DumpTensor result. Both are macro definitions: CANN_VERSION_STR is the version number of the CANN package as a string, and CANN_TIMESTAMP is the release timestamp of the CANN package as a uint64_t value. You can use the two macros directly in your code.
The following is an example:

    opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd
    CANN Version: XX.XX, TimeStamp: XXXXXX
    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000]
    DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
    [6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000]
    ...
    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000]
This API uses the dump function. On each core, the total size of the dump data (including headers) produced by all APIs in an operator that use the dump function cannot exceed 1 MB. Control the amount of data to be printed accordingly; if the limit is exceeded, nothing is printed.
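The 32-byte padding rule and the 1 MB limit from the constraints can be combined into a rough budget check before adding dump calls. The sketch below is host-side arithmetic only. PaddedDumpBytes, FitsDumpBudget, and the headerBytes parameter are hypothetical names; the actual per-call header size is not given in this document, so it is left to the caller as an estimate.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDumpAlignBytes = 32;            // alignment stated in the constraints
constexpr std::size_t kDumpLimitBytes = 1024 * 1024;   // 1 MB limit stated in the constraints

// Rounds the payload length (elemCount * elemSize bytes) up to the next
// multiple of 32 bytes, mirroring the padding rule described above.
constexpr std::size_t PaddedDumpBytes(std::size_t elemCount, std::size_t elemSize)
{
    return (elemCount * elemSize + kDumpAlignBytes - 1) / kDumpAlignBytes * kDumpAlignBytes;
}

// Estimates whether `calls` dumps of identical size stay within the limit.
// headerBytes is an assumed per-call header overhead supplied by the caller.
constexpr bool FitsDumpBudget(std::size_t calls, std::size_t elemCount,
                              std::size_t elemSize, std::size_t headerBytes)
{
    return calls * (PaddedDumpBytes(elemCount, elemSize) + headerBytes) <= kDumpLimitBytes;
}
```

For the 30-byte case in the constraints (for example, 15 float16 elements), PaddedDumpBytes(15, 2) evaluates to 32.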
Examples
- Printing without tensor shape

      AscendC::DumpTensor(srcLocal, 5, dataLen);
- Printing with tensor shape

      uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo(2, array); // Set dim to 2 and shape to (8, 8).
      AscendC::DumpTensor(x, 2, 64, shapeInfo); // Dump 64 elements of x, which are parsed and arranged based on (8, 8) of shapeInfo.

      uint32_t array1[] = {static_cast<uint32_t>(7), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo1(2, array1); // dim is 2, and shape is (7, 8).
      AscendC::DumpTensor(x1, 3, 64, shapeInfo1); // When the shape size is less than or equal to the number of elements in dumpSize, the elements are printed according to the shapeInfo. The extra dump data is not displayed.

      uint32_t array2[] = {static_cast<uint32_t>(9), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo2(2, array2); // dim is 2, and shape is (9, 8).
      AscendC::DumpTensor(x2, 4, 64, shapeInfo2); // When the shape size is greater than the number of elements in dumpSize, the elements are printed according to the shapeInfo. The missing dump data is displayed as "-".
Information similar to the following is displayed:

    DumpTensor: desc=2, addr=xxxx, data_type=float16, position=UB, dump_size=64
    [[150.000000,83.000000,109.000000,166.000000,129.000000,50.000000,150.000000,74.000000],
    [135.000000,79.000000,98.000000,134.000000,146.000000,166.000000,112.000000,70.000000],
    [122.000000,51.000000,116.000000,68.000000,172.000000,72.000000,102.000000,69.000000],
    [136.000000,83.000000,88.000000,88.000000,112.000000,148.000000,79.000000,136.000000],
    [133.000000,104.000000,83.000000,71.000000,83.000000,99.000000,103.000000,151.000000],
    [98.000000,118.000000,128.000000,83.000000,25.000000,105.000000,179.000000,34.000000],
    [104.000000,169.000000,115.000000,113.000000,134.000000,121.000000,88.000000,96.000000],
    [29.000000,139.000000,70.000000,40.000000,158.000000,138.000000,72.000000,171.000000]]
    DumpTensor: desc=3, addr=xxxx, data_type=float16, position=UB, dump_size=64
    shape is [7, 8], dumpSize is 64, dumpSize is greater than shapeSize.
    [[82.250000,37.312500,22.843750,91.937500,93.312500,77.125000,50.718750,27.171875],
    [21.859375,32.906250,20.109375,70.875000,13.398438,14.562500,30.156250,52.562500],
    [40.156250,45.781250,78.937500,65.687500,71.562500,61.375000,32.062500,80.750000],
    [55.593750,44.031250,43.781250,3.132812,38.750000,50.968750,79.562500,80.562500],
    [51.562500,22.468750,88.250000,20.578125,95.437500,83.562500,76.812500,34.281250],
    [75.500000,47.875000,52.562500,74.937500,39.687500,90.062500,28.890625,10.593750],
    [42.343750,67.062500,35.468750,60.875000,71.812500,81.562500,57.531250,62.500000]]
    DumpTensor: desc=4, addr=xxxx, data_type=float16, position=UB, dump_size=64
    shape is [9, 8], dumpSize is 64, data is not enough.
    [[95.437500,59.250000,57.281250,27.093750,41.375000,48.375000,33.093750,91.312500],
    [27.703125,60.718750,68.187500,70.875000,67.437500,84.562500,13.507812,4.550781],
    [24.500000,73.437500,36.062500,68.437500,55.500000,95.375000,60.250000,64.750000],
    [40.093750,85.000000,42.250000,39.531250,60.968750,8.953125,48.531250,53.906250],
    [53.656250,64.187500,84.750000,22.250000,95.500000,39.937500,12.945312,54.031250],
    [3.804688,98.187500,43.968750,26.000000,41.750000,34.500000,75.750000,89.625000],
    [25.046875,5.265625,65.500000,45.468750,32.937500,8.593750,1.705078,12.742188],
    [37.281250,95.125000,71.562500,27.515625,47.250000,36.312500,66.750000,31.250000],
    [-,-,-,-,-,-,-,-]]
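The arrangement rule visible in this output (surplus elements dropped, missing elements shown as "-") can be modeled on the host as follows. This is only an illustrative sketch of the rule for a two-dimensional shape, not the framework's actual printing code; FormatByShape is a hypothetical name, and the exact whitespace of the real output may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Arranges already-formatted elements into a rows x cols layout.
// Elements beyond rows * cols are dropped; positions with no data show "-".
std::string FormatByShape(const std::vector<std::string>& elems, uint32_t rows, uint32_t cols)
{
    std::string out = "[";
    for (uint32_t r = 0; r < rows; ++r) {
        // Each row after the first starts on a new line.
        out += (r == 0) ? "[" : ",\n[";
        for (uint32_t c = 0; c < cols; ++c) {
            const std::size_t idx = static_cast<std::size_t>(r) * cols + c;
            if (c != 0) {
                out += ",";
            }
            if (idx < elems.size()) {
                out += elems[idx];
            } else {
                out += "-";   // shape is larger than the dumped data
            }
        }
        out += "]";
    }
    out += "]";
    return out;
}
```

Given five elements, a (2, 2) shape keeps only the first four, while a (2, 3) shape fills the last position with "-".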