DumpTensor

Product Support

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Dumps the content of a specified tensor for operators developed based on operator projects. User-defined additional information (limited to the uint32_t data type), such as the current line number, can be printed together with the tensor data.

To print tensor data, call the DumpTensor API at the target position in the operator kernel implementation code. An example is as follows.

AscendC::DumpTensor(srcLocal, 5, dataLen);

Printing with the DumpTensor API affects operator runtime performance, so the API is generally used only for debugging. You can disable printing by setting ASCENDC_DUMP=0.

The following is an example of the printed output:

DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000,
42.000000]
DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
[6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000,
53.000000]
...
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000,
29.000000]

Prototype

Printing without tensor shape

template <typename T>
__aicore__ inline void DumpTensor(const LocalTensor<T> &tensor, uint32_t desc, uint32_t dumpSize)
template <typename T>
__aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)

Printing with tensor shape

template <typename T>
__aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
template <typename T>
__aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)

Parameters

**Table 1** Template parameters
Parameter	Description
T	Data type of the tensor to be dumped. For Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products, the supported data types are bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, half, and bfloat16_t. For Atlas 200I/500 A2 inference products and the Atlas inference product's AI Core, the supported data types are bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, and half.

**Table 2** Parameters
Parameter	Input/Output	Description
tensor	Input	Tensor to be dumped. If the tensor is stored in Unified Buffer/L1 Buffer/L0C Buffer, pass it as a LocalTensor. If the tensor is stored in Global Memory, pass it as a GlobalTensor.
desc	Input	User-defined additional information, such as a line number or another user-defined number. Attaching a distinct desc value to each DumpTensor call identifies the source of the dumped content across different calling scenarios, which helps locate DumpTensor outputs precisely and improves debugging and analysis efficiency.
dumpSize	Input	Number of elements to be dumped.
shapeInfo	Input	Shape information of the tensor, used to arrange the printed elements. When the shape size exceeds dumpSize, elements are printed according to ShapeInfo, and the missing dump data is displayed as "-". When the shape size is less than or equal to dumpSize, elements are printed according to ShapeInfo, and the extra dump data is not displayed.
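
The interaction between dumpSize and shapeInfo can be modeled on the host. The following is a minimal sketch for the two-dimensional case only, not the Ascend C implementation: FormatByShape and FormatByShapeDemo are hypothetical names, the size of the input vector stands in for dumpSize, and the bracket and separator layout only approximates the printed output.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical host-side model of the layout rule (2-D shapes only).
// dumped.size() plays the role of dumpSize; rows x cols plays the role of shapeInfo.
inline std::string FormatByShape(const std::vector<float>& dumped, std::size_t rows, std::size_t cols)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(6) << "[";
    for (std::size_t r = 0; r < rows; ++r) {
        out << (r == 0 ? "[" : ",\n[");
        for (std::size_t c = 0; c < cols; ++c) {
            if (c != 0) {
                out << ",";
            }
            const std::size_t index = r * cols + c;
            if (index < dumped.size()) {
                out << dumped[index];   // element covered by the dump
            } else {
                out << "-";             // shape asks for more than was dumped
            }
        }
        out << "]";
    }
    out << "]";
    return out.str();
}

// Usage: a (2, 2) shape over three dumped elements leaves one slot as "-".
inline void FormatByShapeDemo()
{
    const std::string text = FormatByShape({1.0f, 2.0f, 3.0f}, 2, 2);
    assert(text == "[[1.000000,2.000000],\n[3.000000,-]]");
    (void)text;
}
```

Elements beyond rows x cols are never visited by the loops, which mirrors the rule that extra dump data is not displayed.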

Returns

None

Restrictions

This function is used only for board debugging on the NPU.
Currently, printing is not supported when the operator is integrated into a graph.
Currently, only information about tensors stored in Unified Buffer/L1 Buffer/L0C Buffer/Global Memory can be printed.
For details about the operand address alignment requirements, see General Address Alignment Restrictions.
The total amount of data printed by a single DumpTensor call cannot exceed 1 MB (including a small amount of header and footer information required by the framework, which can usually be ignored). If this limit is exceeded, no data will be printed.
When calculating the data volume, account for padding. If the total length of the dumped elements is not 32-byte aligned, the system automatically appends padding data at the tail to meet the alignment requirement. For example, if the elements to be dumped from a tensor occupy 30 bytes, the system adds 2 bytes of padding to align the total length to 32 bytes. Only the original 30 bytes are parsed; the padding is not used.
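
The padding rule above comes down to rounding the payload up to a multiple of 32 bytes. The following is a minimal host-side sketch of that arithmetic; PaddedDumpBytes is a hypothetical helper name, not part of the Ascend C API, and it covers only the tensor payload, not the header and footer information added by the framework.

```cpp
#include <cassert>
#include <cstddef>

// Alignment unit stated in the restriction above.
constexpr std::size_t kDumpAlignBytes = 32;

// Hypothetical helper: bytes occupied by the dumped elements after tail padding.
inline std::size_t PaddedDumpBytes(std::size_t elemCount, std::size_t elemSize)
{
    assert(elemSize > 0 && "element size must be non-zero");
    const std::size_t rawBytes = elemCount * elemSize;
    // Round up to the next multiple of 32 bytes.
    return (rawBytes + kDumpAlignBytes - 1) / kDumpAlignBytes * kDumpAlignBytes;
}
```

For instance, 15 half elements (2 bytes each) occupy 30 bytes, so PaddedDumpBytes(15, 2) yields 32, matching the example above.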

When a custom operator project is used for operator development, the printed output differs from the preceding description.

During a dump, a DumpHead information header is added before the dump information of each core to record the core ID and resource usage information. A DumpTensorHead information header is also added before the tensor data of each dump to record tensor information. The information structure in the multi-core printing scenario is illustrated in the figure below.

The specific DumpHead information is as follows:

opType: type of the running operator.
CoreType: type of the running core.
block dim: number of cores for operator execution set by the developer.
total_block_num: number of cores to be dumped.
block_remain_len: available dump space in the current core.
block_initial_space: initial dump space allocated in the current core.
rsv: reserved field.
magic: magic number for memory verification.

During DumpHead printing, the type of the running core and the corresponding core index (for example, AIV-0) are automatically printed in addition to the preceding information.

The specific DumpTensorHead information is as follows:

desc: user-defined additional information.
addr: tensor address.
data_type: tensor data type.
position: physical storage position of the tensor, which can only be Unified Buffer/L1 Buffer/L0C Buffer/Global Memory.
dump_size: number of elements to be dumped.
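
Because desc identifies the call that produced a dump, log lines can be filtered by it after the run. The following is a minimal host-side sketch, assuming the header line follows the "DumpTensor: key=value, ..." layout shown in the sample output in this document; DumpTensorHeadInfo, ParseUint32, ParseDumpTensorHead, and IsDumpFromDesc are hypothetical names, not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical host-side record for one DumpTensorHead line; not an Ascend C type.
struct DumpTensorHeadInfo {
    std::uint32_t desc = 0;
    std::string addr;       // kept as text, since logs may show a placeholder
    std::string dataType;
    std::string position;
    std::uint32_t dumpSize = 0;
};

// Parses an unsigned decimal string; returns false on empty, non-digit, or out-of-range input.
inline bool ParseUint32(const std::string& text, std::uint32_t& value)
{
    if (text.empty() || text.size() > 10) {
        return false;
    }
    std::uint64_t result = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9') {
            return false;
        }
        result = result * 10 + static_cast<std::uint64_t>(ch - '0');
    }
    if (result > std::numeric_limits<std::uint32_t>::max()) {
        return false;
    }
    value = static_cast<std::uint32_t>(result);
    return true;
}

// Parses "DumpTensor: desc=..., addr=..., data_type=..., position=..., dump_size=...".
inline bool ParseDumpTensorHead(const std::string& line, DumpTensorHeadInfo& head)
{
    const std::string prefix = "DumpTensor: ";
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    std::map<std::string, std::string> fields;
    std::size_t begin = prefix.size();
    while (begin < line.size()) {
        std::size_t end = line.find(", ", begin);
        if (end == std::string::npos) {
            end = line.size();
        }
        const std::string item = line.substr(begin, end - begin);
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos) {
            return false;
        }
        fields[item.substr(0, eq)] = item.substr(eq + 1);
        begin = end + 2;  // skip the ", " separator
    }
    const char* required[] = {"desc", "addr", "data_type", "position", "dump_size"};
    for (const char* key : required) {
        if (fields.count(key) == 0) {
            return false;
        }
    }
    DumpTensorHeadInfo parsed;
    if (!ParseUint32(fields["desc"], parsed.desc) || !ParseUint32(fields["dump_size"], parsed.dumpSize)) {
        return false;
    }
    parsed.addr = fields["addr"];
    parsed.dataType = fields["data_type"];
    parsed.position = fields["position"];
    head = parsed;
    return true;
}

// Returns true when the line is a DumpTensorHead whose desc equals wantedDesc.
inline bool IsDumpFromDesc(const std::string& line, std::uint32_t wantedDesc)
{
    DumpTensorHeadInfo head;
    const bool ok = ParseDumpTensorHead(line, head);
    assert(!ok || !head.dataType.empty() || line.find("data_type=,") != std::string::npos);
    return ok && head.desc == wantedDesc;
}
```

Lines that hold tensor data rather than a header fail the prefix check and are reported as not matching.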

The values of CANN_VERSION_STR and CANN_TIMESTAMP are automatically printed at the beginning of the DumpTensor printing result. Both are macro definitions: CANN_VERSION_STR indicates the version number of the CANN package as a string, and CANN_TIMESTAMP indicates the timestamp when the CANN package was released as a uint64_t value. You can use the two macros directly in your code.

The following is an example:

opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd
CANN Version: XX.XX, TimeStamp: XXXXXX
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000,
42.000000]
DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
[6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000,
53.000000]
...
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000,
29.000000]
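
When reading such a log, the DumpHead line gives a rough idea of how much dump space a core has already consumed. The following is a minimal host-side sketch, assuming that the consumed space equals block_initial_space minus block_remain_len and that the line follows the layout of the sample above; ExtractDecimalField and UsedDumpBytes are hypothetical names, not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the decimal value that follows "<key>=" in a DumpHead line,
// or -1 if the key is missing or is not followed by a digit.
// Only decimal fields are handled; hexadecimal fields such as magic are out of scope.
inline long long ExtractDecimalField(const std::string& line, const std::string& key)
{
    assert(!key.empty());
    const std::string token = key + "=";
    const std::size_t pos = line.find(token);
    if (pos == std::string::npos) {
        return -1;
    }
    std::size_t i = pos + token.size();
    if (i >= line.size() || line[i] < '0' || line[i] > '9') {
        return -1;
    }
    long long value = 0;
    for (; i < line.size() && line[i] >= '0' && line[i] <= '9'; ++i) {
        value = value * 10 + (line[i] - '0');
    }
    return value;
}

// Assumed relation: used space = block_initial_space - block_remain_len.
inline long long UsedDumpBytes(const std::string& dumpHeadLine)
{
    const long long initial = ExtractDecimalField(dumpHeadLine, "block_initial_space");
    const long long remain = ExtractDecimalField(dumpHeadLine, "block_remain_len");
    if (initial < 0 || remain < 0) {
        return -1;
    }
    return initial - remain;
}
```

Applied to the DumpHead line in the sample above, the sketch reports 1048576 - 1046912 = 1664 bytes.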

This API uses the dump function. The total amount of data (including the information header) dumped by all APIs that use the dump function for an operator on each core cannot exceed 1 MB. You need to control the amount of data to be printed. If the limit is exceeded, no content will be printed.
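
To plan how many dumps fit on a core, the 1 MB limit can be tracked with simple bookkeeping. The following is a minimal host-side sketch, assuming the caller supplies the header size of each call (the actual header sizes are not given here); DumpBudget, TryReserve, and Remaining are hypothetical names, not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>

// Per-core limit stated above: 1 MB, including information headers.
constexpr std::size_t kPerCoreDumpLimitBytes = 1024 * 1024;

// Hypothetical bookkeeping for the dump space of one core.
struct DumpBudget {
    std::size_t usedBytes = 0;

    // Records the usage and returns true if the dump still fits;
    // otherwise returns false and leaves the budget unchanged.
    bool TryReserve(std::size_t payloadBytes, std::size_t headerBytes)
    {
        if (payloadBytes > kPerCoreDumpLimitBytes || headerBytes > kPerCoreDumpLimitBytes) {
            return false;
        }
        const std::size_t needed = payloadBytes + headerBytes;
        if (needed > kPerCoreDumpLimitBytes - usedBytes) {
            return false;
        }
        usedBytes += needed;
        return true;
    }

    // Bytes still available under the per-core limit.
    std::size_t Remaining() const
    {
        assert(usedBytes <= kPerCoreDumpLimitBytes);
        return kPerCoreDumpLimitBytes - usedBytes;
    }
};
```

Calling TryReserve once per planned dump, with the padded payload size and an estimated header size, shows in advance whether the planned dumps stay within the limit.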

Example

Printing without tensor shape

AscendC::DumpTensor(srcLocal, 5, dataLen);

Printing with tensor shape

uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)};
AscendC::ShapeInfo shapeInfo(2, array);       // Set dim to 2 and shape to (8,8).
AscendC::DumpTensor(x, 2, 64, shapeInfo);     // Dump 64 elements of x, which are parsed and arranged based on (8,8) of shapeInfo.

uint32_t array1[] = {static_cast<uint32_t>(7), static_cast<uint32_t>(8)};
AscendC::ShapeInfo shapeInfo1(2, array1); // dim is 2, and shape is (7, 8).
AscendC::DumpTensor(x1, 3, 64, shapeInfo1); // When the shape size is less than or equal to the number of elements in dumpSize, elements are printed according to ShapeInfo, and the extra dump data is not displayed.

uint32_t array2[] = {static_cast<uint32_t>(9), static_cast<uint32_t>(8)};
AscendC::ShapeInfo shapeInfo2(2, array2); // dim is 2, and shape is (9, 8).
AscendC::DumpTensor(x2, 4, 64, shapeInfo2); // When the shape size exceeds the number of elements in dumpSize, elements are printed according to ShapeInfo, and insufficient dump data is displayed with "-".

Information similar to the following is displayed:

DumpTensor: desc=2, addr=xxxx, data_type=float16, position=UB, dump_size=64
[[150.000000,83.000000,109.000000,166.000000,129.000000,50.000000,150.000000,74.000000],
[135.000000,79.000000,98.000000,134.000000,146.000000,166.000000,112.000000,70.000000],
[122.000000,51.000000,116.000000,68.000000,172.000000,72.000000,102.000000,69.000000],
[136.000000,83.000000,88.000000,88.000000,112.000000,148.000000,79.000000,136.000000],
[133.000000,104.000000,83.000000,71.000000,83.000000,99.000000,103.000000,151.000000],
[98.000000,118.000000,128.000000,83.000000,25.000000,105.000000,179.000000,34.000000],
[104.000000,169.000000,115.000000,113.000000,134.000000,121.000000,88.000000,96.000000],
[29.000000,139.000000,70.000000,40.000000,158.000000,138.000000,72.000000,171.000000]]
DumpTensor: desc=3, addr=xxxx, data_type=float16, position=UB, dump_size=64
shape is [7, 8], dumpSize is 64, dumpSize is greater than shapeSize.
[[82.250000,37.312500,22.843750,91.937500,93.312500,77.125000,50.718750,27.171875],
[21.859375,32.906250,20.109375,70.875000,13.398438,14.562500,30.156250,52.562500],
[40.156250,45.781250,78.937500,65.687500,71.562500,61.375000,32.062500,80.750000],
[55.593750,44.031250,43.781250,3.132812,38.750000,50.968750,79.562500,80.562500],
[51.562500,22.468750,88.250000,20.578125,95.437500,83.562500,76.812500,34.281250],
[75.500000,47.875000,52.562500,74.937500,39.687500,90.062500,28.890625,10.593750],
[42.343750,67.062500,35.468750,60.875000,71.812500,81.562500,57.531250,62.500000]]
DumpTensor: desc=4, addr=xxxx, data_type=float16, position=UB, dump_size=64
shape is [9, 8], dumpSize is 64, data is not enough.
[[95.437500,59.250000,57.281250,27.093750,41.375000,48.375000,33.093750,91.312500],
[27.703125,60.718750,68.187500,70.875000,67.437500,84.562500,13.507812,4.550781],
[24.500000,73.437500,36.062500,68.437500,55.500000,95.375000,60.250000,64.750000],
[40.093750,85.000000,42.250000,39.531250,60.968750,8.953125,48.531250,53.906250],
[53.656250,64.187500,84.750000,22.250000,95.500000,39.937500,12.945312,54.031250],
[3.804688,98.187500,43.968750,26.000000,41.750000,34.500000,75.750000,89.625000],
[25.046875,5.265625,65.500000,45.468750,32.937500,8.593750,1.705078,12.742188],
[37.281250,95.125000,71.562500,27.515625,47.250000,36.312500,66.750000,31.250000],
[-,-,-,-,-,-,-,-]]
