DumpTensor

Supported Products

Product                                                   Supported/Unsupported
Atlas A3 training products/Atlas A3 inference products    √
Atlas A2 training products/Atlas A2 inference products    √
Atlas 200I/500 A2 inference products                      √
Atlas inference product's AI Core                         √
Atlas inference product's Vector Core                     x
Atlas training products                                   x

Functions

Dumps the contents of a specified tensor in an operator developed with an operator project. It can also print user-defined additional information (uint32_t only), such as the current line number.

Call DumpTensor at the point in the operator kernel implementation where you want to print tensor data. For example:
AscendC::DumpTensor(srcLocal, 5, dataLen);

Printing with DumpTensor affects the actual runtime performance of the operator, so it is normally used only during debugging. You can disable printing by setting ASCENDC_DUMP to 0.

The printed output is similar to the following:

DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000,
42.000000]
DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
[6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000,
53.000000]
...
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000,
29.000000]
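Because every dumped tensor is preceded by a header line that carries desc, the output can be filtered by call site on the host. The following is a minimal C++17 sketch of such a filter, assuming the log has been captured as text and that header lines match the "DumpTensor: key=value, ..." format in the example above. DumpTensorHeader and ParseDumpTensorHeader are hypothetical helpers, not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical host-side record for one "DumpTensor:" header line.
struct DumpTensorHeader {
    uint32_t desc = 0;
    std::string addr;      // kept as text: may be "0", "100", or "xxxx"
    std::string dataType;  // e.g. "float16"
    std::string position;  // e.g. "UB"
    uint32_t dumpSize = 0;
};

// Parses a line such as
// "DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32".
// Returns std::nullopt if the line is not a DumpTensor header.
// std::stoul throws std::invalid_argument if desc or dump_size is not numeric.
std::optional<DumpTensorHeader> ParseDumpTensorHeader(const std::string& line) {
    const std::string prefix = "DumpTensor: ";
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;
    }
    DumpTensorHeader h;
    std::istringstream fields(line.substr(prefix.size()));
    std::string field;
    while (std::getline(fields, field, ',')) {
        // Drop the space that follows each ',' separator.
        const auto start = field.find_first_not_of(' ');
        if (start == std::string::npos) continue;
        field = field.substr(start);
        const auto eq = field.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = field.substr(0, eq);
        const std::string value = field.substr(eq + 1);
        if (key == "desc") h.desc = static_cast<uint32_t>(std::stoul(value));
        else if (key == "addr") h.addr = value;
        else if (key == "data_type") h.dataType = value;
        else if (key == "position") h.position = value;
        else if (key == "dump_size") h.dumpSize = static_cast<uint32_t>(std::stoul(value));
    }
    return h;
}
```

For the first header above, ParseDumpTensorHeader yields desc 5, data type "float16", position "UB", and dump size 32; keeping only the data that follows headers whose desc matches the value passed at one call site isolates that call's output.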

Prototype

  • Printing without tensor shape
    template <typename T>
    __aicore__ inline void DumpTensor(const LocalTensor<T> &tensor, uint32_t desc, uint32_t dumpSize)
    template <typename T>
    __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
    
  • Printing with tensor shape
    template <typename T>
    __aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
    template <typename T>
    __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
    

Parameters

Table 1 Template parameters

T: data type of the tensor to be dumped. The supported data types depend on the product:
  • Atlas A3 training products/Atlas A3 inference products: bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, half, and bfloat16_t.
  • Atlas A2 training products/Atlas A2 inference products: bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, half, and bfloat16_t.
  • Atlas 200I/500 A2 inference products: bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, and half.
  • Atlas inference product's AI Core: bool, uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, and half.

Table 2 Parameters

tensor (Input): tensor to be dumped.
  • If the tensor is stored in the Unified Buffer, L1 Buffer, or L0C Buffer, pass it as a LocalTensor.
  • If the tensor is stored in Global Memory, pass it as a GlobalTensor.

desc (Input): user-defined additional information, such as a line number or another user-defined number. Use desc to tag each DumpTensor call so that dump output from different call sites can be told apart. This makes it easier to locate a specific DumpTensor output and speeds up debugging and analysis.

dumpSize (Input): number of elements to dump.

shapeInfo (Input): shape of the tensor, used to arrange the printed elements.
  • If the shape size (the product of all dimensions) is greater than dumpSize, the elements are printed according to the shape, and positions with no dumped data are displayed as "-".
  • If the shape size is less than or equal to dumpSize, the elements are printed according to the shape, and the surplus dumped elements are not displayed.
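The two shapeInfo rules above amount to a row-major layout that stops at the shape size. The following C++ sketch models them on the host for illustration only; it is not the library's implementation. It assumes the elements arrive already formatted as strings, LayoutByShape is a hypothetical helper, and the warning line that DumpTensor prints before a mismatched layout is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace {

// Emits one bracketed level for dimension `dim`, consuming elements in
// row-major order. `next` is the index of the next dumped element.
void EmitLevel(const std::vector<std::string>& dumped,
               const std::vector<uint32_t>& shape, std::size_t dim,
               std::size_t& next, std::string& out) {
    const bool innermost = (dim + 1 == shape.size());
    out += '[';
    for (uint32_t i = 0; i < shape[dim]; ++i) {
        if (i > 0) {
            out += innermost ? "," : ",\n";  // one row per line, as in the sample output
        }
        if (innermost) {
            // Past the end of the dumped data: print "-" as a placeholder.
            out += (next < dumped.size()) ? dumped[next] : std::string("-");
            ++next;
        } else {
            EmitLevel(dumped, shape, dim + 1, next, out);
        }
    }
    out += ']';
}

}  // namespace

// Hypothetical host-side model of the shapeInfo printing rule. Only
// shape-size elements are emitted, so surplus dumped elements are dropped
// and missing ones are shown as "-".
std::string LayoutByShape(const std::vector<std::string>& dumped,
                          const std::vector<uint32_t>& shape) {
    std::string out;
    if (shape.empty()) {
        return out;
    }
    std::size_t next = 0;
    EmitLevel(dumped, shape, 0, next, out);
    return out;
}
```

With six dumped elements and shape (3, 3), the third row prints as [-,-,-]; with shape (2, 2), the last two elements are not displayed.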

Returns

None

Constraints

  • This function is used only for debugging on the NPU board.
  • Currently, printing is not supported when the operator is integrated into a graph (graph mode).
  • Currently, only tensors stored in the Unified Buffer, L1 Buffer, L0C Buffer, or Global Memory can be printed.
  • For details about the operand address alignment requirements, see General Address Alignment Restrictions.
  • The total amount of data printed by a single DumpTensor call cannot exceed 1 MB. This includes a small amount of header and trailer information required by the framework, which is negligible. If the limit is exceeded, nothing is printed.
  • When calculating the data volume, take padding into account. If the length of the dumped elements is not 32-byte aligned, the system automatically appends padding to the end of the elements to reach 32-byte alignment. For example, if the elements dumped from Tensor1 occupy 30 bytes, 2 bytes of padding are appended so that the total length is 32 bytes. Only the original 30 bytes are parsed; the padding is ignored.
  • When operators are developed with a custom operator project, the output of this API differs from the preceding description, as described below.

    During dump, a DumpHead information header is added before the dump information of each block core to record the core ID and resource usage. A DumpTensorHead information header is also added before each piece of dumped tensor data to record the tensor information. The figure below illustrates the information structure in the multi-core printing scenario.

    The specific DumpHead information is as follows:

    • opType: type of the operator being run.
    • CoreType: type of the core running the operator.
    • block dim: number of cores for operator execution, as set by the developer.
    • total_block_num: number of cores involved in the dump.
    • block_remain_len: remaining dump space available on the current core.
    • block_initial_space: initial dump space allocated to the current core.
    • rsv: reserved field.
    • magic: magic number used for memory verification.

    When DumpHead is printed, the type and index of the running core (for example, AIV-0) are also printed automatically, in addition to the preceding information.

    The specific DumpTensorHead information is as follows:

    • desc: user-defined additional information.
    • addr: tensor address.
    • data_type: tensor data type.
    • position: physical storage position of a tensor. Currently, only Unified Buffer, L1 Buffer, L0C Buffer, and Global Memory are supported.
    • dump_size: number of elements to be dumped.

    CANN_VERSION_STR and CANN_TIMESTAMP are macros whose values are automatically printed at the beginning of the DumpTensor result. CANN_VERSION_STR is the version number of the CANN package as a string, and CANN_TIMESTAMP is the release timestamp of the CANN package as a uint64_t value. You can use both macros directly in your code.

    The following is an example:

    opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd
    CANN Version: XX.XX, TimeStamp: XXXXXX
    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000,
    42.000000]
    DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
    [6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000,
    53.000000]
    ...
    DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
    [35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000,
    29.000000]
    

    This API uses the dump function. On each core, the total size of dump data (including headers) produced by all dump-based APIs called by an operator cannot exceed 1 MB. Keep the amount of printed data under control; if the limit is exceeded, nothing is printed.
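The padding rule and the 1 MB per-core limit can be combined into a rough budget check when planning how many DumpTensor calls a kernel can afford on one core. The following is a minimal C++ sketch, assuming each call's element data is padded up to a multiple of 32 bytes as described in the constraints above. The per-call header size is not specified in this document, so headerBytesPerCall is a caller-supplied estimate, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kAlignBytes = 32;                 // dump data is padded to 32 bytes
constexpr uint64_t kPerCoreDumpLimit = 1024 * 1024;  // 1 MB per core

// Bytes occupied by one dump's element data after padding to 32 bytes.
uint64_t PaddedDumpBytes(uint64_t dumpSize, uint64_t elementBytes) {
    const uint64_t raw = dumpSize * elementBytes;
    return (raw + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
}

// Hypothetical description of one planned DumpTensor call.
struct DumpCall {
    uint64_t dumpSize;      // number of elements passed as dumpSize
    uint64_t elementBytes;  // sizeof(T), e.g. 2 for half
};

// Rough per-core estimate. headerBytesPerCall is an assumed allowance for
// DumpTensorHead and similar framework data, not a documented value.
uint64_t EstimateCoreDumpBytes(const std::vector<DumpCall>& calls,
                               uint64_t headerBytesPerCall) {
    uint64_t total = 0;
    for (const DumpCall& c : calls) {
        total += PaddedDumpBytes(c.dumpSize, c.elementBytes) + headerBytesPerCall;
    }
    return total;
}

// True if the estimated total stays within the 1 MB per-core limit.
bool FitsInCoreBudget(const std::vector<DumpCall>& calls,
                      uint64_t headerBytesPerCall) {
    return EstimateCoreDumpBytes(calls, headerBytesPerCall) <= kPerCoreDumpLimit;
}
```

For instance, 15 half elements (30 bytes) are counted as 32 bytes, matching the Tensor1 example. Because the real header overhead is unknown here, treat a result close to the limit as a reason to dump less data.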

Examples

  • Printing without tensor shape
    AscendC::DumpTensor(srcLocal, 5, dataLen);
    
  • Printing with tensor shape
    uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)};
    AscendC::ShapeInfo shapeInfo(2, array);      // dim is 2, and shape is (8, 8).
    AscendC::DumpTensor(x, 2, 64, shapeInfo);    // Dump 64 elements of x and arrange them according to the (8, 8) shape.

    uint32_t array1[] = {static_cast<uint32_t>(7), static_cast<uint32_t>(8)};
    AscendC::ShapeInfo shapeInfo1(2, array1);    // dim is 2, and shape is (7, 8).
    AscendC::DumpTensor(x1, 3, 64, shapeInfo1);  // Shape size 56 <= dumpSize 64: elements are printed according to shapeInfo, and the extra 8 dumped elements are not displayed.

    uint32_t array2[] = {static_cast<uint32_t>(9), static_cast<uint32_t>(8)};
    AscendC::ShapeInfo shapeInfo2(2, array2);    // dim is 2, and shape is (9, 8).
    AscendC::DumpTensor(x2, 4, 64, shapeInfo2);  // Shape size 72 > dumpSize 64: elements are printed according to shapeInfo, and the 8 missing positions are displayed as "-".
    

    Information similar to the following is displayed:

    DumpTensor: desc=2, addr=xxxx, data_type=float16, position=UB, dump_size=64
    [[150.000000,83.000000,109.000000,166.000000,129.000000,50.000000,150.000000,74.000000],
    [135.000000,79.000000,98.000000,134.000000,146.000000,166.000000,112.000000,70.000000],
    [122.000000,51.000000,116.000000,68.000000,172.000000,72.000000,102.000000,69.000000],
    [136.000000,83.000000,88.000000,88.000000,112.000000,148.000000,79.000000,136.000000],
    [133.000000,104.000000,83.000000,71.000000,83.000000,99.000000,103.000000,151.000000],
    [98.000000,118.000000,128.000000,83.000000,25.000000,105.000000,179.000000,34.000000],
    [104.000000,169.000000,115.000000,113.000000,134.000000,121.000000,88.000000,96.000000],
    [29.000000,139.000000,70.000000,40.000000,158.000000,138.000000,72.000000,171.000000]]
    DumpTensor: desc=3, addr=xxxx, data_type=float16, position=UB, dump_size=64
    shape is [7, 8], dumpSize is 64, dumpSize is greater than shapeSize.
    [[82.250000,37.312500,22.843750,91.937500,93.312500,77.125000,50.718750,27.171875],
    [21.859375,32.906250,20.109375,70.875000,13.398438,14.562500,30.156250,52.562500],
    [40.156250,45.781250,78.937500,65.687500,71.562500,61.375000,32.062500,80.750000],
    [55.593750,44.031250,43.781250,3.132812,38.750000,50.968750,79.562500,80.562500],
    [51.562500,22.468750,88.250000,20.578125,95.437500,83.562500,76.812500,34.281250],
    [75.500000,47.875000,52.562500,74.937500,39.687500,90.062500,28.890625,10.593750],
    [42.343750,67.062500,35.468750,60.875000,71.812500,81.562500,57.531250,62.500000]]
    DumpTensor: desc=4, addr=xxxx, data_type=float16, position=UB, dump_size=64
    shape is [9, 8], dumpSize is 64, data is not enough.
    [[95.437500,59.250000,57.281250,27.093750,41.375000,48.375000,33.093750,91.312500],
    [27.703125,60.718750,68.187500,70.875000,67.437500,84.562500,13.507812,4.550781],
    [24.500000,73.437500,36.062500,68.437500,55.500000,95.375000,60.250000,64.750000],
    [40.093750,85.000000,42.250000,39.531250,60.968750,8.953125,48.531250,53.906250],
    [53.656250,64.187500,84.750000,22.250000,95.500000,39.937500,12.945312,54.031250],
    [3.804688,98.187500,43.968750,26.000000,41.750000,34.500000,75.750000,89.625000],
    [25.046875,5.265625,65.500000,45.468750,32.937500,8.593750,1.705078,12.742188],
    [37.281250,95.125000,71.562500,27.515625,47.250000,36.312500,66.750000,31.250000],
    [-,-,-,-,-,-,-,-]]