DumpAccChkPoint
Supported Products
| Product | Supported/Unsupported |
|---|---|
| | √ |
| | √ |
| | x |
| | √ |
| | x |
| | x |
Functions
Dumps the content of specified tensors for operators developed in operator projects, and can print user-defined additional information (limited to the uint32_t data type), such as the current line number. Unlike DumpTensor, this API can print a tensor starting from a specified offset.
    AscendC::DumpAccChkPoint(srcLocal, 5, 32, dataLen);
Printing with DumpAccChkPoint affects the runtime performance of the operator, so the API is typically used only during debugging. You can disable printing by setting ASCENDC_DUMP to 0.
Prototype
    template <typename T>
    __aicore__ inline void DumpAccChkPoint(const LocalTensor<T> &tensor, uint32_t index, uint32_t countOff, uint32_t dumpSize)
    template <typename T>
    __aicore__ inline void DumpAccChkPoint(const GlobalTensor<T> &tensor, uint32_t index, uint32_t countOff, uint32_t dumpSize)
Parameters
| Parameter | Description |
|---|---|
| T | Data type of the tensor to be dumped. |
| Parameter | Input/Output | Description |
|---|---|---|
| tensor | Input | Tensor to be dumped. If the tensor is stored in Unified Buffer/L1 Buffer/L0C Buffer, pass a tensor of the LocalTensor type. If the tensor is stored in Global Memory, pass a tensor of the GlobalTensor type. |
| index | Input | User-defined additional information (a line number or another user-defined number). |
| countOff | Input | Number of elements to skip before the dump starts. The tensor address after the offset is applied must meet the alignment requirements of the physical location. For details, see General Description and Restrictions. |
| dumpSize | Input | Number of elements to be dumped. |
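The alignment condition on countOff can be checked on the host before a value is written into a kernel. The following is a minimal sketch in plain C++, not part of Ascend C: `IsOffsetAligned` is a hypothetical helper name, the alignment value is passed in by the caller because the actual requirement depends on the physical location (see the referenced restrictions), and the tensor's base address is assumed to be aligned already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side helper (not an Ascend C API).
// Returns true when skipping `countOff` elements of `elemBytes` bytes each
// lands on a multiple of `alignBytes`. Assumes the base address is aligned.
constexpr bool IsOffsetAligned(std::uint32_t countOff, std::size_t elemBytes, std::size_t alignBytes)
{
    return alignBytes != 0 &&
           (static_cast<std::size_t>(countOff) * elemBytes) % alignBytes == 0;
}
```

For example, with 2-byte elements and an assumed 32-byte requirement, `IsOffsetAligned(32, 2, 32)` is true (64 bytes), while `IsOffsetAligned(5, 2, 32)` is false (10 bytes).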
Returns
None
Constraints
- This function is used only for on-board debugging on the NPU.
- Currently, printing in the operator graph input scenario is not supported.
- Currently, only the information about tensors stored in the Unified Buffer/L1 Buffer/L0C Buffer/Global Memory can be printed.
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- For a single call, the total size of data printed by DumpTensor cannot exceed 1 MB (including the header and trailer information required by the framework, which can be ignored). If the size exceeds the limit, the data will not be printed.
- When calculating the data volume, account for padding if the total length of the dump data is not aligned. If the length of the dumped elements is not 32-byte aligned, the system automatically appends padding to the end of the elements to meet the alignment requirement. For example, if the elements to be dumped from Tensor1 occupy 30 bytes, the system appends 2 bytes of padding so that the total length is 32 bytes. Only the original 30 bytes are parsed; the padding is not used.
- When a custom operator project is used for operator development, the API output is different from the preceding description.
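The 32-byte padding rule in the constraints above amounts to rounding the payload length up to the next multiple of 32. The sketch below shows that arithmetic in plain host-side C++; `PaddedDumpBytes` and `PaddingBytes` are hypothetical helper names, and the sizes of the framework headers are not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Alignment stated in the padding constraint above.
constexpr std::size_t kDumpAlignBytes = 32;

// Hypothetical helper: bytes occupied by a dump payload after padding,
// i.e. payloadBytes rounded up to the next multiple of 32.
constexpr std::size_t PaddedDumpBytes(std::size_t payloadBytes)
{
    return (payloadBytes + kDumpAlignBytes - 1) / kDumpAlignBytes * kDumpAlignBytes;
}

// Hypothetical helper: number of padding bytes appended to the payload.
constexpr std::size_t PaddingBytes(std::size_t payloadBytes)
{
    return PaddedDumpBytes(payloadBytes) - payloadBytes;
}
```

For the 30-byte case described above, `PaddedDumpBytes(30)` is 32 and `PaddingBytes(30)` is 2; a payload that is already a multiple of 32 bytes gets no padding.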
During dump, the corresponding information header DumpHead is added before the dump information of each block core to record the core ID and resource usage information. The information header DumpTensorHead is also added before the tensor data to be dumped each time to record tensor information. The information structure in the multi-core printing scenario is illustrated in the figure below.

The specific DumpHead information is as follows:
- opType: type of the running operator.
- CoreType: type of the running core.
- block dim: number of operator execution cores set by the developer.
- total_block_num: number of cores involved in dump.
- block_remain_len: available dump space in the current core.
- block_initial_space: initial dump space allocated in the current core.
- rsv: reserved field.
- magic: magic number for memory verification.
During DumpHead printing, the type of the running core and the corresponding core index (for example, AIV-0) are automatically printed in addition to the preceding information.
The specific DumpTensorHead information is as follows:
- desc: user-defined additional information.
- addr: tensor address.
- data_type: tensor data type.
- position: physical storage position of the tensor, which can only be Unified Buffer/L1 Buffer/L0C Buffer/Global Memory.
- dump_size: number of elements to be dumped.
The values of CANN_VERSION_STR and CANN_TIMESTAMP are automatically printed at the beginning of the DumpAccChkPoint output. Both are macro definitions: CANN_VERSION_STR is the version number of the CANN package as a string, and CANN_TIMESTAMP is the release timestamp of the CANN package as a uint64_t value. You can use the two macros directly in your code.
An example of the printing result is as follows:
    opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd
    CANN Version: XX.XX,TimeStamp: XXXXXX
    DumpTensor: desc=5, addr=40, data_type=float16, position=UB, dump_size=32
    [16.000000, 22.000000, 2.000000, 3.000000, 58.000000, 62.000000, 33.000000, 74.000000, 51.000000, 69.000000, 61.000000, 9.000000, 53.000000, 35.000000, 14.000000, 43.000000, 20.000000, 43.000000, 92.000000, 84.000000, 9.000000, 6.000000, 78.000000, 53.000000, 52.000000, 33.000000, 51.000000, 61.000000, 92.000000, 45.000000, 39.000000,34.000000]
    ...
    DumpTensor: desc=5, addr=140, data_type=float16, position=UB, dump_size=32
    [41.000000, 91.000000, 12.000000, 32.000000, 28.000000, 49.000000, 2.000000, 75.000000, 11.000000, 32.000000, 17.000000, 31.000000, 70.000000, 38.000000, 76.000000, 87.000000, 61.000000, 8.000000, 55.000000, 70.000000, 17.000000, 37.000000, 35.000000, 58.000000, 94.000000, 31.000000, 50.000000, 29.000000, 13.000000, 37.000000, 79.000000,29.000000]
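When post-processing a captured log, the DumpTensor header lines in the sample output above can be split into their fields with a few lines of host code. This is only a sketch based on the `key=value, key=value` layout visible in that sample; `ParseDumpTensorHead` is a hypothetical helper, not a CANN tool, and the real output format may differ between versions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits "DumpTensor: k1=v1, k2=v2, ..." into key/value pairs.
// Returns an empty map if the line is not a DumpTensor header line.
std::map<std::string, std::string> ParseDumpTensorHead(const std::string& line)
{
    std::map<std::string, std::string> fields;
    const std::string prefix = "DumpTensor:";
    const std::size_t start = line.find(prefix);
    if (start == std::string::npos) {
        return fields;
    }
    std::istringstream rest(line.substr(start + prefix.size()));
    std::string item;
    while (std::getline(rest, item, ',')) {
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos) {
            continue;  // skip fragments without a key=value shape
        }
        const std::string rawKey = item.substr(0, eq);
        const std::size_t first = rawKey.find_first_not_of(" \t");
        if (first == std::string::npos) {
            continue;  // key is empty or whitespace only
        }
        fields[rawKey.substr(first)] = item.substr(eq + 1);
    }
    return fields;
}
```

Applied to the first header line of the sample, the map contains five entries, including `desc` mapped to `"5"` (the user-supplied index) and `dump_size` mapped to `"32"`.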
This API uses the dump function. The total size of dump data of all APIs that use the dump function for an operator on each core cannot exceed 1 MB. You need to control the amount of data to be printed. If the limit is exceeded, no content will be printed.
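One way to keep an eye on the per-core limit is to read the two space fields from DumpHead. The sketch below assumes that the space consumed so far equals block_initial_space minus block_remain_len, which follows from the field descriptions but is not stated explicitly; the helper names are hypothetical, and the size of DumpTensorHead is not given in this document, so the fit check considers only the payload rounded up to 32 bytes and is therefore optimistic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: dump bytes already consumed on a core,
// derived from the DumpHead fields block_initial_space and block_remain_len.
constexpr std::size_t UsedDumpBytes(std::size_t blockInitialSpace, std::size_t blockRemainLen)
{
    return blockInitialSpace >= blockRemainLen ? blockInitialSpace - blockRemainLen : 0;
}

// Hypothetical, optimistic check: does a payload of elemCount elements,
// rounded up to a multiple of 32 bytes, fit into the remaining space?
// Header sizes are ignored because this document does not specify them.
constexpr bool PayloadFits(std::size_t blockRemainLen, std::size_t elemCount, std::size_t elemBytes)
{
    return (elemCount * elemBytes + 31) / 32 * 32 <= blockRemainLen;
}
```

With the values from the sample DumpHead line (1048576 initial, 1046912 remaining), `UsedDumpBytes` returns 1664.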
Examples
    AscendC::DumpAccChkPoint(srcLocal, 7, 32, 128);
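Going by the parameter descriptions, the call above tags the dump with index 7 and covers 128 elements starting at element offset 32. The following host-side model illustrates that window on a `std::vector`. It is not the Ascend C implementation: `ModelDumpWindow` is a hypothetical name, and returning an empty result for an out-of-range window is a choice made for this sketch, not documented API behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model only (not an Ascend C API): returns the elements that a call
// with (countOff, dumpSize) would cover, i.e. [countOff, countOff + dumpSize).
// Returns an empty vector when the window does not fit inside `data`.
template <typename T>
std::vector<T> ModelDumpWindow(const std::vector<T>& data, std::uint32_t countOff, std::uint32_t dumpSize)
{
    if (countOff > data.size() || dumpSize > data.size() - countOff) {
        return {};
    }
    return std::vector<T>(data.begin() + countOff, data.begin() + countOff + dumpSize);
}
```

In this model, a 200-element buffer with `countOff = 32` and `dumpSize = 128` yields elements 32 through 159, while the same arguments on a 100-element buffer yield nothing because the window would run past the end.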