DumpTensor
产品支持情况
产品 |
是否支持 |
---|---|
|
√ |
|
√ |
|
x |
|
√ |
|
x |
|
x |
|
x |
功能说明
基于算子工程开发的算子,可以使用该接口Dump指定Tensor的内容。同时支持打印自定义的附加信息(仅支持uint32_t数据类型的信息),比如打印当前行号等。
1
|
AscendC::DumpTensor(srcLocal, 5, dataLen); |

- 自定义算子工程
- Kernel直调工程
Dump时,每个block核的dump信息前会增加对应信息头DumpHead,用于记录核号和资源使用信息;每次Dump的Tensor数据前也会添加信息头DumpTensorHead,用于记录Tensor的相关信息。如下图所示,展示了多核打印场景下的打印信息结构。
DumpHead的具体信息如下:
- opType:当前运行的算子类型;
- CoreType:当前运行的核的类型;
- block dim:开发者设置的算子执行核数;
- total_block_num:参与dump的核数;
- block_remain_len:当前核剩余可用的dump的空间;
- block_initial_space:当前核初始分配的dump空间;
- rsv:保留字段;
- magic:内存校验魔术字。
DumpHead打印时,除了上述打印还会自动打印当前所运行核的类型及对应的该类型下的核索引,如:AIV-0。
DumpTensorHead的具体信息如下:
- desc:用户自定义附加信息;
- addr:Tensor的地址;
- data_type:Tensor的数据类型;
- position:表示Tensor所在的物理存储位置,当前仅支持Unified Buffer/L1 Buffer/L0C Buffer/Global Memory;
- dump_size:表示用户需要dump的元素个数。
DumpTensor打印结果的最前面会自动打印CANN_VERSION_STR值与CANN_TIMESTAMP值。其中,CANN_VERSION_STR与CANN_TIMESTAMP为宏定义,CANN_VERSION_STR代表CANN软件包的版本号信息,形式为字符串,CANN_TIMESTAMP为CANN软件包发布时的时间戳,形式为数值(uint64_t)。开发者也可在代码中直接使用这两个宏。
打印示例如下:
1 2 3 4 5 6 7 8 9 10 11 12 |
opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd CANN Version: XX.XX, TimeStamp: XXXXXXXXXXXXXXXXX DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32 [19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000] DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32 [6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000] ... DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32 [35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000] |
函数原型
- 无Tensor shape的打印
1 2 3 4
template <typename T> __aicore__ inline void DumpTensor(const LocalTensor<T> &tensor, uint32_t desc, uint32_t dumpSize) template <typename T> __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
- 带Tensor shape的打印
1 2 3 4
template <typename T> __aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo) template <typename T> __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
参数说明
参数名 |
描述 |
---|---|
T |
需要dump的Tensor的数据类型。 |
参数名 |
输入/输出 |
描述 |
---|---|---|
tensor |
输入 |
需要dump的Tensor。
|
desc |
输入 |
用户自定义附加信息(行号或其他自定义数字)。 在使用DumpTensor功能时,用户可通过desc参数附加自定义信息,以便在不同调用场景下区分Dump内容的来源。此功能有助于精准定位具体DumpTensor的输出,提升调试与分析效率。 |
dumpSize |
输入 |
需要dump的元素个数。 |
shapeInfo |
输入 |
传入Tensor的shape信息,可按照shape信息进行打印。
|
返回值说明
无
约束说明
- 该功能仅用于NPU上板调试,且仅在如下场景支持:
- 当前仅支持打印存储位置为Unified Buffer/L1 Buffer/L0C Buffer/Global Memory的Tensor信息。
- 操作数地址偏移对齐要求请参见通用约束。
- 该接口使用Dump功能,所有使用Dump功能的接口在每个核上Dump的数据总量(包括信息头)不可超过1M。请开发者自行控制待打印的内容数据量,超出则不会打印。
在计算数据量时,若Dump的总长度未对齐,需要考虑padding数据的影响。当进行非对齐Dump时,如果实际Dump的元素长度不满足32字节对齐,系统会在其末尾自动补充一定数量的padding数据,以满足对齐要求。例如,Tensor1中用户需要Dump的元素长度为30字节,系统会在其后添加2字节的padding,使总长度对齐到32字节。但在实际解析时,仍只解析原始的30 字节数据,padding部分不会被使用。
调用示例
- 无Tensor shape的打印
1
AscendC::DumpTensor(srcLocal, 5, dataLen);
- 带Tensor shape的打印
1 2 3 4 5 6 7 8 9 10 11
uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)}; AscendC::ShapeInfo shapeInfo(2, array); // dim为2, shape为(8,8) AscendC::DumpTensor(x, 2, 64, shapeInfo); // dump x的64个元素,且解析按照shapeInfo的(8,8)排列 uint32_t array1[] = {static_cast<uint32_t>(7), static_cast<uint32_t>(8)}; AscendC::ShapeInfo shapeInfo1(2, array1); // dim为2, shape为(7,8) AscendC::DumpTensor(x1, 3, 64, shapeInfo1); // 当Shape尺寸小于等于dumpSize元素个数时, 按照ShapeInfo打印元素,多出的Dump数据不展示 uint32_t array2[] = {static_cast<uint32_t>(9), static_cast<uint32_t>(8)}; AscendC::ShapeInfo shapeInfo2(2, array2); // dim为2, shape为(9,8) AscendC::DumpTensor(x2, 4, 64, shapeInfo2); // 当Shape尺寸大于dumpSize元素个数时, 按照ShapeInfo打印元素,不足的Dump数据用"-"展示
打印结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
opType=add_custom, DumpHead: AIV-0, CoreType=AIV, block dim=1, total_block_num=3, block_remain_len=1047648, block_initial_space=1048576, rsv=0, magic=5aa5bccd CANN Version: XX.XX, TimeStamp: XXXXXXXXXXXXXXXXX DumpTensor: desc=2, addr=xxxx, data_type=float16, position=UB, dump_size=64 [[150.000000,83.000000,109.000000,166.000000,129.000000,50.000000,150.000000,74.000000], [135.000000,79.000000,98.000000,134.000000,146.000000,166.000000,112.000000,70.000000], [122.000000,51.000000,116.000000,68.000000,172.000000,72.000000,102.000000,69.000000], [136.000000,83.000000,88.000000,88.000000,112.000000,148.000000,79.000000,136.000000], [133.000000,104.000000,83.000000,71.000000,83.000000,99.000000,103.000000,151.000000], [98.000000,118.000000,128.000000,83.000000,25.000000,105.000000,179.000000,34.000000], [104.000000,169.000000,115.000000,113.000000,134.000000,121.000000,88.000000,96.000000], [29.000000,139.000000,70.000000,40.000000,158.000000,138.000000,72.000000,171.000000]] DumpTensor: desc=3, addr=xxxx, data_type=float16, position=UB, dump_size=64 shape is [7, 8], dumpSize is 64, dumpSize is greater than shapeSize. [[82.250000,37.312500,22.843750,91.937500,93.312500,77.125000,50.718750,27.171875], [21.859375,32.906250,20.109375,70.875000,13.398438,14.562500,30.156250,52.562500], [40.156250,45.781250,78.937500,65.687500,71.562500,61.375000,32.062500,80.750000], [55.593750,44.031250,43.781250,3.132812,38.750000,50.968750,79.562500,80.562500], [51.562500,22.468750,88.250000,20.578125,95.437500,83.562500,76.812500,34.281250], [75.500000,47.875000,52.562500,74.937500,39.687500,90.062500,28.890625,10.593750], [42.343750,67.062500,35.468750,60.875000,71.812500,81.562500,57.531250,62.500000]] DumpTensor: desc=4, addr=xxxx, data_type=float16, position=UB, dump_size=64 shape is [9, 8], dumpSize is 64, data is not enough. [[95.437500,59.250000,57.281250,27.093750,41.375000,48.375000,33.093750,91.312500], [27.703125,60.718750,68.187500,70.875000,67.437500,84.562500,13.507812,4.550781], [24.500000,73.437500,36.062500,68.437500,55.500000,95.375000,60.250000,64.750000], [40.093750,85.000000,42.250000,39.531250,60.968750,8.953125,48.531250,53.906250], [53.656250,64.187500,84.750000,22.250000,95.500000,39.937500,12.945312,54.031250], [3.804688,98.187500,43.968750,26.000000,41.750000,34.500000,75.750000,89.625000], [25.046875,5.265625,65.500000,45.468750,32.937500,8.593750,1.705078,12.742188], [37.281250,95.125000,71.562500,27.515625,47.250000,36.312500,66.750000,31.250000], [-,-,-,-,-,-,-,-]]