NPU域上板数据打印功能包括DumpTensor、printf两种,其中DumpTensor用于打印指定Tensor的数据,printf主要用于打印标量和字符串信息。
该功能仅在如下场景支持:
具体的使用方法如下:
1
|
DumpTensor(srcLocal,5, dataLen); |
Dump时,每个block核的dump信息前会增加对应信息头DumpHead(32字节大小),用于记录核号和资源使用信息;每次Dump的Tensor数据前也会添加信息头DumpTensorHead(32字节大小),用于记录Tensor的相关信息。打印结果的样例如下:
1 2 3 4 5 6 7 8 9 10 11 12 |
opType=AddCustom, DumpHead: AIV-0, CoreType=AIV, block dim=8, total_block_num=8, block_remain_len=1046912, block_initial_space=1048576, rsv=0, magic=5aa5bccd CANN Version: XX.XX, TimeStamp: XXXXXXXXXXXXXXXXX DumpTensor: desc=5, addr=0, data_type=float16, position=UB [19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000] DumpTensor: desc=5, addr=100, data_type=float16, position=UB [6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000] ... DumpTensor: desc=5, addr=0, data_type=float16, position=UB [35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000] |
1
|
printf("fmt string %d", 0x123); |
基于NPU域算子的调用接口编写的算子程序,通过毕昇编译器编译后生成可执行程序,运行可执行程序,可以完成算子NPU域的运行验证。使用算子调优工具运行NPU模式下生成的可执行文件从而采集Ascend C算子在AI处理器上执行的性能数据,进行性能精细调优。性能调优工具的具体使用方法请参考算子调优(msProf)。