An example build command for the Add operator executable is shown below:
bash run.sh -r npu -v <soc_version>
When the one-click build-and-run script finishes, the NPU-side executable <kernel_name>_npu is generated in the project directory.
mssanitizer --tool=memcheck ./add_npu   # memory checking requires --tool=memcheck
mssanitizer --tool=racecheck ./add_npu  # race checking requires --tool=racecheck
The path to the single-operator executable can be given as either an absolute or a relative path; set it according to your environment.
__aicore__ inline void CopyOut(int32_t progress)
{
    // dequeue output tensor from VECOUT queue
    LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
    // copy progress_th tile from local tensor to global tensor
    // construct an illegal read/write scenario: copy 2 * TILE_LENGTH elements instead of TILE_LENGTH
    DataCopy(zGm[progress * TILE_LENGTH], zLocal, 2 * TILE_LENGTH);
    // free output tensor for reuse
    outQueueZ.FreeTensor(zLocal);
}
$ mssanitizer --tool=memcheck --leak-check=yes ./add_custom_npu
[mssanitizer] logging to file: ./mssanitizer_20240124182331_37743.log
====== ERROR: illegal read of size 256
====== at 0x124080022f00 on GM
====== in block 7
====== code in add_custom.cpp:63
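The report above can be cross-checked with a little index arithmetic. The sketch below is a host-side model only, not Ascend C kernel code and not a description of how mssanitizer works internally. All constants (8 blocks, 16 tiles per block, TILE_LENGTH of 128 half elements) are assumptions chosen for illustration; none of them appear in the text above, and `OverrunBytes` is a hypothetical helper. The model also assumes each block writes its own contiguous slice of one shared GM buffer.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; they are not given in the text above.
constexpr int64_t kBlockNum = 8;         // assumed number of blocks (cores)
constexpr int64_t kTilesPerBlock = 16;   // assumed TILE_NUM * BUFFER_NUM
constexpr int64_t kTileLength = 128;     // assumed TILE_LENGTH, in half elements
constexpr int64_t kElemBytes = 2;        // size of one half element in bytes
constexpr int64_t kBlockLength = kTilesPerBlock * kTileLength;
constexpr int64_t kTotalLength = kBlockNum * kBlockLength;

// Hypothetical helper: returns how many bytes a copy of `copyLen` elements,
// starting at tile `progress` of block `block`, extends past the end of the
// whole GM buffer. Returns 0 when the copy stays inside the buffer.
inline int64_t OverrunBytes(int64_t block, int64_t progress, int64_t copyLen)
{
    assert(block >= 0 && block < kBlockNum);
    assert(progress >= 0 && progress < kTilesPerBlock);
    const int64_t begin = block * kBlockLength + progress * kTileLength;
    const int64_t end = begin + copyLen;
    return end > kTotalLength ? (end - kTotalLength) * kElemBytes : 0;
}
```

Under these assumptions, `OverrunBytes(7, 15, 2 * kTileLength)` returns 256, while the same call for any earlier tile or any other block returns 0, because the extra elements still land inside the buffer. That would be consistent with the report flagging an access of size 256 in block 7, though the actual constants in your project may differ.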
__aicore__ inline void Compute(int32_t progress)
{
    LocalTensor<half> xLocal = inQueueX.DeQue<half>();
    LocalTensor<half> yLocal = inQueueY.DeQue<half>();
    LocalTensor<half> zLocal = outQueueZ.AllocTensor<half>();
    Add(zLocal, xLocal, yLocal, TILE_LENGTH);
    // outQueueZ.EnQue<half>(zLocal);
    inQueueX.FreeTensor(xLocal);
    inQueueY.FreeTensor(yLocal);
    // after the computation, skip the queue that would guarantee synchronization,
    // constructing a race condition scenario
    DataCopy(zGm[progress * TILE_LENGTH], zLocal, TILE_LENGTH);
    outQueueZ.FreeTensor(zLocal);
}
...
__aicore__ inline void Process()
{
    int32_t loopCount = TILE_NUM * BUFFER_NUM;
    for (int32_t i = 0; i < loopCount; i++) {
        CopyIn(i);
        Compute(i);
        // CopyOut is not used; the copy-out is done directly inside Compute
        // CopyOut(i);
    }
}
$ mssanitizer --tool=racecheck ./add_npu
====== ERROR: Potential RAW hazard detected at UB :
====== PIPE_V Write at RAW()+0x400 in add_custom.cpp:58
====== PIPE_MTE3 Read at RAW()+0x400 in add_custom.cpp:63
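To make the terminology in this report concrete: a RAW (read-after-write) hazard means that one pipe writes a memory range and another pipe later reads an overlapping range with no synchronization in between. Here the vector pipe (PIPE_V) writes zLocal in Add, and the copy-out pipe (PIPE_MTE3) reads it in DataCopy, without the EnQue/DeQue pair. The following host-side sketch classifies a pair of accesses this way. The `Access` struct and `ClassifyHazard` function are hypothetical names introduced for illustration and do not reflect mssanitizer internals; the access size of 256 bytes in the usage note is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record of one memory access; not an mssanitizer data structure.
struct Access {
    std::string pipe;  // pipe name as printed in the report, e.g. "PIPE_V"
    bool isWrite;      // true for a write, false for a read
    uint64_t offset;   // start offset in UB, in bytes
    uint64_t size;     // length in bytes
};

// True when the two half-open byte ranges [offset, offset + size) intersect.
inline bool Overlaps(const Access& a, const Access& b)
{
    assert(a.size > 0 && b.size > 0);
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Classifies two accesses issued in program order (first, then second),
// assuming no synchronization between them.
inline std::string ClassifyHazard(const Access& first, const Access& second)
{
    // Assumption: accesses on the same pipe execute in issue order.
    if (first.pipe == second.pipe) return "none";
    if (!Overlaps(first, second)) return "none";
    if (first.isWrite && !second.isWrite) return "RAW";
    if (!first.isWrite && second.isWrite) return "WAR";
    if (first.isWrite && second.isWrite) return "WAW";
    return "none";  // two reads never conflict
}
```

For example, with `Access vecWrite{"PIPE_V", true, 0x400, 256}` and `Access mte3Read{"PIPE_MTE3", false, 0x400, 256}`, the call `ClassifyHazard(vecWrite, mte3Read)` returns "RAW", matching the hazard type in the report. Restoring the EnQue in Compute and the DeQue in CopyOut reintroduces the synchronization that the sample deliberately removed.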