Typical Semantics of Vector Computation
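A single call to the vadd intrinsic can carry out up to three nested for loops of computation. The innermost loop runs over the elements of one block, which must be contiguous and 32-byte aligned. The middle loop runs over blocks, with BlockStride setting the spacing between consecutive blocks. The outermost loop runs over repeats, with RepeatStride setting the spacing between consecutive repeats. As the pseudocode shows, both strides are counted in units of one 32-byte block.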
/* intrinsic
void vadd(__ubuf__ DataType *dst, __ubuf__ DataType *src0, __ubuf__ DataType *src1,
          uint8_t repeat,
          uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride,
          uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride);
*/

/* Semantics
int blkNum = 8;                               // blocks processed per repeat
int eleNumInOneBlk = 32 / sizeof(DataType);   // elements in one 32-byte block
for (int i = 0; i < repeat; i++) {                  // repeat loop
    for (int j = 0; j < blkNum; j++) {              // block loop
        for (int e = 0; e < eleNumInOneBlk; e++) {  // element loop
            // src elements
            eltSrc0 = src0 + i * src0RepeatStride * eleNumInOneBlk + j * src0BlockStride * eleNumInOneBlk + e;
            eltSrc1 = src1 + i * src1RepeatStride * eleNumInOneBlk + j * src1BlockStride * eleNumInOneBlk + e;
            // dst element
            eltDst = dst + i * dstRepeatStride * eleNumInOneBlk + j * dstBlockStride * eleNumInOneBlk + e;
            *eltDst = *eltSrc0 + *eltSrc1;
        }
    }
}
*/
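To check a stride configuration off-device, the semantics above can be sketched as an ordinary host-side C++ function. This is only a reference model built from the pseudocode, not the intrinsic itself: the names `vadd_ref` and `vadd_contiguous_demo` are hypothetical, the `__ubuf__` qualifier is dropped, and plain host memory stands in for the unified buffer. Hardware constraints such as alignment are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference model of the vadd semantics (hypothetical helper,
// not the device intrinsic). Strides are in units of one 32-byte block.
template <typename DataType>
void vadd_ref(DataType *dst, const DataType *src0, const DataType *src1,
              uint8_t repeat,
              uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride,
              uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride) {
    const std::size_t blkNum = 8;                               // blocks per repeat
    const std::size_t eleNumInOneBlk = 32 / sizeof(DataType);   // elements per block
    for (std::size_t i = 0; i < repeat; i++) {                  // repeat loop
        for (std::size_t j = 0; j < blkNum; j++) {              // block loop
            for (std::size_t e = 0; e < eleNumInOneBlk; e++) {  // element loop
                const std::size_t off0 =
                    (i * src0RepeatStride + j * src0BlockStride) * eleNumInOneBlk + e;
                const std::size_t off1 =
                    (i * src1RepeatStride + j * src1BlockStride) * eleNumInOneBlk + e;
                const std::size_t offD =
                    (i * dstRepeatStride + j * dstBlockStride) * eleNumInOneBlk + e;
                dst[offD] = src0[off0] + src1[off1];
            }
        }
    }
}

// Contiguous case: block stride 1 and repeat stride 8 place consecutive
// blocks and consecutive repeats back to back in memory.
std::vector<float> vadd_contiguous_demo() {
    const std::size_t n = 2 * 8 * (32 / sizeof(float));  // 2 repeats * 8 blocks * 8 floats
    std::vector<float> a(n), b(n), c(n, 0.0f);
    for (std::size_t k = 0; k < n; k++) {
        a[k] = static_cast<float>(k);
        b[k] = 2.0f * static_cast<float>(k);
    }
    vadd_ref(c.data(), a.data(), b.data(), 2, 1, 1, 1, 8, 8, 8);
    assert(c.size() == n);
    return c;
}
```

In the contiguous case, two repeats cover 128 consecutive float elements. Raising a block stride above 1 in this model leaves gaps between the blocks it touches, which is a quick way to see which addresses a given parameter set reads or writes.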