vmla
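Function Description
- When src0, src1, and dst have the same type (all half or all float), the following computation is performed in units of blocks (32 bytes each), with 8 blocks processed per iteration.
[dst] = [src0] * [src1] + [dst]
- When src0 and src1 are half and dst is f32 (float), only the low 4 blocks of src0 and of src1 are read, which is 64 half values from each operand. The high 4 blocks are ignored. The 64 products are added to the 64 f32 values (8 blocks) of dst.
[dst (64 f32)] = [src0 (64 half in the low 4 blocks)] * [src1 (64 half in the low 4 blocks)] + [dst (64 f32)]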
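The two cases above can be sketched as a host-side reference model of a single iteration. This is only an illustration of the arithmetic, not the device implementation: the names `vmla_ref_f32`, `vmla_ref_mixed`, and `HalfStandIn` are hypothetical, the blocks are assumed to be contiguous, standard C++ has no half type so a float stands in for each half value, and rounding behaviour on the hardware (for example whether the multiply-add is fused) is not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side names; none of these are part of the vmla API.
constexpr std::size_t kBlockBytes    = 32;  // one block is 32 bytes
constexpr std::size_t kBlocksPerIter = 8;   // blocks covered by one iteration
constexpr std::size_t kF32PerIter    = kBlockBytes * kBlocksPerIter / 4;  // 64 floats
constexpr std::size_t kHalfPerIter   = kBlockBytes * kBlocksPerIter / 2;  // 128 halfs
constexpr std::size_t kHalfUsed      = kBlockBytes * 4 / 2;               // low 4 blocks = 64 halfs

// Assumption: a float stands in for a half value, since ISO C++ has no half type.
using HalfStandIn = float;

// Same-type case (shown for float): dst = src0 * src1 + dst over 8 blocks.
void vmla_ref_f32(float* dst, const float* src0, const float* src1) {
    assert(dst && src0 && src1);
    for (std::size_t i = 0; i < kF32PerIter; ++i) {
        dst[i] = src0[i] * src1[i] + dst[i];
    }
}

// Mixed case: each source spans kHalfPerIter (128) half values per iteration,
// but only the first kHalfUsed (64, the low 4 blocks) are read. The upper
// 64 elements are ignored. dst holds 64 f32 values (8 blocks).
void vmla_ref_mixed(float* dst, const HalfStandIn* src0, const HalfStandIn* src1) {
    assert(dst && src0 && src1);
    for (std::size_t i = 0; i < kHalfUsed; ++i) {
        dst[i] = src0[i] * src1[i] + dst[i];
    }
}
```

In the mixed case the element counts line up because 4 blocks of half (4 × 32 / 2 = 64 values) produce exactly as many products as 8 blocks of f32 can hold (8 × 32 / 4 = 64 values).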
Function Prototype
void vmla(__ubuf__ half *dst, __ubuf__ half *src0, __ubuf__ half *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride);
void vmla(__ubuf__ float *dst, __ubuf__ float *src0, __ubuf__ float *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride);
void vmla(__ubuf__ float *dst, __ubuf__ half *src0, __ubuf__ half *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride);
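This page does not define the stride parameters, so the following host-side sketch rests on an assumption: each block stride is taken to be the distance, in 32-byte blocks, between consecutive blocks within one iteration, and each repeat stride the distance, in 32-byte blocks, between the starting blocks of consecutive iterations. The function `vmla_host_model` and its helper are hypothetical names that mirror the argument order of the float prototype. They run on the host and do not call the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "model assumes 4-byte float");

constexpr std::size_t kFloatsPerBlock  = 32 / sizeof(float);  // 8 floats per 32-byte block
constexpr std::size_t kBlocksPerRepeat = 8;                   // 8 blocks per iteration

// Assumed addressing: strides are counted in 32-byte blocks.
inline std::size_t block_offset(std::size_t rep, std::size_t blk,
                                std::uint8_t blockStride, std::uint8_t repeatStride) {
    return (rep * repeatStride + blk * blockStride) * kFloatsPerBlock;
}

// Hypothetical host model of the float/float/float overload.
void vmla_host_model(float* dst, const float* src0, const float* src1,
                     std::uint8_t repeat,
                     std::uint8_t dstBlockStride, std::uint8_t src0BlockStride,
                     std::uint8_t src1BlockStride, std::uint8_t dstRepeatStride,
                     std::uint8_t src0RepeatStride, std::uint8_t src1RepeatStride) {
    assert(dst && src0 && src1);
    for (std::size_t r = 0; r < repeat; ++r) {
        for (std::size_t b = 0; b < kBlocksPerRepeat; ++b) {
            float* d = dst + block_offset(r, b, dstBlockStride, dstRepeatStride);
            const float* a = src0 + block_offset(r, b, src0BlockStride, src0RepeatStride);
            const float* c = src1 + block_offset(r, b, src1BlockStride, src1RepeatStride);
            for (std::size_t i = 0; i < kFloatsPerBlock; ++i) {
                d[i] = a[i] * c[i] + d[i];  // [dst] = [src0] * [src1] + [dst]
            }
        }
    }
}
```

Under that assumption, a call with repeat = 2, all block strides set to 1, and all repeat strides set to 8 would walk 128 contiguous floats in each operand, 64 per iteration.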
Pipeline Type
PIPE_V