
vmla

Function Description

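  • When src0, src1, and dst have the same type (all half or all float), the following computation is performed in units of blocks (32 bytes each), with 8 blocks processed at a time.
[dst] = [src0] * [src1] + [dst]
  • When src0 and src1 are of type half and dst is of type f32, only the low 4 blocks of src0 and of src1 are read, which gives 64 half values from each source. The high 4 blocks are ignored. The products are added to the 64 f32 values (8 blocks) of dst.
[dst (64 f32 values)] = [src0 (64 half values in the low 4 blocks)] * [src1 (64 half values in the low 4 blocks)] + [dst (64 f32 values)]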
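
The two modes above can be sketched as a scalar, host-side reference model. This is only an illustration of the arithmetic for a single iteration over contiguous blocks, not the intrinsic itself: the function names `vmla_same_type_ref` and `vmla_mixed_ref` are hypothetical, strides are ignored, and because standard C++ has no `half` type, the mixed-mode sketch takes a generic source type as a stand-in.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model; names and layout are assumptions for illustration.
constexpr std::size_t kBlockBytes = 32;     // one block is 32 bytes
constexpr std::size_t kBlocksPerIter = 8;   // 8 blocks are processed at a time

// Same-type mode: dst = src0 * src1 + dst over 8 contiguous blocks.
// For T = float this covers 8 * 32 / 4 = 64 elements.
template <typename T>
void vmla_same_type_ref(T* dst, const T* src0, const T* src1) {
    assert(dst != nullptr && src0 != nullptr && src1 != nullptr);
    const std::size_t count = kBlocksPerIter * kBlockBytes / sizeof(T);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src0[i] * src1[i] + dst[i];
    }
}

// Mixed mode: the first 64 source elements (the low 4 blocks of half data)
// are multiplied and accumulated into 64 float values of dst.
// SrcT is a stand-in for half; any elements beyond the first 64 are not read.
template <typename SrcT>
void vmla_mixed_ref(float* dst, const SrcT* src0, const SrcT* src1) {
    assert(dst != nullptr && src0 != nullptr && src1 != nullptr);
    const std::size_t count = 64;
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = static_cast<float>(src0[i]) * static_cast<float>(src1[i]) + dst[i];
    }
}
```

A model like this can serve as a golden reference when checking kernel output on the host, provided the comparison allows for half-precision rounding in the real instruction.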

Function Prototype

void vmla(__ubuf__ half *dst, __ubuf__ half *src0, __ubuf__ half *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride); 
 
void vmla(__ubuf__ float *dst, __ubuf__ float *src0, __ubuf__ float *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride); 
 
void vmla(__ubuf__ float *dst, __ubuf__ half *src0, __ubuf__ half *src1, uint8_t repeat, uint8_t dstBlockStride, uint8_t src0BlockStride, uint8_t src1BlockStride, uint8_t dstRepeatStride, uint8_t src0RepeatStride, uint8_t src1RepeatStride);
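
This section does not define the stride parameters in the prototypes. The sketch below shows one plausible addressing model, under the assumption that `BlockStride` is the distance between adjacent blocks within one iteration and `RepeatStride` is the distance between the same block in consecutive iterations, both counted in 32-byte blocks. The helper `block_offset` is hypothetical and should be checked against the full instruction documentation before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: strides are expressed in units of 32-byte blocks.
// Returns the element offset (in units of T) of block `block` (0..7)
// in iteration `rep`, relative to the operand's base pointer.
template <typename T>
std::size_t block_offset(std::size_t rep, std::size_t block,
                         std::uint8_t blockStride, std::uint8_t repeatStride) {
    assert(block < 8);  // one iteration covers 8 blocks
    const std::size_t elemsPerBlock = 32 / sizeof(T);
    return (rep * repeatStride + block * blockStride) * elemsPerBlock;
}
```

Under this assumption, a fully contiguous float operand would use a block stride of 1 and a repeat stride of 8, so block 1 of iteration 0 starts at element 8 and iteration 1 starts at element 64.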

Pipeline Type

PIPE_V