vmla
Function
vmla instruction abstraction.
Computes z = x × y + z element-wise: x and y are multiplied element by element, and each product is added to the corresponding element of z. Setting if_mix=True selects the mixed-precision (fmix) form, whose output data type is FP32.
The following types are supported:
type = f16, f16 = f16 × f16 + f16
type = f32, f32 = f32 × f32 + f32
type = fmix, f32 = f16 × f16 + f32, where the Xn and Xm source vectors each hold 64 f16 elements. Each source vector uses only its lower four blocks; the upper four blocks are ignored. Xd holds 64 f32 elements in eight blocks and serves as both the destination vector and the third source vector.
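The three forms above can be illustrated with a small pure-Python reference. This is only a sketch of the arithmetic, not mskpp code: `round_to` and `vmla_reference` are hypothetical helper names, the result is assumed to be rounded once to the output precision (the hardware rounding behavior is not stated here), and the 32-byte block size is inferred from the statement that 64 f16 elements fill four blocks.

```python
import struct

def round_to(fmt, value):
    # Round a Python float to IEEE half ("e") or single ("f") precision.
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def vmla_reference(x, y, z, if_mix=False, fp="e"):
    """Element-wise x * y + z on plain lists (hypothetical helper, not mskpp).

    fp picks the precision when if_mix is False: "e" for f16, "f" for f32.
    With if_mix=True, x and y are treated as f16 and z as f32.
    """
    in_fmt = "e" if if_mix else fp
    out_fmt = "f" if if_mix else fp
    return [
        # Assumption: one rounding step, applied to the final sum.
        round_to(out_fmt,
                 round_to(in_fmt, a) * round_to(in_fmt, b) + round_to(out_fmt, c))
        for a, b, c in zip(x, y, z)
    ]

x = [1.0, 2.0, 0.5]
y = [3.0, 0.25, 0.001]
z = [1.0, 1.0, 1.0]

f16_result = vmla_reference(x, y, z)                 # f16 = f16 * f16 + f16
fmix_result = vmla_reference(x, y, z, if_mix=True)   # f32 = f16 * f16 + f32

# Block arithmetic for the fmix form, assuming 32-byte blocks.
BLOCK_BYTES = 32
f16_blocks = 64 * struct.calcsize("e") // BLOCK_BYTES  # 4 blocks per source vector
f32_blocks = 64 * struct.calcsize("f") // BLOCK_BYTES  # 8 blocks for the destination
```

In this sketch the first two elements are identical in both modes, while the third differs: the f16 form rounds the sum to the coarser f16 grid near 1.0, whereas the fmix form keeps the small product in the f32 accumulator.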
Prototype
```python
class vmla(x, y, z, if_mix=False)
```
Parameters
| Parameter | Input/Output | Data Type | Description |
|---|---|---|---|
| x | Input | Tensor variable | Input x-vector tensor. FP16 and FP32 are supported. |
| y | Input | Tensor variable | Input y-vector tensor. FP16 and FP32 are supported. |
| z | Output | Tensor variable | Output vector tensor, which is also read as the addend in z = x × y + z. FP16 and FP32 are supported. |
| if_mix | Input | bool | Whether to use the mixed-precision form (f32 = f16 × f16 + f32). Defaults to False. |
Constraints
The input and output tensors of vector instructions must reside in the UB space.
Example
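```python
from mskpp import vmla, Tensor

# Operands of vector instructions must reside in UB.
ub_x, ub_y, ub_z = Tensor("UB"), Tensor("UB"), Tensor("UB")
gm_x, gm_y = Tensor("GM"), Tensor("GM")
# Load the inputs from GM into UB before computing.
ub_x.load(gm_x)
ub_y.load(gm_y)
# Element-wise ub_z = ub_x * ub_y + ub_z.
out = vmla(ub_x, ub_y, ub_z)()
```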