vmla

Function

Abstraction of the vmla (vector multiply-accumulate) instruction.

Computes z = x × y + z element-wise: x and y are multiplied element by element, and each product is added to the corresponding element of z. Setting if_mix to True selects the mixed-precision mode, in which the output data type is FP32.

The following types are supported:

type = f16, f16 = f16 × f16 + f16

type = f32, f32 = f32 × f32 + f32

type = fmix, f32 = f16 × f16 + f32. The source vectors Xn and Xm (x and y) each hold 64 f16 elements, which occupy only the lower four blocks; the upper four blocks are ignored. Xd (z) holds 64 f32 elements across all eight blocks and serves as both the destination vector and the third source vector.
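
The three modes can be illustrated with a plain-Python reference model. This is a minimal sketch and not part of msKPP: vmla_reference and _round_to are hypothetical names, FP16/FP32 precision is emulated with the standard struct module, and rounding the sum once to the output precision is an assumption, since the instruction's internal rounding behavior is not described here.

```python
import struct

def _round_to(value, fmt):
    # Round a Python float to FP16 ("e") or FP32 ("f") by packing and unpacking it.
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def vmla_reference(x, y, z, mode="f16"):
    """Element-wise z = x * y + z for mode "f16", "f32", or "fmix" (hypothetical helper)."""
    if mode not in ("f16", "f32", "fmix"):
        raise ValueError(f"unsupported mode: {mode}")
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y, and z must have the same length")
    in_fmt = "f" if mode == "f32" else "e"    # precision of x and y
    out_fmt = "e" if mode == "f16" else "f"   # precision of z and of the result
    result = []
    for a, b, c in zip(x, y, z):
        a, b = _round_to(a, in_fmt), _round_to(b, in_fmt)
        c = _round_to(c, out_fmt)
        # Assumption: the sum is rounded once, to the output precision.
        result.append(_round_to(a * b + c, out_fmt))
    return result

x, y = [1.0, 2.0, 3.0], [1.0, 5.0, 6.0]
z = [2048.0, 0.5, 0.5]
print(vmla_reference(x, y, z, mode="f16"))   # [2048.0, 10.5, 18.5]
print(vmla_reference(x, y, z, mode="fmix"))  # [2049.0, 10.5, 18.5]
```

The first element shows why the fmix mode can matter: 2049 is not representable in FP16 and rounds to 2048, whereas the FP32 accumulator holds it exactly.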

Prototype

class vmla(x, y, z, if_mix=False)

Parameters

Parameter	Input/Output	Data Type	Description
x	Input	Tensor variable	Input x-vector tensor. FP16 and FP32 are supported.
y	Input	Tensor variable	Input y-vector tensor. FP16 and FP32 are supported.
z	Input/Output	Tensor variable	Vector tensor that supplies the addend and receives the result. FP16 and FP32 are supported.
if_mix	Input	bool	Selects the mixed-precision (fmix) mode. The default value is False. If this parameter is set to True, the output data type is FP32.
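
The data type combinations listed under Function can be checked before a call is built. The helper below is a hypothetical sketch and not an msKPP API; the dtype strings and the choice to reject every combination not listed on this page (for example, if_mix=True with FP32 inputs) are assumptions.

```python
def vmla_output_dtype(x_dtype, y_dtype, z_dtype, if_mix=False):
    """Return the result dtype for a vmla call, or raise ValueError (hypothetical helper)."""
    combo = (x_dtype, y_dtype, z_dtype)
    if if_mix:
        # fmix: f32 = f16 x f16 + f32
        if combo != ("FP16", "FP16", "FP32"):
            raise ValueError(f"if_mix=True expects FP16, FP16, FP32; got {combo}")
        return "FP32"
    # f16 and f32: all three tensors share one data type
    if combo in (("FP16", "FP16", "FP16"), ("FP32", "FP32", "FP32")):
        return z_dtype
    raise ValueError(f"unsupported data type combination: {combo}")

print(vmla_output_dtype("FP16", "FP16", "FP16"))               # FP16
print(vmla_output_dtype("FP16", "FP16", "FP32", if_mix=True))  # FP32
```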

Constraints

The input and output tensors of vector instructions must reside in the Unified Buffer (UB).

Example

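from mskpp import vmla, Tensor
# Vector instructions operate on tensors in UB, so allocate x, y, and z there.
ub_x, ub_y, ub_z = Tensor("UB"), Tensor("UB"), Tensor("UB")
# The source data starts in global memory (GM).
gm_x, gm_y = Tensor("GM"), Tensor("GM")
# Move the two multiplicands from GM into UB.
ub_x.load(gm_x)
ub_y.load(gm_y)
# Build the vmla instruction and invoke it: z = x * y + z.
out = vmla(ub_x, ub_y, ub_z)()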