vmla

Function

vmla instruction abstraction.

Computes z = x × y + z element-wise: x and y are multiplied element by element, and each product is added to the corresponding element of z. Setting if_mix to True selects the mixed-precision mode, in which the output data type is FP32.

The following types are supported:

type = f16, f16 = f16 × f16 + f16

type = f32, f32 = f32 × f32 + f32

type = fmix, f32 = f16 × f16 + f32. Here the source vectors Xn and Xm (the x and y operands) supply 64 f16 elements each: only their lower four blocks are used, and the upper four blocks are ignored. Xd (the z operand) holds 64 f32 elements in eight blocks and serves as both the destination vector and the third source vector.
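The three type combinations above can be illustrated with a small reference model in plain Python. This is only a sketch and not part of mskpp: the names round_f16, round_f32, and vmla_reference are hypothetical, and the model assumes the product and the sum are rounded once to the output type, which the hardware may or may not do.

```python
import struct


def round_f16(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def round_f32(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def vmla_reference(x, y, z, if_mix=False, dtype="f16"):
    """Hypothetical model of z = x * y + z; returns the new z as a list.

    dtype selects the f16 or f32 mode; if_mix=True selects the fmix mode
    (f16 multiplicands, f32 accumulator and result) and overrides dtype.
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same number of elements")
    if if_mix:
        round_in, round_out = round_f16, round_f32
    elif dtype == "f16":
        round_in = round_out = round_f16
    elif dtype == "f32":
        round_in = round_out = round_f32
    else:
        raise ValueError("dtype must be 'f16' or 'f32'")
    # Inputs are rounded to their storage type, the arithmetic runs in
    # double precision, and the result is rounded once to the output type
    # (an assumption of this sketch).
    return [
        round_out(round_in(a) * round_in(b) + round_out(c))
        for a, b, c in zip(x, y, z)
    ]


if __name__ == "__main__":
    # Pure f16 mode: 1 * 1 + 2048 is not representable in f16.
    print(vmla_reference([1.0], [1.0], [2048.0], dtype="f16"))
    # Mixed mode: the f32 accumulator keeps the extra unit.
    print(vmla_reference([1.0], [1.0], [2048.0], if_mix=True))
```

In this model the two calls at the bottom differ only in the accumulator precision, which is the practical reason to choose the fmix mode when many small products are added to a large running sum.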

Prototype

class vmla(x, y, z, if_mix=False)

Parameters

| Parameter | Input/Output | Data Type | Description |
|-----------|--------------|-----------|-------------|
| x | Input | Tensor variable | Input x-vector tensor. FP16 and FP32 are supported. |
| y | Input | Tensor variable | Input y-vector tensor. FP16 and FP32 are supported. |
| z | Output | Tensor variable | Output vector tensor. Its existing values are also the addend in z = x × y + z. FP16 and FP32 are supported. |
| if_mix | Input | bool | The default value is False. If this parameter is set to True, the output data type is FP32. |

Constraints

The input and output tensors of vector instructions must reside in the UB (Unified Buffer) space.

Example

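from mskpp import vmla, Tensor

# Vector instruction operands live in UB; the source data starts in GM.
ub_x, ub_y, ub_z = Tensor("UB"), Tensor("UB"), Tensor("UB")
gm_x, gm_y = Tensor("GM"), Tensor("GM")
# Move the two multiplicands from GM into UB.
ub_x.load(gm_x)
ub_y.load(gm_y)
# Instantiate the instruction and call it: ub_z = ub_x * ub_y + ub_z.
out = vmla(ub_x, ub_y, ub_z)()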