vec_axpy

Description

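Performs element-wise multiply-accumulate between a vector and a scalar: each element of src is multiplied by scalar and the product is added to the corresponding element of dst, that is, dst[i] = scalar * src[i] + dst[i].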
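The element-wise semantics can be sketched in plain Python. This is only a host-side reference model, not the TIK implementation; the function name axpy_reference is hypothetical, and the sketch ignores mask, repeat, and stride handling.

```python
def axpy_reference(dst, src, scalar):
    """Return scalar * src[i] + dst[i] for every index i as a new list."""
    if len(dst) != len(src):
        raise ValueError("dst and src must have the same length")
    return [scalar * s + d for s, d in zip(src, dst)]


# dst starts at 10.0 everywhere, mirroring the vec_dup call in the example.
result = axpy_reference([10.0, 10.0, 10.0], [1.0, -2.0, 0.5], 2.0)
print(result)  # [12.0, 6.0, 11.0]
```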

Prototype

vec_axpy(mask, dst, src, scalar, repeat_times, dst_rep_stride, src_rep_stride)

Parameters

For details, see Parameters. src and scalar must have the same data type.

The supported precision combinations are as follows.

**Table 1** Precision combinations supported by vec_axpy
Type	Float16 Precision Combination	Float32 Precision Combination	Fixed Precision Combination
Atlas 200/300/500 Inference Product support	Y	Y	Y
Atlas Training Series Product support	Y	Y	Y

The meanings of the precision combinations in the preceding table are as follows:

Float16 precision combination: src.dtype=float16; scalar.dtype=float16; dst.dtype=float16; Parallelism PAR/repeat=128
Float32 precision combination: src.dtype=float32; scalar.dtype=float32; dst.dtype=float32; Parallelism PAR/repeat=64
Fixed precision combination: src.dtype=float16; scalar.dtype=float16; dst.dtype=float32; Parallelism PAR/repeat=64
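The parallelism values listed above can be reproduced with a small calculation. The sketch below assumes that one repeat covers 256 bytes of the widest operand (8 blocks of 32 bytes); that figure and the helper name elements_per_repeat are assumptions made for illustration, not part of the vec_axpy interface.

```python
DTYPE_BYTES = {"float16": 2, "float32": 4}
REPEAT_BYTES = 256  # assumption: one repeat spans 8 blocks of 32 bytes


def elements_per_repeat(src_dtype, dst_dtype):
    """Elements handled per repeat, limited by the widest operand."""
    widest = max(DTYPE_BYTES[src_dtype], DTYPE_BYTES[dst_dtype])
    return REPEAT_BYTES // widest


print(elements_per_repeat("float16", "float16"))  # 128
print(elements_per_repeat("float32", "float32"))  # 64
print(elements_per_repeat("float16", "float32"))  # 64
```

Under this assumption, the combination with float16 src and float32 dst is limited to 64 elements because the float32 destination fills the repeat first.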

Returns

None

Restrictions

For details, see Restrictions.
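Note that mixed precision (fmix) is supported. This is the fixed precision combination in Table 1, where src and scalar are float16 and dst is float32.
In fmix mode, each iteration processes 64 elements, so only the first four blocks of src (64 float16 elements) are computed in every iteration.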
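To make the four-block restriction concrete, the following sketch computes which src elements one iteration would read in fmix mode. It assumes that a block is 32 bytes and that src_rep_stride is counted in blocks; the helper fmix_src_range is hypothetical and exists only for this illustration.

```python
BLOCK_BYTES = 32     # assumption: one block is 32 bytes
FP16_BYTES = 2
FMIX_ELEMENTS = 64   # parallelism per repeat for float16 src and float32 dst


def fmix_src_range(repeat_index, src_rep_stride):
    """Half-open range [start, stop) of src elements read in one iteration."""
    elems_per_block = BLOCK_BYTES // FP16_BYTES  # 16 float16 elements
    start = repeat_index * src_rep_stride * elems_per_block
    return start, start + FMIX_ELEMENTS


# 64 float16 elements occupy 128 bytes, that is, four blocks.
blocks_read = FMIX_ELEMENTS * FP16_BYTES // BLOCK_BYTES
print(blocks_read)           # 4
print(fmix_src_range(0, 4))  # (0, 64)
print(fmix_src_range(1, 4))  # (64, 128)
print(fmix_src_range(1, 8))  # (128, 192)
```

With a stride of 4, consecutive iterations read src contiguously. With a stride of 8, the last four blocks of each 8-block group would be skipped under these assumptions.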

Example

from tbe import tik
tik_instance = tik.Tik()
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float32", (64,), name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor("float32", (64,), name="dst_ub", scope=tik.scope_ubuf)
# Move the user input from the Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
# Assign 10 to the destination Unified Buffer as its initial value.
tik_instance.vec_dup(64, dst_ub, 10, 1, 8)
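# Compute dst_ub = 2.0 * src_ub + dst_ub in mixed precision (float16 src, float32 dst).
# With mask 64 and repeat_times 1, only the first 64 elements of src_ub are used.
tik_instance.vec_axpy(64, dst_ub, src_ub, 2.0, 1, 8, 4)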
# Move the compute result from the Unified Buffer to the Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)

tik_instance.BuildCCE(kernel_name="vec_axpy", inputs=[src_gm], outputs=[dst_gm])

Result example:

Input (src_gm):
[ 4.8      5.363   -8.375   -0.01346 -9.195   -6.03    -4.336   -8.73
  4.715   -0.805    2.168    6.094   -5.414   -8.16     8.86    -2.135
  0.2925  -4.21     4.18     8.94    -7.797   -4.54    -2.082   -1.0625
  6.38     7.918   -7.52    -2.055    5.86    -4.562    0.9116   8.09
  2.11    -1.92     9.01    -5.24     3.81     2.264    9.04    -9.305
  2.771   -4.445    1.704   -5.65     1.71    -8.      -3.076   -8.86
  5.258   -3.928    1.929    5.273   -9.734    1.14    -8.71    -3.385
 -9.85    -3.643    4.188   -4.406   -6.008    1.957   -4.496    1.547
  5.207   -7.957    2.145    8.36    -9.375   -0.1924   9.54     8.16
  0.8003   8.34    -2.846    3.871   -8.8     -9.95    -8.414    5.504
  9.414    1.483   -6.547    6.84     0.5835  -0.1847  -2.719   -4.773
 -4.56     0.816   -1.507   -7.633   -3.885   -0.1384  -8.945    9.78
 -5.13    -3.174   -1.487   -6.984    4.76    -5.758    9.34    -4.35
 -4.05    -7.36    -1.642   -7.832   -4.977   -5.395   -8.23     6.438
  1.207   -8.484    2.71    -9.664    3.709   -7.832   -4.965   -7.49
 -4.887    7.09    -7.887   -4.785   -3.54     9.44     5.32     7.914  ]

Output (dst_gm):
[19.601562  20.726562  -6.75       9.9730835 -8.390625  -2.0625
  1.328125  -7.453125  19.429688   8.389648  14.3359375 22.1875
 -0.828125  -6.3125    27.71875    5.7304688 10.584961   1.578125
 18.359375  27.875     -5.59375    0.921875   5.8359375  7.875
 22.757812  25.835938  -5.0390625  5.890625  21.71875    0.875
 11.823242  26.1875    14.21875    6.1601562 28.015625  -0.4765625
 17.621094  14.527344  28.078125  -8.609375  15.542969   1.109375
 13.408203  -1.296875  13.419922  -6.         3.8476562 -7.71875
 20.515625   2.1445312 13.857422  20.546875  -9.46875   12.279297
 -7.421875   3.2304688 -9.703125   2.7148438 18.375      1.1875
 -2.015625  13.9140625  1.0078125 13.09375  ]
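The result can be spot-checked on the host. The sketch below recomputes the first six outputs from the first six inputs using only the Python standard library: struct rounds each input to float16 and each result to float32. The helper names are hypothetical, and this is a consistency check of the listed numbers rather than a description of how the hardware computes them.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# First six values copied from the input and output listings above.
src_head = [4.8, 5.363, -8.375, -0.01346, -9.195, -6.03]
dst_head_expected = [19.601562, 20.726562, -6.75, 9.9730835, -8.390625, -2.0625]

# dst was initialized to 10 and scalar is 2.0 in the example.
computed = [to_float32(2.0 * to_float16(s) + 10.0) for s in src_head]
max_error = max(abs(c - e) for c, e in zip(computed, dst_head_expected))
print(max_error < 1e-5)
```

Only the first 64 of the 128 input values contribute to the output, which is why the output listing has 64 entries.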
