vec_axpy

Description

Multiplies each element of the source vector src by scalar and accumulates the product into the corresponding element of dst: dst = src * scalar + dst.

Prototype

vec_axpy(mask, dst, src, scalar, repeat_times, dst_rep_stride, src_rep_stride)

Parameters

For details, see Parameters. src and scalar must have the same data type.

The supported precision combinations are as follows.

Table 1 Precision combinations supported by vec_axpy

Type | Float16 Precision Combination | Float32 Precision Combination | Fixed Precision Combination
Atlas 200/300/500 Inference Product support | Y | Y | Y
Atlas Training Series Product support | Y | Y | Y

The meanings of the precision combinations in the preceding table are as follows:

  • Float16 precision combination: src.dtype=float16; scalar.dtype=float16; dst.dtype=float16; Parallelism PAR/repeat=128
  • Float32 precision combination: src.dtype=float32; scalar.dtype=float32; dst.dtype=float32; Parallelism PAR/repeat=64
  • Fixed precision combination (the mixed-precision mode, fmix, in which float16 inputs accumulate into a float32 destination): src.dtype=float16; scalar.dtype=float16; dst.dtype=float32; Parallelism PAR/repeat=64
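The parallelism figures above are consistent with each repeat covering 256 bytes (8 blocks of 32 bytes) of the widest operand. The sketch below reproduces them under that assumption. The helper `elements_per_repeat` and the constants are illustrative names, not part of the TIK API.

```python
# Assumption: one repeat covers 8 blocks of 32 bytes (256 bytes) of the
# widest operand. All names here are illustrative, not TIK API.
BLOCK_BYTES = 32
BLOCKS_PER_REPEAT = 8
DTYPE_BYTES = {"float16": 2, "float32": 4}


def elements_per_repeat(src_dtype, dst_dtype):
    """Return how many elements one repeat handles for a dtype pair."""
    widest = max(DTYPE_BYTES[src_dtype], DTYPE_BYTES[dst_dtype])
    return BLOCK_BYTES * BLOCKS_PER_REPEAT // widest


for name, src_dtype, dst_dtype in [
    ("float16 combination", "float16", "float16"),
    ("float32 combination", "float32", "float32"),
    ("fixed combination", "float16", "float32"),
]:
    print(name, elements_per_repeat(src_dtype, dst_dtype))
```

In the fixed combination the float32 destination is the limiting operand, which is why only 64 elements are processed per repeat even though the source is float16.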

Returns

None

Restrictions

  • For details, see Restrictions.
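  • Mixed precision (fmix) is supported: src and scalar are float16 and dst is float32.
  • In fmix mode, each iteration reads only the first four blocks of src. With 32-byte blocks, that is 64 float16 elements, matching the parallelism of 64 for this combination.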
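The fmix restriction can be illustrated with a small host-side model of how repeats and strides select elements. This is a sketch under stated assumptions, not the hardware behavior: it assumes 32-byte blocks (16 float16 or 8 float32 elements per block), strides counted in blocks, a full mask of 64, and it approximates the result rounding by computing in double precision and rounding once to float32. The function `axpy_fmix_reference` and its helpers are hypothetical names.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def axpy_fmix_reference(dst, src, scalar, repeat_times,
                        dst_rep_stride, src_rep_stride):
    """Hypothetical model of fmix vec_axpy with all 64 lanes enabled."""
    src_block, dst_block, lanes = 16, 8, 64  # elements per block, per repeat
    out = list(dst)
    for r in range(repeat_times):
        s0 = r * src_rep_stride * src_block  # first src element of repeat r
        d0 = r * dst_rep_stride * dst_block  # first dst element of repeat r
        for i in range(lanes):
            product = to_float16(scalar) * to_float16(src[s0 + i])
            out[d0 + i] = to_float32(out[d0 + i] + product)
    return out


src = [float(i) for i in range(256)]
dst = [10.0] * 128
contiguous = axpy_fmix_reference(dst, src, 2.0, 2, 8, 4)
skipping = axpy_fmix_reference(dst, src, 2.0, 2, 8, 8)
print(contiguous[64], skipping[64])
```

In this model, a src_rep_stride of 4 makes the second repeat continue at src element 64, so the source is consumed contiguously. A src_rep_stride of 8 makes it start at element 128, leaving elements 64 to 127 unread.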

Example

from tbe import tik
tik_instance = tik.Tik()
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float32", (64,), name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor("float32", (64,), name="dst_ub", scope=tik.scope_ubuf)
# Move the user input from the Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
# Assign 10 to the destination Unified Buffer as its initial value.
tik_instance.vec_dup(64, dst_ub, 10, 1, 8)
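# Compute dst_ub = src_ub * 2.0 + dst_ub in fmix mode (float16 src, float32 dst).
# Only the first 64 elements (four blocks) of src_ub are read in this single repeat.
tik_instance.vec_axpy(64, dst_ub, src_ub, 2.0, 1, 8, 4)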
# Move the compute result from the Unified Buffer to the Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)

tik_instance.BuildCCE(kernel_name="vec_axpy", inputs=[src_gm], outputs=[dst_gm])

Result example:

Input (src_gm):
[ 4.8      5.363   -8.375   -0.01346 -9.195   -6.03    -4.336   -8.73
  4.715   -0.805    2.168    6.094   -5.414   -8.16     8.86    -2.135
  0.2925  -4.21     4.18     8.94    -7.797   -4.54    -2.082   -1.0625
  6.38     7.918   -7.52    -2.055    5.86    -4.562    0.9116   8.09
  2.11    -1.92     9.01    -5.24     3.81     2.264    9.04    -9.305
  2.771   -4.445    1.704   -5.65     1.71    -8.      -3.076   -8.86
  5.258   -3.928    1.929    5.273   -9.734    1.14    -8.71    -3.385
 -9.85    -3.643    4.188   -4.406   -6.008    1.957   -4.496    1.547
  5.207   -7.957    2.145    8.36    -9.375   -0.1924   9.54     8.16
  0.8003   8.34    -2.846    3.871   -8.8     -9.95    -8.414    5.504
  9.414    1.483   -6.547    6.84     0.5835  -0.1847  -2.719   -4.773
 -4.56     0.816   -1.507   -7.633   -3.885   -0.1384  -8.945    9.78
 -5.13    -3.174   -1.487   -6.984    4.76    -5.758    9.34    -4.35
 -4.05    -7.36    -1.642   -7.832   -4.977   -5.395   -8.23     6.438
  1.207   -8.484    2.71    -9.664    3.709   -7.832   -4.965   -7.49
 -4.887    7.09    -7.887   -4.785   -3.54     9.44     5.32     7.914  ]

Output (dst_gm):
[19.601562  20.726562  -6.75       9.9730835 -8.390625  -2.0625
  1.328125  -7.453125  19.429688   8.389648  14.3359375 22.1875
 -0.828125  -6.3125    27.71875    5.7304688 10.584961   1.578125
 18.359375  27.875     -5.59375    0.921875   5.8359375  7.875
 22.757812  25.835938  -5.0390625  5.890625  21.71875    0.875
 11.823242  26.1875    14.21875    6.1601562 28.015625  -0.4765625
 17.621094  14.527344  28.078125  -8.609375  15.542969   1.109375
 13.408203  -1.296875  13.419922  -6.         3.8476562 -7.71875
 20.515625   2.1445312 13.857422  20.546875  -9.46875   12.279297
 -7.421875   3.2304688 -9.703125   2.7148438 18.375      1.1875
 -2.015625  13.9140625  1.0078125 13.09375  ]
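The listed output can be spot-checked on the host. The sketch below recomputes the first four results as 10 + 2.0 * src, after rounding each input to float16 as the source tensor would store it. It uses only the Python standard library, and the helper `to_float16` is an illustrative name.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# First four values copied from the Input and Output listings above.
src_head = [4.8, 5.363, -8.375, -0.01346]
dst_head = [19.601562, 20.726562, -6.75, 9.9730835]

for s, d in zip(src_head, dst_head):
    expected = 10.0 + 2.0 * to_float16(s)
    print(f"{s:>9} -> {expected:.7f} (listed {d})")
    assert abs(expected - d) < 1e-5
```

Since only 64 elements are processed, the second half of the 128-element input does not contribute to the output.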