vec_axpy
Description
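Performs an element-wise multiply-accumulate between a vector and a scalar: each element of src is multiplied by scalar and added to the corresponding element of dst (dst = src * scalar + dst).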

Prototype
vec_axpy(mask, dst, src, scalar, repeat_times, dst_rep_stride, src_rep_stride)
Parameters
For details, see Parameters. src and scalar must have the same data type.
The supported precision combinations are as follows.
| Type | Float16 Precision Combination | Float32 Precision Combination | Fixed Precision Combination |
|---|---|---|---|
| | Y | Y | Y |
| | Y | Y | Y |
The meanings of the precision combinations in the preceding table are as follows:
- Float16 precision combination: src.dtype=float16; scalar.dtype=float16; dst.dtype=float16; Parallelism PAR/repeat=128
- Float32 precision combination: src.dtype=float32; scalar.dtype=float32; dst.dtype=float32; Parallelism PAR/repeat=64
- Fixed precision combination: src.dtype=float16; scalar.dtype=float16; dst.dtype=float32; Parallelism PAR/repeat=64
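The parallelism figures above can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes a 32-byte block and at most 8 blocks per repeat (values not stated on this page), and the names `elements_per_repeat`, `COMBINATIONS`, and `PARALLELISM` are hypothetical, not part of TIK.

```python
# Bytes per element for the dtypes named in the combinations above.
DTYPE_BYTES = {"float16": 2, "float32": 4}

BLOCK_BYTES = 32        # assumption: one block is 32 bytes
BLOCKS_PER_REPEAT = 8   # assumption: one repeat covers at most 8 blocks


def elements_per_repeat(src_dtype, dst_dtype):
    """Elements handled per repeat, limited by the wider of the two dtypes."""
    widest = max(DTYPE_BYTES[src_dtype], DTYPE_BYTES[dst_dtype])
    return BLOCK_BYTES * BLOCKS_PER_REPEAT // widest


# (src/scalar dtype, dst dtype) for each combination listed above.
COMBINATIONS = {
    "float16": ("float16", "float16"),
    "float32": ("float32", "float32"),
    "fixed": ("float16", "float32"),
}

PARALLELISM = {
    name: elements_per_repeat(src, dst)
    for name, (src, dst) in COMBINATIONS.items()
}
print(PARALLELISM)
```

Under these assumptions the wider operand sets the limit, which is why the float16-to-float32 combination processes 64 elements per repeat rather than 128.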
Returns
None
Restrictions
- For details, see Restrictions.
- Mixed precision (fmix) is supported: src and scalar are float16 and dst is float32.
- In fmix mode, only the first four blocks of src (64 float16 elements) are computed in each iteration.
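The fmix restriction can be pictured with a small host-side model in plain Python. This is a sketch, not the TIK implementation: `axpy_fmix_reference` is a hypothetical helper, it assumes 32-byte blocks, repeat strides measured in blocks, and an integer mask that selects the first `mask` consecutive elements, and it ignores float16/float32 rounding.

```python
SRC_PER_BLOCK = 16  # float16 elements in an assumed 32-byte block
DST_PER_BLOCK = 8   # float32 elements in an assumed 32-byte block


def axpy_fmix_reference(dst, src, scalar, repeat_times,
                        dst_rep_stride, src_rep_stride, mask=64):
    """Model of dst = src * scalar + dst for the float16 -> float32 case.

    Each repeat reads `mask` src elements (at most 64, that is four
    float16 blocks) and updates `mask` dst elements.
    """
    out = list(dst)
    for rep in range(repeat_times):
        src_start = rep * src_rep_stride * SRC_PER_BLOCK
        dst_start = rep * dst_rep_stride * DST_PER_BLOCK
        for lane in range(mask):
            out[dst_start + lane] += src[src_start + lane] * scalar
    return out


# One repeat over a 128-element src: only src[0:64] is read.
src = [float(i) for i in range(128)]
dst = [10.0] * 64
result = axpy_fmix_reference(dst, src, 2.0, 1, 8, 4)
print(result[:4])
```

With a single repeat the last 64 src elements are never touched. To cover them, a second repeat with src_rep_stride=4 and dst_rep_stride=8 would, under the same assumptions, continue at src[64] and dst[64].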
Example
from tbe import tik
tik_instance = tik.Tik()
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float32", (64,), name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor("float32", (64,), name="dst_ub", scope=tik.scope_ubuf)
# Move the user input from the Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
# Assign 10 to the destination Unified Buffer as its initial value.
tik_instance.vec_dup(64, dst_ub, 10, 1, 8)
# Compute dst_ub = src_ub * 2.0 + dst_ub on the first 64 elements (float16 src, float32 dst).
tik_instance.vec_axpy(64, dst_ub, src_ub, 2.0, 1, 8, 4)
# Move the compute result from the Unified Buffer to the Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)
tik_instance.BuildCCE(kernel_name="vec_axpy", inputs=[src_gm], outputs=[dst_gm])
Result example:
Input (src_gm):
[ 4.8 5.363 -8.375 -0.01346 -9.195 -6.03 -4.336 -8.73 4.715 -0.805 2.168 6.094 -5.414 -8.16 8.86 -2.135 0.2925 -4.21 4.18 8.94 -7.797 -4.54 -2.082 -1.0625 6.38 7.918 -7.52 -2.055 5.86 -4.562 0.9116 8.09 2.11 -1.92 9.01 -5.24 3.81 2.264 9.04 -9.305 2.771 -4.445 1.704 -5.65 1.71 -8. -3.076 -8.86 5.258 -3.928 1.929 5.273 -9.734 1.14 -8.71 -3.385 -9.85 -3.643 4.188 -4.406 -6.008 1.957 -4.496 1.547 5.207 -7.957 2.145 8.36 -9.375 -0.1924 9.54 8.16 0.8003 8.34 -2.846 3.871 -8.8 -9.95 -8.414 5.504 9.414 1.483 -6.547 6.84 0.5835 -0.1847 -2.719 -4.773 -4.56 0.816 -1.507 -7.633 -3.885 -0.1384 -8.945 9.78 -5.13 -3.174 -1.487 -6.984 4.76 -5.758 9.34 -4.35 -4.05 -7.36 -1.642 -7.832 -4.977 -5.395 -8.23 6.438 1.207 -8.484 2.71 -9.664 3.709 -7.832 -4.965 -7.49 -4.887 7.09 -7.887 -4.785 -3.54 9.44 5.32 7.914 ]
Output (dst_gm):
[19.601562 20.726562 -6.75 9.9730835 -8.390625 -2.0625 1.328125 -7.453125 19.429688 8.389648 14.3359375 22.1875 -0.828125 -6.3125 27.71875 5.7304688 10.584961 1.578125 18.359375 27.875 -5.59375 0.921875 5.8359375 7.875 22.757812 25.835938 -5.0390625 5.890625 21.71875 0.875 11.823242 26.1875 14.21875 6.1601562 28.015625 -0.4765625 17.621094 14.527344 28.078125 -8.609375 15.542969 1.109375 13.408203 -1.296875 13.419922 -6. 3.8476562 -7.71875 20.515625 2.1445312 13.857422 20.546875 -9.46875 12.279297 -7.421875 3.2304688 -9.703125 2.7148438 18.375 1.1875 -2.015625 13.9140625 1.0078125 13.09375 ]
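The first few output values can be spot-checked on the host with the Python standard library alone. This is a rough cross-check, assuming the result equals the float16 input times the float16 scalar, added to the initial value 10 and stored as float32. The helpers `to_f16` and `to_f32` are hypothetical names introduced here, and the hardware's internal rounding may differ from this model.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# First four input and output values copied from the result example above.
src_head = [4.8, 5.363, -8.375, -0.01346]
expected_head = [19.601562, 20.726562, -6.75, 9.9730835]

scalar = to_f16(2.0)
initial = 10.0  # value written to dst_ub by vec_dup in the example

computed = [to_f32(to_f16(x) * scalar + initial) for x in src_head]
for got, want in zip(computed, expected_head):
    print(got, want)
```

The small deviation from the naive result (for example 19.601562 rather than 19.6) comes from the input already being rounded to float16 before the multiplication.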