vec_dup

Description

Copies a Scalar variable or an immediate multiple times and fills the copies into the vector (PAR indicates the degree of parallelism):

Prototype

vec_dup(mask, dst, scalar, repeat_times, dst_rep_stride)

Pipe: Vector

Parameters

Table 1 Parameter description

  • mask (Input): For details, see the description of the mask parameter in Table 1.
  • dst (Output): Tensor holding the start element of the destination operand. Must be one of the following data types: uint16, int16, float16, uint32, int32, float32. The tensor must reside in the Unified Buffer.
  • scalar (Input): Scalar or immediate holding the source value to be duplicated. Must have the same dtype as dst.
  • repeat_times (Input): Number of iterations. The destination address advances on every iteration. Must be in the range [0, 255]; if repeat_times is an immediate, 0 is not supported. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
  • dst_rep_stride (Input): Stride between the corresponding blocks of adjacent iterations of the destination operand, in units of 32 bytes. Must be in the range [0, 255]. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
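Because dst_rep_stride is counted in 32-byte blocks, the element offset of each iteration depends on the dtype size. A minimal sketch (plain Python, not the TIK API) of this addressing rule:

```python
# Start element index written by each repeat iteration, assuming
# dst_rep_stride is measured in 32-byte blocks (as described above).
BLOCK_BYTES = 32

def iteration_offsets(dtype_bytes, repeat_times, dst_rep_stride):
    """Element offset at which each repeat iteration begins."""
    elems_per_block = BLOCK_BYTES // dtype_bytes
    return [r * dst_rep_stride * elems_per_block for r in range(repeat_times)]

# float16 (2 bytes per element), 3 repeats, stride of 5 blocks:
# iterations start at elements 0, 80, and 160.
print(iteration_offsets(2, 3, 5))
```

The same rule applies to the 4-byte dtypes, where one block holds only 8 elements.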

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The caller must ensure that the Scalar argument is within the valid range.

Returns

None

Example

from tbe import tik
tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}

dtype = "float16"
shape = (2, 128)
elements = 2 * 128
# Number of elements operated on per iteration; 32 in this example.
mask = 32
# repeat_times indicates the number of iterations.
repeat_times = 3
# dst_rep_stride indicates the stride between destination operands of adjacent iterations, in units of 32-byte blocks. For float16, one block holds 16 elements, so the start of the second iteration is 5 x 16 = 80 elements after the start of the first.
dst_rep_stride = 5
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
# Source operand to be copied, which is a Scalar or an immediate. Has the same dtype as dst.
src_scalar = tik_instance.Scalar(init_value=0, dtype="float16")
tik_instance.vec_dup(mask, dst_ub, src_scalar, repeat_times, dst_rep_stride)
# Move the result from the Unified Buffer to Global Memory. For details about data_move, see the corresponding section.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="vector_dup", inputs=[], outputs=[dst_gm])

The output data (dst_gm) is as follows. Elements not written by vec_dup keep whatever was previously in the Unified Buffer; in this run that stale data happens to be 203.2:

[[  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.  203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2   0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]
 [203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2   0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]]
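The zero runs in the dump line up with the repeat parameters. A NumPy model (not the TIK API) reproduces which elements the three iterations write, assuming the 203.2 values are stale Unified Buffer contents:

```python
import numpy as np

# Model of the example above: 2 x 128 float16 elements, mask = 32,
# repeat_times = 3, dst_rep_stride = 5 blocks (16 float16 elements each).
mask, repeat_times, dst_rep_stride = 32, 3, 5
elems_per_block = 32 // 2  # 32-byte block / 2 bytes per float16

# Stand-in for uninitialized Unified Buffer contents (203.2 in the dump).
ub = np.full(2 * 128, 203.2, dtype=np.float16)
for r in range(repeat_times):
    start = r * dst_rep_stride * elems_per_block  # 0, 80, 160
    ub[start:start + mask] = 0.0                  # value duplicated by vec_dup

result = ub.reshape(2, 128)  # matches the zero/non-zero pattern shown above
```

Each iteration writes mask = 32 elements starting 80 elements after the previous one, which is exactly the pattern of zeros in the dump.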