vec_dup

Description

Copies a Scalar variable or an immediate multiple times and fills the copies into the vector (PAR indicates the degree of parallelism):

Prototype

vec_dup(mask, dst, scalar, repeat_times, dst_rep_stride)

Pipe: Vector

Parameters

Table 1 Parameter description

  • mask (Input): For details, see the description of the mask parameter in Table 1.
  • dst (Output): Tensor holding the start element of the destination operand. Must be one of the following data types: uint16, int16, float16, uint32, int32, float32. The tensor must reside in the Unified Buffer.
  • scalar (Input): Scalar or immediate holding the source value to be duplicated. Must have the same dtype as dst.
  • repeat_times (Input): Number of iterations. The destination address advances on every iteration. Must be in the range [0, 255]; if repeat_times is an immediate, 0 is not supported. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
  • dst_rep_stride (Input): Stride between the corresponding blocks of adjacent iterations of the destination operand, in units of 32 bytes. Must be in the range [0, 255]. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
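Because dst_rep_stride is counted in 32-byte blocks, the element offset of each iteration depends on the dtype size. A minimal sketch (plain Python, not the TIK API) of this addressing rule:

```python
# Start element index written by each repeat iteration, assuming
# dst_rep_stride is measured in 32-byte blocks (as described above).
BLOCK_BYTES = 32

def iteration_offsets(dtype_bytes, repeat_times, dst_rep_stride):
    """Element offset at which each repeat iteration begins."""
    elems_per_block = BLOCK_BYTES // dtype_bytes
    return [r * dst_rep_stride * elems_per_block for r in range(repeat_times)]

# float16 (2 bytes per element), 3 repeats, stride of 5 blocks:
# iterations start at elements 0, 80, and 160.
print(iteration_offsets(2, 3, 5))
```

The same rule applies to the 4-byte dtypes, where one block holds only 8 elements.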

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The caller must ensure that the Scalar argument is within the valid range.

Returns

None

Example

from tbe import tik
tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}

dtype = "float16"
shape = (2, 128)
elements = 2 * 128
# Number of elements operated on per iteration; 32 in this example.
mask = 32
# repeat_times indicates the number of iterations.
repeat_times = 3
# dst_rep_stride indicates the stride between destination operands of adjacent iterations, in units of 32-byte blocks. For float16, one block holds 16 elements, so the start of the second iteration is 5 x 16 = 80 elements after the start of the first.
dst_rep_stride = 5
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
# Source operand to be copied, which is a Scalar or an immediate. Has the same dtype as dst.
src_scalar = tik_instance.Scalar(init_value=0, dtype="float16")
tik_instance.vec_dup(mask, dst_ub, src_scalar, repeat_times, dst_rep_stride)
# Move the result from the Unified Buffer to Global Memory. For details about data_move, see the corresponding section.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="vector_dup", inputs=[], outputs=[dst_gm])

The output data (dst_gm) is as follows. Elements not written by vec_dup keep whatever was previously in the Unified Buffer; in this run that stale data happens to be 203.2:

[[  0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.  203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2   0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]
 [203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2   0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
  203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]]
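The zero runs in the dump line up with the repeat parameters. A NumPy model (not the TIK API) reproduces which elements the three iterations write, assuming the 203.2 values are stale Unified Buffer contents:

```python
import numpy as np

# Model of the example above: 2 x 128 float16 elements, mask = 32,
# repeat_times = 3, dst_rep_stride = 5 blocks (16 float16 elements each).
mask, repeat_times, dst_rep_stride = 32, 3, 5
elems_per_block = 32 // 2  # 32-byte block / 2 bytes per float16

# Stand-in for uninitialized Unified Buffer contents (203.2 in the dump).
ub = np.full(2 * 128, 203.2, dtype=np.float16)
for r in range(repeat_times):
    start = r * dst_rep_stride * elems_per_block  # 0, 80, 160
    ub[start:start + mask] = 0.0                  # value duplicated by vec_dup

result = ub.reshape(2, 128)  # matches the zero/non-zero pattern shown above
```

Each iteration writes mask = 32 elements starting 80 elements after the previous one, which is exactly the pattern of zeros in the dump.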