vec_dup
Description
Copies a Scalar variable or an immediate multiple times and fills the vector with it (PAR indicates the degree of parallelism):
Prototype
vec_dup(mask, dst, scalar, repeat_times, dst_rep_stride)
Pipe: Vector
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| mask | Input | For details, see the description of the mask parameter in Table 1. |
| dst | Output | A tensor for the start element of the destination operand. Must be one of the following data types: uint16, int16, float16, uint32, int32, float32. The scope of the tensor is the Unified Buffer. |
| scalar | Input | A Scalar or an immediate for the source operand to be copied. Has the same dtype as dst. |
| repeat_times | Input | Repeat times. The addresses of the source and destination operands change upon every iteration. Must be in the range of [0, 255]. If repeat_times is an immediate, 0 is not supported. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64. |
| dst_rep_stride | Input | Repeat stride of the destination operand between the corresponding blocks of adjacent iterations, in the unit of 32 bytes. Must be in the range of [0, 255]. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64. |
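The interplay of mask, repeat_times, and dst_rep_stride can be modeled in plain Python. The sketch below is illustrative only (it is not the TIK API); it assumes a 32-byte block and computes which flat element indices of dst an invocation would write:

```python
# Illustrative model of vec_dup addressing -- NOT the TIK API.
# One block is 32 bytes; dst_rep_stride is expressed in blocks.

def vec_dup_indices(mask, repeat_times, dst_rep_stride, dtype_bytes):
    """Return the flat element indices of dst that vec_dup would write."""
    elems_per_block = 32 // dtype_bytes            # e.g. 16 for float16
    stride_elems = dst_rep_stride * elems_per_block
    written = []
    for rep in range(repeat_times):
        start = rep * stride_elems                 # start of this iteration
        written.extend(range(start, start + mask)) # mask elements per iteration
    return written

# With mask=32, 3 repeats, a 5-block stride, and float16 (2 bytes),
# the iterations start at elements 0, 80, and 160.
print(vec_dup_indices(32, 3, 5, 2)[:4])  # [0, 1, 2, 3]
```

Note that when mask is smaller than the stride (as here), the iterations leave gaps in dst; those elements keep their previous contents.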
Applicability
Restrictions
- For details about the alignment requirements of the operand address offset, see General Restrictions.
- The caller needs to guarantee that the Scalar argument is within the valid range.
Returns
None
Example
from tbe import tik
tik_instance = tik.Tik()
dtype_size = {
"int8": 1,
"uint8": 1,
"int16": 2,
"uint16": 2,
"float16": 2,
"int32": 4,
"uint32": 4,
"float32": 4,
"int64": 8,
}
dtype = "float16"
shape = (2, 128)
elements = 2 * 128
# Number of elements operated on per iteration, which is 32 in the current example.
mask = 32
# repeat_times indicates the number of iterations.
repeat_times = 3
# dst_rep_stride indicates the stride between destination blocks of adjacent iterations. The start of the second iteration is 5 x 16 = 80 elements away from the start of the first (one 32-byte block holds 16 float16 elements).
dst_rep_stride = 5
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
# Source operand to be copied, which is a Scalar or an immediate. Has the same dtype as dst.
src_scalar = tik_instance.Scalar(init_value=0, dtype="float16")
tik_instance.vec_dup(mask, dst_ub, src_scalar, repeat_times, dst_rep_stride)
# Move input data from Global Memory to Unified Buffer. For details about data_move, see the corresponding section.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="vector_dup", inputs=[], outputs=[dst_gm])
The output data (dst_gm) is as follows:
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]
[203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2
203.2 203.2 203.2 203.2 203.2 203.2 203.2 203.2]]
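The pattern above can be checked arithmetically: with mask=32, repeat_times=3, and a dst_rep_stride of 5 blocks (5 x 16 = 80 float16 elements), the duplicated scalar (0.0) lands in three 32-element stretches, and every element outside them keeps whatever the Unified Buffer held beforehand (203.2 in this run). A quick sketch of that computation, with no TIK dependency:

```python
# Compute the element ranges written by the example's vec_dup call.
mask, repeat_times, dst_rep_stride = 32, 3, 5
elems_per_block = 32 // 2                  # float16 is 2 bytes per element

written_regions = [
    (rep * dst_rep_stride * elems_per_block,
     rep * dst_rep_stride * elems_per_block + mask)
    for rep in range(repeat_times)
]
print(written_regions)  # [(0, 32), (80, 112), (160, 192)]
```

These half-open ranges match the zero-valued stretches in the printed dst_gm: elements [0, 32), [80, 112), and [160, 192) of the flattened 2 x 128 tensor.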