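Function Description

Copies a Scalar variable or an immediate value multiple times and fills a vector with the copies (PAR denotes the degree of parallelism):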

$dst_i = scalar, \quad i \in [0, \text{PAR})$
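The fill behavior described above can be sketched as a small pure-Python reference model. This is an illustration only, not the TIK implementation: `vec_dup_model` is a hypothetical helper, and it assumes that an integer `mask` selects the first `mask` consecutive elements of each iteration, that one iteration spans 8 blocks of 32 B, and that the blocks within an iteration are contiguous.

```python
BLOCK_BYTES = 32        # dst_rep_stride is expressed in units of 32 B
BLOCKS_PER_REPEAT = 8   # assumption: one iteration covers 8 blocks (256 B)


def vec_dup_model(mask, dst, scalar, repeat_times, dst_rep_stride, dtype_bytes=2):
    """Emulate the vec_dup fill pattern on a plain Python list (hypothetical helper)."""
    elems_per_block = BLOCK_BYTES // dtype_bytes
    par = BLOCKS_PER_REPEAT * elems_per_block  # elements handled per iteration
    if not 0 < mask <= par:
        raise ValueError("mask must be in [1, %d] for this dtype" % par)
    for rep in range(repeat_times):
        # The start of each iteration advances by dst_rep_stride blocks.
        base = rep * dst_rep_stride * elems_per_block
        for i in range(mask):
            dst[base + i] = scalar
    return dst


# Two iterations over a float16-sized buffer, only the first 64 elements of each enabled.
buf = vec_dup_model(64, [1] * 256, 7, repeat_times=2, dst_rep_stride=8)
print(buf[:2], buf[64:66], buf[128:130], buf[192:194])
# [7, 7] [1, 1] [7, 7] [1, 1]
```

With `dtype_bytes=2` (float16) the model yields PAR = 128, which matches the `mask` value used in the calling example of this page.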

Function Prototype

vec_dup(mask, dst, scalar, repeat_times, dst_rep_stride)

PIPE: Vector

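Parameters

Table 1 Parameter description
Parameter	Input/Output	Description
mask	Input	See the description of the mask parameter in Table 1.
dst	Output	Destination operand; the start element of the tensor. Supported data types: Tensor(uint16, int16, float16, uint32, int32, float32). The tensor scope must be Unified Buffer.
scalar	Input	Source operand to be duplicated. It can be a Scalar or an immediate value, and its dtype must match that of dst.
repeat_times	Input	Number of iterations. The addresses of the source and destination operands change in each iteration. Value range: repeat_times ∈ [0, 255]. If repeat_times is an immediate value, 0 is not supported. Supported data types: Scalar(int16/int32/int64/uint16/uint32/uint64), immediate (int), Expr(int16/int32/int64/uint16/uint32/uint64).
dst_rep_stride	Input	Address stride of the destination operand between the same block in adjacent iterations. Value range: dst_rep_stride ∈ [0, 255]. Unit: 32 B. Supported data types: Scalar(int16/int32/int64/uint16/uint32/uint64), immediate (int), Expr(int16/int32/int64/uint16/uint32/uint64).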
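The value ranges in the table can be checked on the host before the instruction is issued. The following is a minimal sketch in plain Python; `check_vec_dup_args` and its `repeat_is_immediate` flag are hypothetical names introduced for illustration and are not part of the TIK API.

```python
def check_vec_dup_args(repeat_times, dst_rep_stride, repeat_is_immediate=True):
    """Validate vec_dup arguments against the documented ranges (hypothetical helper)."""
    if not 0 <= repeat_times <= 255:
        raise ValueError("repeat_times must be in [0, 255]")
    if repeat_is_immediate and repeat_times == 0:
        # The table states that 0 is not supported for an immediate repeat_times.
        raise ValueError("an immediate repeat_times must not be 0")
    if not 0 <= dst_rep_stride <= 255:
        raise ValueError("dst_rep_stride must be in [0, 255] (unit: 32 B)")
    return True


check_vec_dup_args(1, 8)  # the values used in the calling example pass the check
```

Values held in a Scalar are only known at run time, so a host-side check like this one only covers immediate arguments.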

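Supported Chips

Atlas 200/300/500 inference products

Atlas training series products

Precautions

For the alignment requirements on operand address offsets, see the general constraints.
When you pass user-defined Scalar parameters, you must ensure that their values stay within the valid ranges.

Return Value

None.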

Example

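from tbe import tik
tik_instance = tik.Tik()
# Destination tensors: 128 float16 elements in Global Memory and in Unified Buffer
dst_gm = tik_instance.Tensor("float16", (128,), name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor("float16", (128,), name="dst_ub", scope=tik.scope_ubuf)
# Scalar source operand; its dtype matches that of dst_ub
src_scalar = tik_instance.Scalar(init_value=0, dtype="float16")
# mask=128, repeat_times=1, dst_rep_stride=8
tik_instance.vec_dup(128, dst_ub, src_scalar, 1, 8)
# Copy the result to the destination GM tensor (one burst of 8 x 32 B)
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)

tik_instance.BuildCCE(kernel_name="vec_dup", inputs=[], outputs=[dst_gm])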

The output data (dst_gm) is as follows:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.]
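The example above fills exactly one iteration of 128 float16 elements. For other lengths, the work has to be split into calls that respect the repeat_times limit of 255 plus a final call with a smaller mask for the remainder. The sketch below only plans that split on the host; `plan_vec_dup` is a hypothetical helper, and it assumes 256 B per iteration, an integer mask that enables the first `mask` consecutive elements, and contiguous iterations (dst_rep_stride = 8).

```python
def plan_vec_dup(num_elems, dtype_bytes=2, max_repeat=255):
    """Return a list of (element_offset, mask, repeat_times) tuples (hypothetical helper)."""
    par = 256 // dtype_bytes      # assumption: elements per iteration
    calls = []
    offset = 0
    full = num_elems // par       # number of fully populated iterations
    while full > 0:
        rep = min(full, max_repeat)   # repeat_times may not exceed 255
        calls.append((offset, par, rep))
        offset += rep * par
        full -= rep
    tail = num_elems - offset     # leftover elements, fewer than one iteration
    if tail:
        calls.append((offset, tail, 1))
    return calls


print(plan_vec_dup(128))   # [(0, 128, 1)]
print(plan_vec_dup(300))   # [(0, 128, 2), (256, 44, 1)]
```

Each tuple would correspond to one `vec_dup` call on the destination tensor sliced at `element_offset`; the offsets produced here are multiples of one full iteration, but the alignment rules in the general constraints still need to be checked for the target chip.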