vec_conv

Description

Converts the src tensor with one data type to the dst tensor with a different data type.

The following describes how floating-point numbers are represented:

i. The float16 data type occupies 16 bits: 1 sign bit (S), 5 exponent bits (E), and 10 mantissa bits (M).

When the exponent bits (E) are not all 0s or all 1s, the value is (–1)**S * 2**(E – 15) * (1 + M).

When the exponent bits (E) are all 0s, the value is (–1)**S * 2**(–14) * M.

When the exponent bits (E) are all 1s and the mantissa bits (M) are all 0s, the value is ±inf (depending on the sign bit). When the exponent bits (E) are all 1s and the mantissa bits (M) are not all 0s, the value is Not-a-Number (NaN).

The preceding figure represents the value 1.75, as S = 0, E = 15, and M = 2**(–1) + 2**(–2).

ii. The float32 data type occupies 32 bits: 1 sign bit (S), 8 exponent bits (E), and 23 mantissa bits (M).

When the exponent bits (E) are not all 0s or all 1s, the value is (–1)**S * 2**(E – 127) * (1 + M).

When the exponent bits (E) are all 0s, the value is (–1)**S * 2**(–126) * M.

When the exponent bits (E) are all 1s and the mantissa bits (M) are all 0s, the value is ±inf (depending on the sign bit). When the exponent bits (E) are all 1s and the mantissa bits (M) are not all 0s, the value is Not-a-Number (NaN).

The preceding figure represents the value 1.75, as S = 0, E = 127, and M = 2**(–1) + 2**(–2).
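The two layouts above can be decoded with one small routine. The following is a minimal sketch, not part of the TIK API: the helper name decode and its arguments are illustrative assumptions, and M is treated as a fraction in [0, 1) as in the 1.75 examples.

```python
import struct


def decode(bits, exp_bits, man_bits):
    """Decode an S/E/M bit pattern (hypothetical helper) into a Python float."""
    bias = (1 << (exp_bits - 1)) - 1                 # 15 for float16, 127 for float32
    s = (bits >> (exp_bits + man_bits)) & 1
    e = (bits >> man_bits) & ((1 << exp_bits) - 1)
    m = (bits & ((1 << man_bits) - 1)) / (1 << man_bits)   # mantissa as a fraction
    sign = -1.0 if s else 1.0
    if e == (1 << exp_bits) - 1:                     # exponent bits all 1s
        return sign * float("inf") if m == 0 else float("nan")
    if e == 0:                                       # exponent bits all 0s
        return sign * 2.0 ** (1 - bias) * m
    return sign * 2.0 ** (e - bias) * (1 + m)


# 1.75 as float16: S = 0, E = 15, M = 2**(-1) + 2**(-2)
f16_bits = (0 << 15) | (15 << 10) | 0b1100000000
# 1.75 as float32: S = 0, E = 127, same mantissa fraction
f32_bits = (0 << 31) | (127 << 23) | (0b11 << 21)

print(decode(f16_bits, 5, 10))   # 1.75
print(decode(f32_bits, 8, 23))   # 1.75
# Cross-check against the standard library's own float16/float32 unpacking.
print(struct.unpack("<e", struct.pack("<H", f16_bits))[0])
print(struct.unpack("<f", struct.pack("<I", f32_bits))[0])
```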

vec_conv supports the following data type conversion modes:

1. f322f32: rounds src according to round_mode and writes the result in the float32 format to dst.

For example, in the case of input 0.5:

0.0 is output in 'round' mode, 0.0 is output in 'floor' mode, 1.0 is output in 'ceil' mode, 1.0 is output in 'away-zero' mode, and 0.0 is output in 'to-zero' mode.
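As a host-side illustration of these five results, the modes can be mimicked with the Python standard library. This is a sketch under the assumption that the input is finite; round_to_integer is a hypothetical name, not a TIK function.

```python
import math


def round_to_integer(x, mode):
    """Round a finite float to an integer-valued float (illustrative only)."""
    if mode == "round":                 # ties to even; built-in round() does this
        return float(round(x))
    if mode == "floor":
        return float(math.floor(x))
    if mode in ("ceil", "ceiling"):
        return float(math.ceil(x))
    if mode == "away-zero":             # nearest, ties away from zero
        return math.copysign(math.floor(abs(x) + 0.5), x)
    if mode == "to-zero":
        return float(math.trunc(x))
    raise ValueError("unsupported round_mode: %r" % mode)


for mode in ("round", "floor", "ceil", "away-zero", "to-zero"):
    print(mode, round_to_integer(0.5, mode))
```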

2. f322f16: rounds src according to round_mode and writes the result in float16 format to dst (the overflow part is saturated).

For example, the input 0.5+2**(–12) is represented as 2**(–1) * (1+2**(–11)) in float32 format, meaning that E = –1 + 127 = 126 and M = 2**(–11).

The exponent bits of float16 can represent 2**(–1), meaning E = –1 + 15 = 14.

However, float16 has only 10 mantissa bits. Therefore, the mantissa bits beyond the tenth (the gray part) need to be rounded.

In 'round' mode, the result mantissa is 0000000000, meaning E = 14, and M = 0. The final result is 0.5.

In 'floor' mode, the result mantissa is 0000000000, meaning E = 14, and M = 0. The final result is 0.5.

In 'ceil' mode, the result mantissa is 0000000001, meaning E = 14, and M = 2**(–10). The final result is 0.5+2**(–11).

In 'away-zero' mode, the result mantissa is 0000000001, meaning E = 14, and M = 2**(–10). The final result is 0.5+2**(–11).

In 'to-zero' mode, the result mantissa is 0000000000, meaning E = 14, and M = 0. The final result is 0.5.

In 'odd' mode, the result mantissa is 0000000001, meaning E=14, and M = 2**(–10). The final result is 0.5+2**(–11).
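The E and M values in this example can be checked on the host with the standard struct module. This is only a sketch: fields32 and fields16 are hypothetical helper names, and struct's float16 packing rounds half to even, so it corresponds to the 'round' mode only.

```python
import struct


def fields32(x):
    """Return (S, E, M as integer) of x stored as float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fields16(x):
    """Return (S, E, M as integer) of x after conversion to float16."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


x = 0.5 + 2.0 ** -12
s, e, m = fields32(x)
print(s, e, m == 1 << 12)   # 0 126 True, that is, M = 2**12 / 2**23 = 2**(-11)
print(fields16(x))          # (0, 14, 0), that is, 0.5
```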

3. f322s64: rounds src according to round_mode and writes the result in int64 format to dst (the overflow part is saturated).

For example, in the case of input 2**22+0.5:

2**22 is output in 'round' mode; 2**22 is output in 'floor' mode; 2**22+1 is output in 'ceil' mode; 2**22+1 is output in 'away-zero' mode; 2**22 is output in 'to-zero' mode.

4. f322s32: rounds src according to round_mode and writes the result in int32 format to dst (the overflow part is saturated).

For example, in the case of input 2**22+0.5:

2**22 is output in 'round' mode; 2**22 is output in 'floor' mode; 2**22+1 is output in 'ceil' mode; 2**22+1 is output in 'away-zero' mode; 2**22 is output in 'to-zero' mode.

5. f322s16: rounds src according to round_mode and writes the result in int16 format to dst (the overflow part is saturated).

For example, in the case of input 2**22+0.5:

2**15–1 is output in 'round' mode, 2**15–1 is output in 'floor' mode, 2**15–1 is output in 'ceil' mode, 2**15–1 is output in 'away-zero' mode, and 2**15–1 is output in 'to-zero' mode (overflow processing).

6. f162f32: writes src to dst in float32 format – precision conversion is not involved.

For example, input 1.5–2**(–10) produces output 1.5–2**(–10).

7. f162s32: rounds src according to round_mode and writes the result in int32 format to dst.

For example, in the case of input –1.5:

–2 is output in 'round' mode, –2 is output in 'floor' mode, –1 is output in 'ceil' mode, –2 is output in 'away-zero' mode, and –1 is output in 'to-zero' mode.

8. f162s16: rounds src according to round_mode and writes the result in int16 format to dst (the overflow part is saturated).

For example, in the case of input 2**7–0.5:

2**7 is output in 'round' mode, 2**7–1 is output in 'floor' mode, 2**7 is output in 'ceil' mode, 2**7 is output in 'away-zero' mode, and 2**7–1 is output in 'to-zero' mode.

9. f162s8: rounds src according to round_mode and writes the result in int8 format to dst (the overflow part is saturated).

For example, in the case of input 2**7–0.5:

2**7–1 is output in 'round' mode, 2**7–1 is output in 'floor' mode, 2**7–1 is output in 'ceil' mode, 2**7–1 is output in 'away-zero' mode, and 2**7–1 is output in 'to-zero' mode (overflow processing).

10. f162u8: rounds src according to round_mode and writes the result in uint8 format to dst (the overflow part is saturated).

For example, in the case of input 1.75:

2 is output in 'round' mode, 1 is output in 'floor' mode, 2 is output in 'ceil' mode, 2 is output in 'away-zero' mode, and 1 is output in 'to-zero' mode.
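The float-to-integer modes combine two steps: rounding, then saturation to the destination range. The sketch below models both with the decimal module. float_to_int, ROUNDING, and INT_RANGE are illustrative names, and the mapping of round_mode strings to decimal rounding constants is an assumption based on the mode descriptions, not TIK code.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

ROUNDING = {
    "round": ROUND_HALF_EVEN,
    "floor": ROUND_FLOOR,
    "ceil": ROUND_CEILING,
    "ceiling": ROUND_CEILING,
    "away-zero": ROUND_HALF_UP,   # decimal's HALF_UP breaks ties away from zero
    "to-zero": ROUND_DOWN,
}

INT_RANGE = {
    "uint8": (0, 2**8 - 1),
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def float_to_int(x, mode, dst_dtype):
    """Round x according to mode, then clamp to the range of dst_dtype."""
    rounded = int(Decimal(x).to_integral_value(rounding=ROUNDING[mode]))
    lo, hi = INT_RANGE[dst_dtype]
    return max(lo, min(hi, rounded))


modes = ("round", "floor", "ceil", "away-zero", "to-zero")
print([float_to_int(-1.5, m, "int32") for m in modes])   # [-2, -2, -1, -2, -1]
print([float_to_int(127.5, m, "int16") for m in modes])  # [128, 127, 128, 128, 127]
print([float_to_int(127.5, m, "int8") for m in modes])   # [127, 127, 127, 127, 127]
```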

11. u82f16: writes src to dst in float16 format – precision conversion is not involved.

For example, input 1 produces output 1.0.

12. s82f16: writes src to dst in float16 format – precision conversion is not involved.

For example, input –1 produces output –1.0.

13. s162f16: rounds src according to round_mode and writes the result in float16 format to dst.

For example, the input 2**12+2 equals 2**12 * (1+2**(–11)), which in float16 format would require E = 12 + 15 = 27 and M = 2**(–11).

However, float16 has only 10 mantissa bits. Therefore, the mantissa bits beyond the tenth (the gray part) need to be rounded.

In 'round' mode, the result mantissa is 0000000000, meaning E = 27, and M = 0. The final result is 2**12.

In 'floor' mode, the result mantissa is 0000000000, meaning E = 27, and M = 0. The final result is 2**12.

In 'ceil' mode, the result mantissa is 0000000001, meaning E = 27, and M = 2**(–10). The final result is 2**12+4.

In 'away-zero' mode, the result mantissa is 0000000001, meaning E = 27, and M = 2**(–10). The final result is 2**12+4.

In 'to-zero' mode, the result mantissa is 0000000000, meaning E = 27, and M = 0. The final result is 2**12.

14. s162f32: writes src to dst in float32 format – precision conversion is not involved.

For example, input 2**15–1 produces output 2**15–1.

15. s162s8: If deqscale is a Tensor, the first 16 elements of the tensor are used and recorded as {deq_factor[i], i = 0, 1, 2, ..., 15}. If deqscale is an int, Scalar, or Expr, every deq_factor[i] equals deqscale. Bit deq_factor[i][46] must be 1. Bits deq_factor[i][31:13] are interpreted as a 19-bit float (1 sign bit, 8 exponent bits, and 10 mantissa bits) that gives scale[i]. Bits deq_factor[i][45:37] are interpreted as an int9 that gives offset[i].

deqscale also supports separate scale and offset arguments. In this case, deqscale is a list or tuple of two elements. The first element is scale[i], which must be an int or float within the range representable by the 19-bit float format (1 sign bit, 8 exponent bits, and 10 mantissa bits). As shown in the following figure, if scale is set to 0.5+2**(–12), which needs more than 10 mantissa bits, the gray part is discarded (set to all 0s) and the scale value that takes effect is 0.5.

The second element indicates offset[i], which is an int ranging from –256 to 255.
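The field layout and the mantissa truncation described above can be sketched as follows. pack_deq_factor and effective_scale are hypothetical helpers written for illustration. How TIK converts the (scale, offset) pair internally is not stated here, so this only mirrors the documented bit positions.

```python
import struct


def pack_deq_factor(scale, offset, signed_dst):
    """Build a 64-bit deq_factor from scale and offset (illustrative layout).

    bits [31:13]: top 19 bits of the float32 encoding of scale
    bits [45:37]: offset as a 9-bit two's complement integer
    bit  [46]   : 1 for an int8 destination, 0 for uint8
    """
    if not -256 <= offset <= 255:
        raise ValueError("offset must be in [-256, 255]")
    f32_bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    scale_bits = f32_bits & 0xFFFFE000          # discard the 13 low mantissa bits
    offset_bits = (offset & 0x1FF) << 37
    flag = (1 << 46) if signed_dst else 0
    return flag | offset_bits | scale_bits


def effective_scale(deq_factor):
    """Return the scale value that the 19 stored bits actually represent."""
    return struct.unpack("<f", struct.pack("<I", deq_factor & 0xFFFFE000))[0]


factor = pack_deq_factor(0.5 + 2.0 ** -12, 0, signed_dst=True)
print(effective_scale(factor))   # 0.5
```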

Convert sixteen src elements {src[j*16+i], i = 0,1,2,...,15, j = 0,1,2,...} each time, calculate src[j*16+i]*scale[i] (float32 type), round the result to int9 (the overflow part is saturated), and add the offset to the intermediate result. The result is saved to dst[j*16+i] in the int8 format (the overflow part is saturated).

Example: deqscale is a Tensor that stores 16 elements of type uint64, with deq_factor[i] = 2**46 + 2**31 + (127+i) * 2**23 (scale[i] = –2**i, offset[i] = 0).

Input: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

Output: [-1,-2,-4,-8,-16,-32,-64,-128,-128,-128,-128,-128,-128,-128,-128,-128] (overflow processing)
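The output above can be reproduced on the host with a small emulation of the documented steps (scale, round to int9 with saturation, add offset, saturate to int8). This is a sketch, not the hardware algorithm: the function names are hypothetical, and Python's round() (ties to even) is an assumed stand-in for the rounding applied to the product, which does not matter for this example because every product is an integer.

```python
import struct


def decode_deq_factor(deq_factor):
    """Extract (scale, offset) from the documented bit fields."""
    scale = struct.unpack("<f", struct.pack("<I", deq_factor & 0xFFFFE000))[0]
    offset = (deq_factor >> 37) & 0x1FF
    if offset >= 256:                 # interpret as signed 9-bit value
        offset -= 512
    return scale, offset


def saturate(value, lo, hi):
    return max(lo, min(hi, value))


def s16_to_s8(src, deq_factors):
    """Emulate the int16 -> int8 conversion for a flat list of inputs."""
    dst = []
    for idx, value in enumerate(src):
        scale, offset = decode_deq_factor(deq_factors[idx % 16])
        scaled = saturate(round(value * scale), -256, 255)   # int9, saturated
        dst.append(saturate(scaled + offset, -128, 127))     # int8, saturated
    return dst


deq_factors = [2**46 + 2**31 + (127 + i) * 2**23 for i in range(16)]
print(s16_to_s8([1] * 16, deq_factors))
```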

16. s162u8: If deqscale is a Tensor, the first 16 elements of the tensor are used and recorded as {deq_factor[i], i = 0, 1, 2, ..., 15}. If deqscale is an int, Scalar, or Expr, every deq_factor[i] equals deqscale. Bit deq_factor[i][46] must be 0. Bits deq_factor[i][31:13] are interpreted as a 19-bit float (1 sign bit, 8 exponent bits, and 10 mantissa bits) that gives scale[i]. Bits deq_factor[i][45:37] are interpreted as an int9 that gives offset[i].

deqscale also supports separate scale and offset arguments. In this case, deqscale is a list or tuple of two elements. The first element is scale[i], which must be an int or float within the range representable by the 19-bit float format (1 sign bit, 8 exponent bits, and 10 mantissa bits). The second element is offset[i], which must be an int in the range [–256, +255].

Convert sixteen src elements {src[j*16+i], i = 0,1,2,...,15, j = 0,1,2,...} each time, calculate src[j*16+i]*scale[i] (float32 type), round the result to int9 (the overflow part is saturated), and add the offset to the intermediate result. The result is saved to dst[j*16+i] in the uint8 format (the overflow part is saturated).

Example: deqscale is a Tensor that stores 16 elements of type uint64, with deq_factor[i] = i * 2**37 + 127 * 2**23 (scale[i] = 1, offset[i] = i).

Input: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

Output: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]

17. s322f32: rounds src according to round_mode and writes the result in float32 format to dst.

For example, the input 2**25+3 equals 2**25 * (1+2**(–24)+2**(–25)), which in float32 format would require E = 25 + 127 = 152 and M = 2**(–24) + 2**(–25).

However, float32 has only 23 mantissa bits. Therefore, the mantissa bits beyond the 23rd (the gray part) need to be rounded.

In 'round' mode, the result mantissa is 00000000000000000000001, meaning E = 152, and M = 2**(–23). The final result is 2**25+4.

In 'floor' mode, the result mantissa is 00000000000000000000000, meaning E = 152, and M = 0. The final result is 2**25.

In 'ceil' mode, the result mantissa is 00000000000000000000001, meaning E = 152, and M = 2**(–23). The final result is 2**25+4.

In 'away-zero' mode, the result mantissa is 00000000000000000000001, meaning E = 152, and M = 2**(–23). The final result is 2**25+4.

In 'to-zero' mode, the result mantissa is 00000000000000000000000, meaning E = 152, and M = 0. The final result is 2**25.

18. s322f16: rounds src*deqscale in 'round' mode and writes the result in float16 format to dst (the overflow part is saturated by default).

Example: with deqscale = 3.0 and input 2**10+1, the product is 2**11+2**10+3, which is rounded to 2**11+2**10+4 in 'round' mode (see the conversion of s162f16).
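This example can be checked on the host, because the struct module rounds half to even when packing float16, which matches the 'round' mode. The sketch below makes two assumptions: s32_to_f16 is a hypothetical name, and the product is formed in double precision, which may differ from the intermediate precision on the device. Saturation is not modelled (struct raises OverflowError instead).

```python
import struct


def s32_to_f16(value, deqscale):
    """Multiply by deqscale and round the product to float16 (ties to even)."""
    product = float(value) * deqscale
    return struct.unpack("<e", struct.pack("<e", product))[0]


print(s32_to_f16(2**10 + 1, 3.0))   # 3076.0, that is, 2**11 + 2**10 + 4
```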

19. s322s64: writes src to dst in int64 format – precision conversion is not involved.

For example, input 2**31–1 produces output 2**31–1.

20. s322s16: writes src to dst in int16 format (the overflow part is saturated by default) – precision conversion is not involved.

For example, input 2**31–1 produces output 2**15–1 (saturated).

21. s642s32: writes src to dst in int32 format (the overflow part is saturated by default) – precision conversion is not involved.

For example, input 2**31 produces output 2**31–1 (saturated).
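Saturation in the integer-to-integer modes amounts to clamping to the destination range. A minimal sketch, with saturate_int as an illustrative name:

```python
def saturate_int(value, bits):
    """Clamp value to the range of a signed two's complement integer of width bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


print(saturate_int(2**31 - 1, 64) == 2**31 - 1)   # True, int32 always fits in int64
print(saturate_int(2**31 - 1, 16) == 2**15 - 1)   # True, clamped to the int16 maximum
print(saturate_int(2**31, 32) == 2**31 - 1)       # True, clamped to the int32 maximum
```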

22. s642f32: rounds src according to round_mode and writes the result in float32 format to dst.

For example, the input 2**35+2**12+2**11 equals 2**35 * (1+2**(–23)+2**(–24)), which in float32 format would require E = 35 + 127 = 162 and M = 2**(–23) + 2**(–24).

However, float32 has only 23 mantissa bits. Therefore, the mantissa bits beyond the 23rd (the gray part) need to be rounded.

In 'round' mode, the result mantissa is 00000000000000000000010, meaning E = 162, and M = 2**(–22). The final result is 2**35+2**13.

In 'floor' mode, the result mantissa is 00000000000000000000001, meaning E = 162, and M = 2**(–23). The final result is 2**35+2**12.

In 'ceil' mode, the result mantissa is 00000000000000000000010, meaning E = 162, and M = 2**(–22). The final result is 2**35+2**13.

In 'away-zero' mode, the result mantissa is 00000000000000000000010, meaning E = 162, and M = 2**(–22). The final result is 2**35+2**13.

In 'to-zero' mode, the result mantissa is 00000000000000000000001, meaning E = 162, and M = 2**(–23). The final result is 2**35+2**12.

Prototype

vec_conv(mask, round_mode, dst, src, repeat_times, dst_rep_stride, src_rep_stride, deqscale=None, ldst_high_half=False)

Parameters

Table 1 Parameter description

Parameter

Input/Output

Description

mask

Input

For details, see the description of the mask parameter in Table 1.

round_mode

Input

A string for the rounding mode, selected from:

  • '' or 'none': If the conversion loses precision, the 'round' mode is used. Otherwise, the value is not rounded.
  • 'round': perform banker's rounding (C language rint)
  • 'floor': round to minus infinity (C language floor)
  • 'ceil' or 'ceiling': round to positive infinity (C language ceil)
  • 'away-zero': round to nearest, tie away from zero (C language round)
  • 'to-zero': round to zero (C language trunc)
  • 'odd': round to odd (Von Neumann rounding)

dst

Output

Destination operand.

The scope of the tensor is the Unified Buffer.

src

Input

Source operand.

The scope of the tensor is the Unified Buffer.

repeat_times

Input

Repeat times (or iterations).

dst_rep_stride

Input

Repeat stride size for the destination operand between the corresponding blocks of successive iterations.

src_rep_stride

Input

Repeat stride size for the source operand between the corresponding blocks of successive iterations.

deqscale

Input

Quantization scale, which is an auxiliary conversion parameter. Defaults to None.

ldst_high_half

Input

A bool specifying whether dst_list or src_list stores or comes from the upper or lower half of each block. Defaults to False.

True indicates the upper half, and False indicates the lower half.

Note: This parameter defines different functions for different combinations, indicating the storage and read of dst_list and src_list respectively.

The Atlas 200/300/500 Inference Product does not support this parameter.

The Atlas Training Series Product does not support this parameter.

Table 2 Atlas 200/300/500 Inference Product round_mode

src.dtype   dst.dtype   Supported round_mode                   deqscale
float16     int32       'round', 'floor', 'ceil', 'ceiling'    None
float16     float32     '', 'none'                             None
float32     float16     '', 'none'                             None
float16     int8        '', 'none'                             None
float16     uint8       '', 'none'                             None
int32       float16     '', 'none'                             Scalar (float16)/Immediate (float)
uint8       float16     '', 'none'                             None
int8        float16     '', 'none'                             None

Table 3 Atlas Training Series Product round_mode

src.dtype   dst.dtype   Supported round_mode                                              deqscale
float16     int32       'round', 'floor', 'ceil', 'ceiling', 'away-zero', 'to-zero'       None
float32     int32       'round', 'floor', 'ceil', 'ceiling', 'away-zero', 'to-zero'       None
int32       float32     '', 'none'                                                        None
float16     float32     '', 'none'                                                        None
float32     float16     '', 'none', 'odd'                                                 None
float16     int8        '', 'none', 'floor', 'ceil', 'ceiling', 'away-zero', 'to-zero'    None
float16     uint8       '', 'none', 'floor', 'ceil', 'ceiling', 'away-zero', 'to-zero'    None
int32       float16     '', 'none'                                                        Scalar (float16)/Immediate (float)
uint8       float16     '', 'none'                                                        None
int8        float16     '', 'none'                                                        None

Returns

None

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • repeat_times ∈ [0, 255]. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64. If repeat_times is an immediate, 0 is not supported.
  • The parallelism degree in each repeat depends on the data precision and SoC version. For example, 64 source or destination elements are processed in each repeat during float32 to float16 conversion.
  • dst_rep_stride and src_rep_stride are in the unit of 32 bytes. Each must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
  • The supported data types of dst and src are related to the chip version. If the data types are not supported, the tool reports an error.
  • dst and src must be different tensors or the same element of the same tensor, rather than different elements of the same tensor.
  • When src is of type float32 and dst is of type float32, the source operand is rounded to an integer of type float32. In other cases, the source operand is rounded to what is representable by dst's data type.
  • To save memory space, you can define a tensor reused by the source and destination operands (which means they have overlapped addresses). The general instruction restrictions are as follows.
    • In the event of a single repeat (repeat_times = 1), the source operand must completely overlap the destination operand.
    • In the event of multiple repeats (repeat_times > 1), if there is a dependency between the source operand and the destination operand, that is, the destination operand of the Nth iteration is the source operand of the (N+1)th iteration, address overlapping is not allowed.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • Rounding in binary mode is similar to that in decimal mode.
    • In 'round' mode, if the first bit to be rounded is 0, no carry is performed. If the first bit to be rounded is 1 and the subsequent bits are not all 0s, carry is performed.

    If the first bit is 1 and all subsequent bits are 0, no carry is performed when the last M bit is 0, and carry is performed when the last M bit is 1.

    • In 'floor' mode, if bit S is 0, no carry is performed. If bit S is 1 and the bits to be rounded are all 0s, carry is not performed; in other cases, carry is performed.
    • In 'ceil'/'ceiling' mode, if bit S is 1, no carry is performed. If bit S is 0 and the bits to be rounded are all 0s, carry is not performed; in other cases, carry is performed.
    • In 'away-zero' mode, if the first bit to be rounded is 0, no carry is performed; in other cases, carry is performed.
    • In 'to-zero' mode, no carry is performed.
    • In 'odd' mode, if the bits to be rounded are all 0s, no carry is performed.

      If the bits to be rounded are not all 0s, no carry is performed when the last M bit is 1, and carry is performed when the last M bit is 0.
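The carry rules listed above translate almost directly into code. The sketch below applies them to an integer that must be squeezed into 1 + man_bits significant bits, as in the s162f16, s322f32, and s642f32 examples. needs_carry and round_int_to_float are hypothetical helper names, and exponent overflow and saturation are not modelled.

```python
def needs_carry(mode, sign, last_m_bit, dropped, n_dropped):
    """Decide whether to add 1 to the kept bits.

    dropped is the integer value of the discarded bits, n_dropped their count.
    """
    if n_dropped == 0 or dropped == 0:      # nothing is lost, never carry
        return False
    first = (dropped >> (n_dropped - 1)) & 1
    rest = dropped & ((1 << (n_dropped - 1)) - 1)
    if mode == "round":
        return first == 1 and (rest != 0 or last_m_bit == 1)
    if mode == "floor":
        return sign == 1
    if mode in ("ceil", "ceiling"):
        return sign == 0
    if mode == "away-zero":
        return first == 1
    if mode == "to-zero":
        return False
    if mode == "odd":
        return last_m_bit == 0
    raise ValueError("unsupported round_mode: %r" % mode)


def round_int_to_float(n, man_bits, mode):
    """Round integer n to the nearest value with 1 + man_bits significant bits."""
    sign, mag = (1, -n) if n < 0 else (0, n)
    n_dropped = max(mag.bit_length() - (man_bits + 1), 0)
    kept = mag >> n_dropped
    dropped = mag & ((1 << n_dropped) - 1)
    if needs_carry(mode, sign, kept & 1, dropped, n_dropped):
        kept += 1
    result = kept << n_dropped
    return -result if sign else result


for mode in ("round", "floor", "ceil", "away-zero", "to-zero"):
    print(mode, round_int_to_float(2**25 + 3, 23, mode) - 2**25)
```

The loop prints the distance from 2**25 for each mode, which is either 0 (no carry) or 4 (carry).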

Example

Example 1

from tbe import tik

tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}

src_shape = (2, 128)
dst_shape = (3, 64)
src_dtype = "float16"
dst_dtype = "int32"
# Data volume
elements = 2 * 128
dst_elements = 3 * 64
# Number of operations per iteration, which is 64 in the current example. In bitwise mode, mask can be represented as [0, 2**64-1].
mask = 64
# rep_stride indicates the address stride between operands in adjacent iterations. In the current example, dst_rep_stride is 16, indicating that the start position of the second iteration is 16 blocks away from the start position of the first iteration.
dst_rep_stride = 16
src_rep_stride = 8
# Number of iterations, which is 2 in the current example. You can adjust the number of iterations as required.
repeat_times = 2
# In the current example, banker's rounding is used as an example.
round_mode = "round"
# Indicates whether the data is stored in the upper or lower half of dst. In the current example, False selects the lower half.
ldst_high_half = False
deqscale = None

src_gm = tik_instance.Tensor(src_dtype, src_shape, name="src_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor(src_dtype, src_shape, name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_ub", scope=tik.scope_ubuf)
# Number of moved segments.
nburst = 1
# Length of the moved segment each time, in 32 bytes.
src_burst = elements * dtype_size[src_dtype] // 32 // nburst
dst_burst = dst_elements * dtype_size[dst_dtype] // 32 // nburst
# Stride between the previous burst tail and the next burst header, in 32 bytes.
dst_stride, src_stride = 0, 0
# Copy the user input to the source Unified Buffer.

tik_instance.data_move(src_ub, src_gm, 0, nburst, src_burst, dst_stride, src_stride)
# To facilitate observation, set the destination operand to zero.
tik_instance.vec_dup(64, dst_ub, 0, 3, 8)

# Convert precision with vec_conv.
tik_instance.vec_conv(mask, round_mode, dst_ub, src_ub, repeat_times, dst_rep_stride, src_rep_stride, deqscale=deqscale,
                   ldst_high_half=ldst_high_half)

# Move data from the Unified Buffer to the Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, nburst, dst_burst, dst_stride, src_stride)
tik_instance.BuildCCE(kernel_name="vec_conv", inputs=[src_gm], outputs=[dst_gm])

Result example:
Input (src_gm):
[[7.996   7.875   5.14    2.266   4.844   7.492   1.845   7.492   6.824
  3.223   0.809   2.033   2.773   0.2542  7.59    4.992   2.473   3.47
  2.85    4.35    6.39    3.168   6.715   2.11    6.94    6.98    4.59
  2.883   8.21    1.8125  3.447   0.0353  5.055   1.697   8.836   1.68
  3.29    5.965   0.3535  5.6     7.977   7.902   7.56    1.571   4.504
  7.863   5.492   1.106   3.969   1.315   1.896   6.61    0.281   2.482
  5.49    4.06    3.652   6.3     3.916   8.77    2.838   6.023   4.63
  8.15    8.266   4.523   0.10114 5.04    2.479   0.5713  2.324   3.986
  6.957   0.208   2.807   8.945   2.559   1.896   2.299   5.566   2.498
  8.      8.516   2.432   4.52    5.77    2.465   2.684   4.11    3.705
  7.332   1.713   3.768   6.94    8.24    7.836   5.492   8.64    6.36
  6.098   7.1     8.62    2.082   2.15    4.188   7.33    7.723   8.086
  8.945   2.754   7.617   1.895   5.69    3.176   8.18    4.617   8.42
  8.15    4.01    1.016   4.004   7.098   7.445   7.48    5.316   7.54
  5.44    5.098  ]
 [2.795   8.516   6.      4.758   1.311   4.703   7.86    0.8057  1.796
  2.908   3.363   0.916   6.      3.2     1.468   7.125   3.213   5.32
  1.127   1.906   7.285   4.29    6.438   8.7     2.652   5.426   7.19
  2.496   2.523   6.76    0.3948  3.908   7.367   1.133   8.06    7.277
  5.445   0.0669  3.072   0.2046  6.625   8.94    5.527   8.11    7.082
  1.025   6.566   0.7217  1.268   0.8843  1.702   3.65    2.445   0.782
  5.316   0.945   7.918   0.2131  4.844   7.598   6.695   0.562   3.53
  3.822   7.152   2.793   2.121   3.65    4.08    6.83    2.617   8.59
  5.168   8.06    7.598   7.082   7.742   3.01    5.758   3.236   2.225
  0.933   3.963   3.873   7.645   3.703   2.373   1.344   8.14    5.742
  8.16    1.834   1.135   6.457   8.03    8.305   5.695   1.066   1.298
  8.61    3.057   1.526   3.59    6.316   6.992   4.258   6.617   4.81
  5.6     6.297   4.066   6.234   5.4     4.69    4.105   8.54    4.617
  3.87    1.194   5.88    7.504   2.055   6.46    5.01    4.855   2.32
  2.232   2.617  ]]
Output (dst_gm):
[[8 8 5 2 5 7 2 7 7 3 1 2 3 0 8 5 2 3 3 4 6 3 7 2 7 7 5 3 8 2 3 0 5 2 9 2
  3 6 0 6 8 8 8 2 5 8 5 1 4 1 2 7 0 2 5 4 4 6 4 9 3 6 5 8]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [3 9 6 5 1 5 8 1 2 3 3 1 6 3 1 7 3 5 1 2 7 4 6 9 3 5 7 2 3 7 0 4 7 1 8 7
  5 0 3 0 7 9 6 8 7 1 7 1 1 1 2 4 2 1 5 1 8 0 5 8 7 1 4 4]]
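The zero row in the middle of the output follows from the stride arithmetic. The sketch below recomputes which element ranges each iteration touches, assuming a contiguous mask (the first mask elements of each iteration) and 32-byte blocks. touched_ranges is an illustrative helper, not a TIK function.

```python
BLOCK_BYTES = 32


def touched_ranges(repeat_times, rep_stride, mask, dtype_bytes):
    """Return the [start, stop) element range handled by each iteration."""
    elems_per_block = BLOCK_BYTES // dtype_bytes
    ranges = []
    for r in range(repeat_times):
        start = r * rep_stride * elems_per_block   # stride is counted in blocks
        ranges.append((start, start + mask))
    return ranges


src_ranges = touched_ranges(2, 8, 64, 2)     # float16 source, src_rep_stride = 8
dst_ranges = touched_ranges(2, 16, 64, 4)    # int32 destination, dst_rep_stride = 16
print(src_ranges)   # [(0, 64), (128, 192)]
print(dst_ranges)   # [(0, 64), (128, 192)]
# With dst_shape = (3, 64), the destination rows written are:
print([start // 64 for start, _ in dst_ranges])   # [0, 2]
```

The second iteration reads the first 64 elements of src row 1 and writes dst row 2, so dst row 1 keeps the zeros written by vec_dup.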

Example 2

"""This example shows the effect of using ldst_high_half and deqscale."""
from tbe import tik

tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}

src_shape = (2, 128)
dst_shape = (3, 128)
src_dtype = "int16"
dst_dtype = "int8"
elements = 2 * 128
dst_elements = 3 * 128
# Number of operations per iteration, which is 128 in the current example. In bitwise mode, mask can be represented as [2**64-1, 2**64-1].
mask = 128
# Iteration stride between the previous repeat header and the next repeat header of the destination operand. The unit is 32 bytes. dst is int8, with 32 operands in a block, and src is int16, with 16 operands in a block.
dst_rep_stride = 4
src_rep_stride = 8

# Number of iterations, which is 2 in the current example. You can adjust the number of iterations as required.
repeat_times = 2

# The current example uses none as an example.
round_mode = "none"
# Indicates whether the data is stored in the upper or lower half of dst. In the current example, the data is stored in the upper half.
ldst_high_half = True
# To convert the data type from int16 to int8, the 46th bit of deqscale must be 0b1.
deqscale = 2 ** 46 - 1

src_gm = tik_instance.Tensor(src_dtype, src_shape, name="src_gm", scope=tik.scope_gm)
src1_gm = tik_instance.Tensor(dst_dtype, dst_shape, name="src1_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor(src_dtype, src_shape, name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_ub", scope=tik.scope_ubuf)
# Number of moved segments.
nburst = 1
# Length of the moved segment each time, in 32 bytes.
src_burst = elements * dtype_size[src_dtype] // 32 // nburst
dst_burst = dst_elements * dtype_size[dst_dtype] // 32 // nburst
# Stride between the previous burst tail and the next burst header, in 32 bytes.
dst_stride, src_stride = 0, 0
# Copy the user input to the source Unified Buffer.

tik_instance.data_move(src_ub, src_gm, 0, nburst, src_burst, dst_stride, src_stride)
# To facilitate observation, initialize the destination operand with the all-zero data in src1_gm.
tik_instance.data_move(dst_ub, src1_gm, 0, nburst, dst_burst, dst_stride, src_stride)

# Convert precision with vec_conv.
tik_instance.vec_conv(mask, round_mode, dst_ub, src_ub, repeat_times, dst_rep_stride, src_rep_stride, deqscale=deqscale,
                      ldst_high_half=ldst_high_half)

# Move data from the Unified Buffer to the Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, nburst, dst_burst, dst_stride, src_stride)
tik_instance.BuildCCE(kernel_name="vec_conv", inputs=[src_gm, src1_gm], outputs=[dst_gm])

Result example:
Input (src_gm):
[[6 8 6 7 2 5 7 0 7 8 4 1 2 1 5 1 1 8 2 5 7 5 8 6 1 7 4 6 0 5 3 1 4 6 4 0
  0 1 4 3 0 2 2 3 3 0 3 6 6 3 5 7 2 3 1 0 8 5 5 4 7 6 3 7 3 6 8 3 3 1 4 1
  1 6 7 8 1 0 0 3 3 0 3 1 1 4 0 4 2 0 6 1 8 1 4 1 7 5 7 5 0 4 6 3 3 8 3 1
  2 1 8 5 1 4 5 6 3 1 6 2 2 1 8 4 0 6 1 5]
 [8 7 1 7 0 0 2 4 1 7 2 2 7 8 2 6 3 6 0 6 2 4 0 4 7 7 8 4 2 0 1 5 1 0 3 0
  1 6 2 6 2 5 0 3 0 2 1 7 7 8 7 0 0 4 3 4 5 6 2 6 1 5 2 1 6 7 0 1 4 2 0 1
  3 8 4 0 1 1 6 1 6 8 4 0 5 8 1 1 3 2 1 2 2 8 7 2 6 8 8 5 0 3 1 4 4 0 1 3
  0 5 3 7 8 7 4 8 1 3 4 5 7 4 3 6 5 4 8 2]]
Input (src1_gm):
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
Output (dst_gm):
[[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1]]
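The alternating pattern in dst_gm is consistent with each iteration writing 16 results into the upper half of eight consecutive 32-byte blocks. The sketch below reproduces the positions under that model. The model is an assumption inferred from the result above, not a documented layout, and written_positions is a hypothetical helper.

```python
BLOCK_BYTES = 32
HALF = BLOCK_BYTES // 2     # 16 int8 elements per half block


def written_positions(repeat_times, dst_rep_stride, mask, high_half):
    """Return the set of int8 indices written under the assumed layout."""
    blocks_per_repeat = mask // HALF
    written = set()
    for r in range(repeat_times):
        for b in range(blocks_per_repeat):
            block = r * dst_rep_stride + b
            base = block * BLOCK_BYTES + (HALF if high_half else 0)
            written.update(range(base, base + HALF))
    return written


positions = written_positions(2, 4, 128, True)
# Every written element is -1 in the example; untouched elements stay 0.
pattern = [-1 if i in positions else 0 for i in range(3 * 128)]
print(pattern[:32])
```

Under this model, the two iterations cover blocks 0 to 7 and 4 to 11, which is why all three rows of dst_gm show the pattern even though repeat_times is 2.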