fixpipe
Description
Post-processes the matrix compute result (for example, adds an offset to it or quantizes it) and moves the data from the L1OUT Buffer to the Global Memory.
Prototype
fixpipe(dst, src, cburst_num, burst_len, dst_stride, src_stride, extend_params=None)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | A Tensor of type float16, float32, or int32, for the start element of the destination operand. For details about the data type restrictions, see Table 2. The scope is the Global Memory. In addition to the offset and quantization operations, fixpipe processing removes the extra data padded during matrix computation. If this API is used to process the conv2d result, the format is [cout_blocks, howo, 16]. If this API is used to process the matmul result, the format is [N1, m, N0]. Note: For the meanings of cout_blocks and howo, see the parameter description of conv2d in Parameters. For the meanings of N1, m, and N0, see the parameter description of matmul in Parameters. |
| src | Input | A Tensor of type float32 or int32, for the start element of the source operand. For details about the data type restrictions, see Table 2. The scope is the L1OUT Buffer. The source operand is the result of matrix computation. If this API is used to process the conv2d result, the format is [cout_blocks, round_howo, 16]. If this API is used to process the matmul result, the format is [N1, M, N0]. Note: For the meanings of cout_blocks and round_howo, see the parameter description of conv2d in Parameters. For the meanings of N1, M, and N0, see the parameter description of matmul in Parameters. |
| cburst_num | Input | An immediate of type int specifying the number of bursts. Must be in the range [1, 4095]. If this API is used to process the conv2d result (format [cout_blocks, round_howo, 16]), set cburst_num to cout_blocks. If this API is used to process the matmul result (format [N1, M, N0]), set cburst_num to N1. Note: For the meanings of cout_blocks and round_howo, see the parameter description of conv2d in Parameters. For the meanings of N1, M, and N0, see the parameter description of matmul in Parameters. |
| burst_len | Input | Burst length of contiguous data transfer, in the unit of 32 bytes. Must be an immediate of type int in the range [1, 65535]. For src, the valid data segment length of each burst is howo * 16 * src_dtype_size / 32 for the conv2d result, or m * N0 * src_dtype_size / 32 for the matmul result. |
| dst_stride | Input | Tail-to-header stride between adjacent bursts of the dst operand tensor, in the unit of 32 bytes. Must be an immediate of type int in the range [0, 65535]. |
| src_stride | Input | Tail-to-header stride between adjacent bursts of the src operand tensor, in the unit of 256 elements. Must be an immediate of type int in the range [0, 65535]. This parameter is reserved. To ensure data accuracy, pass 0. |
| extend_params | Input | A dictionary of extended parameters. Defaults to None. Three keys are supported: "bias", "quantize_params", and "relu". 1. "bias": defaults to None, indicating bias disabled. To enable bias, specify the value as the start element of the bias operand, a Tensor with the same data type as src (int32 or float32) and shape [Cout,], where Cout is the number of convolution kernels if src is the output of conv2d, or the length of the N dimension if src is the output of matmul. The scope is the L1 Buffer. 2. "quantize_params": defaults to None, indicating quantization disabled. To enable quantization, pass a dictionary with two keys, "mode" and "mode_param". The value of "mode" is a string specifying the quantization mode (the examples below use "int322fp16" and "fp322fp16"). The value of "mode_param" is either a Tensor of quantization scales or None, depending on the mode. 3. "relu": a bool, defaults to False. False disables the ReLU function; True enables it. |
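As a quick cross-check of the cburst_num and burst_len arithmetic described above, it can be expressed in plain Python (a minimal sketch; the helper name `fixpipe_bursts` is made up for illustration and is not part of the TIK API):

```python
# Minimal sketch of the burst-parameter arithmetic for fixpipe.
# The helper name is illustrative only; it is not part of the TIK API.
DTYPE_SIZE = {"float16": 2, "float32": 4, "int32": 4}

def fixpipe_bursts(blocks, rows, src_dtype, lanes=16):
    """blocks: cout_blocks (conv2d) or N1 (matmul); rows: howo or m."""
    cburst_num = blocks
    # Burst length in 32-byte units: rows * 16 lanes * element size / 32.
    burst_len = rows * lanes * DTYPE_SIZE[src_dtype] // 32
    return cburst_num, burst_len

print(fixpipe_bursts(2, 16, "int32"))   # matches the first example: (2, 32)
print(fixpipe_bursts(1, 4, "float32"))  # matches the second example: (1, 8)
```

The two calls reproduce the burst parameters used in the worked examples below.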
Applicability
Restrictions
- Single-step debugging takes a long time and is therefore not recommended.
- The functions enabled in extend_params are executed in a fixed sequence.
- This instruction is mutually exclusive with Vector instructions.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
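To build intuition for what the extend_params post-processing does numerically, here is a hedged NumPy sketch. The per-channel scale semantics and the bias-then-quantization-then-ReLU order used here are assumptions for illustration, not statements about the hardware:

```python
import numpy as np

def fixpipe_sim(src, bias=None, deq_scale=None, relu=False):
    """Toy model of fixpipe post-processing on one burst.

    src: int32 or float32 array of shape [rows, channels].
    The processing order assumed here (bias, then quantization, then
    ReLU) is an illustration, not a guarantee about the instruction.
    """
    out = src.astype(np.float32)
    if bias is not None:
        out = out + bias        # per-channel bias, shape [channels]
    if deq_scale is not None:
        out = out * deq_scale   # per-channel dequantization scale
    if relu:
        out = np.maximum(out, 0.0)
    return out.astype(np.float16)  # e.g. the "int322fp16" output type

acc = np.array([[100, -200]], dtype=np.int32)  # toy 2-channel accumulator
scales = np.array([0.5, 0.25], dtype=np.float32)
# Channel 0 scales to 50.0; channel 1 scales to -50.0 and ReLU clips it to 0.0.
print(fixpipe_sim(acc, deq_scale=scales, relu=True))
```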
Returns
None
Example
- Example: src is of type int32 and dst is of type float16, bias is disabled, and mode_param is a Tensor.
```python
from tbe import tik

tik_instance = tik.Tik()
dtype_size = {
    "int8": 1, "uint8": 1, "int16": 2, "uint16": 2, "float16": 2,
    "int32": 4, "uint32": 4, "float32": 4, "int64": 8,
}
fm_dtype = "uint8"
ker_dtype = "int8"
deq_dtype = "float16"
dst_dtype = "int32"
fm_shape = [1, 4, 4, 32]
kernel_shape = [1, 2, 2, 32, 32]
dst_shape = [2, 9, 16]
dst_l1_shape = [2, 16, 16]
deq_shape = [16]
# Convolution stride, [stride_h, stride_w]
stride = [1, 1]
# Padding factors, in the format [pad_left, pad_right, pad_top, pad_bottom]
pad = [0, 0, 0, 0]
# Convolution dilation factors, in the format [dilation_h, dilation_w]
dilation = [1, 1]
# Padding value
pad_value = 0
# Define the tensors.
feature_map_gm = tik_instance.Tensor(fm_dtype, fm_shape, name='feature_map_gm', scope=tik.scope_gm)
weight_gm = tik_instance.Tensor(ker_dtype, kernel_shape, name='weight_gm', scope=tik.scope_gm)
deqscale_gm = tik_instance.Tensor(deq_dtype, deq_shape, name='deqscale_gm', scope=tik.scope_gm)
dst_gm = tik_instance.Tensor(deq_dtype, dst_shape, name='dst_gm', scope=tik.scope_gm)
feature_map = tik_instance.Tensor(fm_dtype, fm_shape, name='feature_map', scope=tik.scope_cbuf)
weight = tik_instance.Tensor(ker_dtype, kernel_shape, name='weight', scope=tik.scope_cbuf)
deqscale = tik_instance.Tensor(deq_dtype, deq_shape, name='deqscale', scope=tik.scope_cbuf)
dst_l1out = tik_instance.Tensor(dst_dtype, dst_l1_shape, name='dst_l1out', scope=tik.scope_cbuf_out)
# Move data from the Global Memory to the source operand tensors.
tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 16, 0, 0)
tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
tik_instance.data_move(deqscale, deqscale_gm, 0, 1, 1, 0, 0)
# Perform convolution.
tik_instance.conv2d(dst_l1out, feature_map, weight, fm_shape, kernel_shape,
                    stride, pad, dilation, pad_value)
# Perform quantization using fixpipe.
# Number of transferred bursts. For conv2d output, cburst_num is cout_blocks;
# for matmul output, cburst_num is N1.
cburst_num = dst_l1_shape[0]
# Length of one contiguous burst. For conv2d output the length is
# howo*16*src_dtype_size/32; for matmul output it is m*N0*src_dtype_size/32.
# Integer division keeps burst_len an int, as required.
burst_len = dst_l1_shape[1] * 16 * dtype_size[dst_dtype] // 32
# Tail-to-header stride between adjacent bursts of dst and src, that is, the
# distance between the previous burst tail and the next burst header.
# The value 0 is used as an example.
dst_stride, src_stride = 0, 0
tik_instance.fixpipe(dst_gm, dst_l1out, cburst_num, burst_len, dst_stride, src_stride,
                     extend_params={"bias": None,
                                    "quantize_params": {"mode": "int322fp16",
                                                        "mode_param": deqscale}})
tik_instance.BuildCCE(kernel_name="fixpipe",
                      inputs=[feature_map_gm, weight_gm, deqscale_gm],
                      outputs=[dst_gm])
```

Result example:
```
Input:
feature_map_gm: [[[[3, 2, 4, 2, ..., 4, 3]]]]
weight_gm: [[[[[0, -5, -3, ..., -4, -2]]]]]
deqscale_gm: [0.1214, -0.2238, ..., 0.4883, 0.2788]
Output:
dst_gm: [[[-13.48, 39.38, -114.8, 30.38, ..., 9.766, -24.81]]]
```
- Example: src is of type float32 and dst is of type float16, bias is enabled, and mode_param is None.
```python
from tbe import tik

tik_instance = tik.Tik()
# Define the tensors.
feature_map_gm = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map_gm', scope=tik.scope_gm)
weight_gm = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight_gm', scope=tik.scope_gm)
bias_gm = tik_instance.Tensor("float32", (16,), name='bias_gm', scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float16", [1, 4, 16], name='dst_gm', scope=tik.scope_gm)
feature_map = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map', scope=tik.scope_cbuf)
weight = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight', scope=tik.scope_cbuf)
bias = tik_instance.Tensor("float32", (16,), name='bias', scope=tik.scope_cbuf)
dst_l1out = tik_instance.Tensor("float32", [1, 16, 16], name='dst_l1out', scope=tik.scope_cbuf_out)
# Move data from the Global Memory to the source operand tensors.
tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 32, 0, 0)
tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
tik_instance.data_move(bias, bias_gm, 0, 1, 2, 0, 0)
# Perform convolution.
tik_instance.conv2d(dst_l1out, feature_map, weight, [2, 4, 4, 16], [2, 2, 2, 16, 16],
                    [1, 1], [0, 0, 0, 0], [2, 2], 0)
# Perform bias addition and quantization using fixpipe.
tik_instance.fixpipe(dst_gm, dst_l1out, 1, 8, 0, 0,
                     extend_params={"bias": bias,
                                    "quantize_params": {"mode": "fp322fp16",
                                                        "mode_param": None}})
tik_instance.BuildCCE(kernel_name="conv2d",
                      inputs=[feature_map_gm, weight_gm, bias_gm],
                      outputs=[dst_gm])
```

Result example:
```
Input:
feature_map_gm: [[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 5.09, 5.1, 5.11]]]]
weight_gm: [[[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 20.46, 20.47]]]]]
bias_gm: [0.0, 1.0, 2.0, 3.0, ..., 14.0, 15.0]
Output:
dst_gm: [[[3568., 3614., 3660., 3704., 3750., 3794., 3840., 3884.,
           3930., 3976., 4020., 4066., 4110., 4156., 4200., 4250.],
          [3754., 3802., 3850., 3900., 3948., 3996., 4044., 4094.,
           4140., 4188., 4240., 4290., 4336., 4384., 4430., 4480.],
          [4308., 4370., 4424., 4484., 4544., 4600., 4660., 4716.,
           4776., 4830., 4892., 4950., 5010., 5068., 5124., 5184.],
          [4496., 4556., 4616., 4680., 4740., 4804., 4864., 4924.,
           4988., 5050., 5108., 5172., 5230., 5296., 5356., 5416.]]]
```