conv2d
Description
Performs a 2D convolution of an input tensor with a weight tensor and writes the result to a destination tensor.

The following data types are supported (feature_map:weight:dst):
- int8:int8:int32
- float16:float16:float32
Prototype
conv2d(dst, feature_map, weight, fm_shape, kernel_shape, stride, pad, dilation, pad_value=0, init_l1out=True)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | Start element of the destination operand. For details about the data type restrictions, see Table 2. The scope is the L1OUT Buffer. The format is [Cout/16, Ho, Wo, 16] and the size is Cout * Ho * Wo, where Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h) + 1 and Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w) + 1. The hardware requires Ho * Wo to be a multiple of 16, so when defining the dst tensor, round the shape up to a multiple of 16: the actual size should be Cout * round_howo, where round_howo = ceil(Ho * Wo / 16) * 16. The invalid data introduced by the round-up is removed by the subsequent fixpipe operation. |
| feature_map | Input | Input tensor. For the supported data types, see Table 2. The scope is the L1 Buffer. |
| weight | Input | Convolution kernel (weight). For the supported data types, see Table 2. The scope is the L1 Buffer. |
| fm_shape | Input | Shape of feature_map, in the format [C1, H, W, C0], where C1 * C0 is the number of input channels. H: an immediate of type int specifying the height; must be in the range [1, 4096]. W: an immediate of type int specifying the width; must be in the range [1, 4096]. |
| kernel_shape | Input | Shape of weight, in the format [C1, Kh, Kw, Cout, C0], where C1 * C0 is the number of input channels. Cout: an immediate of type int specifying the number of convolution kernels; must be a multiple of 16 in the range [16, 4096]. Kh: an immediate of type int specifying the height of each convolution kernel; must be in the range [1, 255]. Kw: an immediate of type int specifying the width of each convolution kernel; must be in the range [1, 255]. |
| stride | Input | Convolution strides, in the format [stride_h, stride_w]. stride_h: an immediate of type int specifying the height stride; must be in the range [1, 63]. stride_w: an immediate of type int specifying the width stride; must be in the range [1, 63]. |
| pad | Input | Padding factors, in the format [pad_left, pad_right, pad_top, pad_bottom]. pad_left/pad_right: immediates of type int specifying the number of columns to pad on the left/right of feature_map; must be in the range [0, 255]. pad_top/pad_bottom: immediates of type int specifying the number of rows to pad at the top/bottom of feature_map; must be in the range [0, 255]. |
| dilation | Input | Convolution dilation factors, in the format [dilation_h, dilation_w]. dilation_h: an immediate of type int specifying the height dilation factor; must be in the range [1, 255]. dilation_w: an immediate of type int specifying the width dilation factor; must be in the range [1, 255]. The height and width of the dilated convolution kernel are dilation_h * (Kh - 1) + 1 and dilation_w * (Kw - 1) + 1, respectively. |
| pad_value | Input | Padding value, an immediate of type int or float. Defaults to 0. |
| init_l1out | Input | A bool specifying whether to initialize dst. Defaults to True. |
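The output-shape and round-up formulas above can be sketched as a small host-side helper. This is an illustrative sanity check only; the helper name is not part of the TIK API:

```python
import math

def conv2d_output_shape(h, w, kh, kw, stride, pad, dilation):
    """Compute Ho, Wo, and round_howo per the formulas in the dst description."""
    stride_h, stride_w = stride
    pad_left, pad_right, pad_top, pad_bottom = pad
    dilation_h, dilation_w = dilation
    # Effective extent of the dilated kernel.
    dkh = dilation_h * (kh - 1) + 1
    dkw = dilation_w * (kw - 1) + 1
    ho = (h + pad_top + pad_bottom - dkh) // stride_h + 1
    wo = (w + pad_left + pad_right - dkw) // stride_w + 1
    # dst must be allocated with Ho * Wo rounded up to a multiple of 16.
    round_howo = math.ceil(ho * wo / 16) * 16
    return ho, wo, round_howo

# Shapes from the int8 example below: H = W = 4, Kh = Kw = 2, no padding, no dilation.
print(conv2d_output_shape(4, 4, 2, 2, [1, 1], [0, 0, 0, 0], [1, 1]))  # (3, 3, 16)
```

With dilation [2, 2], as in the float16 example below, the same call yields (2, 2, 16), matching the dst shapes used there.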
Applicability
Restrictions
- Single-step debugging takes a long time, and is therefore not recommended.
- This instruction is mutually exclusive with Vector instructions.
- This instruction must be used together with the fixpipe instruction.
- This instruction does not support the case where W equals Kw while H is greater than Kh; such inputs produce unexpected results.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
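The operand ranges from the parameter table and the W/Kw restriction above can be bundled into a pre-flight check. This is a hypothetical helper for illustration, not a TIK API:

```python
def check_conv2d_args(h, w, kh, kw, stride, pad, dilation, cout):
    """Validate conv2d operands against the documented ranges (illustrative only)."""
    assert 1 <= h <= 4096 and 1 <= w <= 4096, "H and W must be in [1, 4096]"
    assert 1 <= kh <= 255 and 1 <= kw <= 255, "Kh and Kw must be in [1, 255]"
    assert cout % 16 == 0 and 16 <= cout <= 4096, "Cout must be a multiple of 16 in [16, 4096]"
    assert all(1 <= s <= 63 for s in stride), "strides must be in [1, 63]"
    assert all(0 <= p <= 255 for p in pad), "pads must be in [0, 255]"
    assert all(1 <= d <= 255 for d in dilation), "dilations must be in [1, 255]"
    # Unsupported case called out in the restrictions above.
    assert not (w == kw and h > kh), "W == Kw with H > Kh produces unexpected results"

# Operands from the int8 example below pass all checks.
check_conv2d_args(4, 4, 2, 2, [1, 1], [0, 0, 0, 0], [1, 1], 32)
```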
Returns
None
Example
- Example: feature_map:weight:dst of type int8:int8:int32
```python
from tbe import tik

tik_instance = tik.Tik()
# Define the tensors.
feature_map_gm = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map_gm', scope=tik.scope_gm)
weight_gm = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight_gm', scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("int32", [2, 9, 16], name='dst_gm', scope=tik.scope_gm)
feature_map = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map', scope=tik.scope_cbuf)
weight = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight', scope=tik.scope_cbuf)
# dst has shape [2, 16, 16]: cout = 32, cout_blocks = 2, ho = 3, wo = 3, howo = 9.
# Therefore, round_howo = 16.
dst = tik_instance.Tensor("int32", [2, 16, 16], name='dst', scope=tik.scope_cbuf_out)
# Move data from the Global Memory to the source operand tensors.
tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 16, 0, 0)
tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
# Perform the convolution.
tik_instance.conv2d(dst, feature_map, weight, [1, 4, 4, 32], [1, 2, 2, 32, 32], [1, 1], [0, 0, 0, 0], [1, 1], 0)
# Move dst from the L1OUT Buffer to the Global Memory with the fixpipe instruction.
# cout_blocks = 2, cburst_num = 2, burst_len = howo * 16 * src_dtype_size / 32 = 9 * 16 * 4 / 32 = 18
tik_instance.fixpipe(dst_gm, dst, 2, 18, 0, 0, extend_params=None)
tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])
```

Result example:
```
Input:
feature_map_gm: [[[[2, 4, 2, 3, 2, ..., 3, 3, 0]]]]
weight_gm:      [[[[[-3, -5, -4, ..., -2, -4, -2]]]]]
Output:
dst_gm: [[[-230, -11, -83, -103, -123, ..., -174, -255]]]
```
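The device result can be cross-checked against a plain NumPy convolution on the host. Note the layouts differ: TIK uses [C1, H, W, C0] and [C1, Kh, Kw, Cout, C0], while this illustrative reference uses simple HWC and HWIO layouts for clarity (it is not part of TIK):

```python
import numpy as np

def conv2d_reference(fm, wt, stride=(1, 1), dilation=(1, 1)):
    """NumPy reference: fm is [H, W, Cin], wt is [Kh, Kw, Cin, Cout]; no padding for brevity."""
    h, w, cin = fm.shape
    kh, kw, _, cout = wt.shape
    sh, sw = stride
    dh, dw = dilation
    ho = (h - (dh * (kh - 1) + 1)) // sh + 1
    wo = (w - (dw * (kw - 1) + 1)) // sw + 1
    out = np.zeros((ho, wo, cout), dtype=np.int64)
    for i in range(ho):
        for j in range(wo):
            # Gather the (possibly dilated) receptive field and contract with the kernel.
            patch = fm[i * sh : i * sh + dh * (kh - 1) + 1 : dh,
                       j * sw : j * sw + dw * (kw - 1) + 1 : dw, :]
            out[i, j, :] = np.tensordot(patch.astype(np.int64), wt.astype(np.int64),
                                        axes=([0, 1, 2], [0, 1, 2]))
    return out

# Same H, W, Kh, Kw and channel count as the int8 example (all-ones data for illustration).
fm = np.ones((4, 4, 32), dtype=np.int8)
wt = np.ones((2, 2, 32, 32), dtype=np.int8)
print(conv2d_reference(fm, wt).shape)  # (3, 3, 32); each element is 2 * 2 * 32 = 128
```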
- Example: feature_map:weight:dst of type float16:float16:float32
```python
from tbe import tik

tik_instance = tik.Tik()
# Define the tensors.
feature_map_gm = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map_gm', scope=tik.scope_gm)
weight_gm = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight_gm', scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float32", [1, 4, 16], name='dst_gm', scope=tik.scope_gm)
feature_map = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map', scope=tik.scope_cbuf)
weight = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight', scope=tik.scope_cbuf)
# dst has shape [1, 16, 16]: cout = 16, cout_blocks = 1, ho = 2, wo = 2, howo = 4.
# Therefore, round_howo = 16.
dst = tik_instance.Tensor("float32", [1, 16, 16], name='dst', scope=tik.scope_cbuf_out)
# Move data from the Global Memory to the source operand tensors.
tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 32, 0, 0)
tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
# Perform the convolution.
tik_instance.conv2d(dst, feature_map, weight, [2, 4, 4, 16], [2, 2, 2, 16, 16], [1, 1], [0, 0, 0, 0], [2, 2], 0)
# Move dst from the L1OUT Buffer to the Global Memory with the fixpipe instruction.
# cout_blocks = 1, cburst_num = 1, burst_len = howo * 16 * src_dtype_size / 32 = 4 * 16 * 4 / 32 = 8
tik_instance.fixpipe(dst_gm, dst, 1, 8, 0, 0, extend_params=None)
tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])
```

Result example:
```
Input:
feature_map_gm: [[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 5.09, 5.1, 5.11]]]]
weight_gm:      [[[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 20.46, 20.47]]]]]
Output:
dst_gm:
[[[3568.7373, 3612.8433, 3657.0618, 3701.162,  3745.287,  3789.4834, 3833.6282, 3877.876,
   3921.9812, 3966.0745, 4010.311,  4054.4119, 4098.5713, 4142.702,  4186.8457, 4231.0312],
  [3753.9888, 3801.3733, 3848.8735, 3896.2534, 3943.6558, 3991.1353, 4038.5586, 4086.0913,
   4133.4736, 4180.8457, 4228.3643, 4275.745,  4323.1826, 4370.5947, 4418.016,  4465.4844],
  [4309.196,  4366.4077, 4423.745,  4480.9565, 4538.1816, 4595.5054, 4652.755,  4710.135,
   4767.34,   4824.5405, 4881.897,  4939.1104, 4996.374,  5053.6226, 5110.871,  5168.179 ],
  [4494.4526, 4554.944,  4615.564,  4676.0557, 4736.5586, 4797.166,  4857.695,  4918.3604,
   4978.8433, 5039.323,  5099.9624, 5160.456,  5220.999,  5281.5293, 5342.0566, 5402.6475]]]
```
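Both examples compute the fixpipe burst_len from the same formula given in their comments, burst_len = howo * 16 * dst_dtype_size / 32 (a burst is 32 bytes). A small helper makes the arithmetic explicit; the function name is illustrative, not a TIK API:

```python
def fixpipe_burst_len(ho, wo, dst_dtype_size):
    """burst_len in 32-byte units for a [Cout/16, Ho, Wo, 16] dst block."""
    return ho * wo * 16 * dst_dtype_size // 32

print(fixpipe_burst_len(3, 3, 4))  # 18 -> int8 example (int32 dst, 4 bytes)
print(fixpipe_burst_len(2, 2, 4))  # 8  -> float16 example (float32 dst, 4 bytes)
```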