conv2d

Description

Computes the 2D convolution of an input tensor with a weight tensor and writes the result to a destination tensor.

The following data types are supported (feature_map:weight:dst):

  • int8:int8:int32
  • float16:float16:float32

Prototype

conv2d(dst, feature_map, weight, fm_shape, kernel_shape, stride, pad, dilation, pad_value=0, init_l1out=True)

Parameters

Table 1 Parameter description

dst (Output)

Start element of the destination operand. For details about the data type restrictions, see Table 2. The scope is the L1OUT Buffer.

Has format [Cout/16, Ho, Wo, 16], and size Cout * Ho * Wo, where Ho and Wo can be calculated as follows:

Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1)

Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1)

The hardware requires Ho * Wo to be a multiple of 16. When defining the dst tensor, the shape should be rounded up to a multiple of 16. The actual shape size should be Cout * round_howo:

round_howo = ceil(Ho * Wo/16) * 16

The invalid data introduced due to round-up will be removed in the subsequent fixpipe operation.
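
The output-shape arithmetic above can be checked with a short Python helper (a sketch; conv2d_out_shape is an illustrative name, not part of the TIK API):

```python
import math

def conv2d_out_shape(h, w, kh, kw, stride, pad, dilation):
    # Ho/Wo per the formulas above; round_howo is the 16-aligned Ho * Wo.
    sh, sw = stride
    pl, pr, pt, pb = pad
    dh, dw = dilation
    ho = (h + pt + pb - dh * (kh - 1) - 1) // sh + 1
    wo = (w + pl + pr - dw * (kw - 1) - 1) // sw + 1
    round_howo = math.ceil(ho * wo / 16) * 16
    return ho, wo, round_howo
```

With the int8 example below (4 x 4 input, 2 x 2 kernel, stride 1, no padding, no dilation) this gives Ho = Wo = 3 and round_howo = 16, matching the dst shape [2, 16, 16].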

feature_map (Input)

Input tensor. For the supported data types, see Table 2. The scope is the L1 Buffer.

weight (Input)

Convolution kernel (weight). For the supported data types, see Table 2. The scope is the L1 Buffer.

fm_shape (Input)

Shape of feature_map, in the format [C1, H, W, C0].

C1 * C0 indicates the number of input channels.

  • If feature_map is of type float16, C0 is 16. C0 is an immediate of type int.
  • If feature_map is of type int8, C0 is 32. C0 is an immediate of type int.
  • C1 is an immediate of type int in the range of [1, 256]. The number of input channels (C1 * C0) is in the range of [16, 4096] for float16 or [32, 4096] for int8.
  • For the first convolution layer of a network with float16 or int8 input, fm_shape of conv2d supports C0 = 4 and C1 = 1.

H is an immediate of type int, specifying the height. Must be in the range of [1, 4096].

W is an immediate of type int, specifying the width. Must be in the range of [1, 4096].

kernel_shape (Input)

Shape of weight, in the format [C1, Kh, Kw, Cout, C0].

C1 * C0 indicates the number of input channels.

  • If feature_map is of type float16, C0 is 16. C0 is an immediate of type int.
  • If feature_map is of type int8, C0 is 32. C0 is an immediate of type int.
  • C1 is an immediate of type int in the range of [1, 256]. The number of input channels (C1 * C0) is in the range of [16, 4096] for float16 or [32, 4096] for int8.
  • For the first convolution layer of a network with float16 or int8 input, kernel_shape of conv2d supports C0 = 4 and C1 = 1.
  • The number of input channels must be the same as that of fm_shape.

Cout is an immediate of type int specifying the number of convolution kernels. The value is a multiple of 16 in the range of [16, 4096].

Kh is an immediate of type int specifying the height of each convolution kernel. Must be in the range of [1, 255].

Kw is an immediate of type int specifying the width of each convolution kernel. Must be in the range of [1, 255].

stride (Input)

Convolution stride, in the format of [stride_h, stride_w].

stride_h: an immediate of type int specifying the height stride. Must be in the range of [1, 63].

stride_w: an immediate of type int specifying the width stride. Must be in the range of [1, 63].

pad (Input)

Padding factors, in the format of [pad_left, pad_right, pad_top, pad_bottom].

pad_left: an immediate of type int specifying the number of columns to be padded to the left of the feature_map. Must be in the range of [0, 255].

pad_right: an immediate of type int specifying the number of columns to be padded to the right of the feature_map. Must be in the range of [0, 255].

pad_top: an immediate of type int specifying the number of rows to be padded to the top of the feature_map. Must be in the range of [0, 255].

pad_bottom: an immediate of type int specifying the number of rows to be padded to the bottom of the feature_map. Must be in the range of [0, 255].

dilation (Input)

Convolution dilation factors, in the format of [dilation_h, dilation_w].

dilation_h: an immediate of type int specifying the height dilation factor. Must be in the range of [1, 255].

dilation_w: an immediate of type int specifying the width dilation factor. Must be in the range of [1, 255].

The width and height of the dilated convolution kernel are calculated as dilation_w * (Kw - 1) + 1 and dilation_h * (Kh - 1) + 1, respectively.
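
As a quick check of the formula above (dilated_extent is a hypothetical helper name): a 2 x 2 kernel with dilation 2 has an effective extent of 3 in each dimension, which is why the float16 example below produces a 2 x 2 output from a 4 x 4 input.

```python
def dilated_extent(k, d):
    # Effective kernel extent after dilation: d * (k - 1) + 1
    return d * (k - 1) + 1
```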

pad_value (Input)

Padding value, an immediate of type int or float. Defaults to 0.

  • If feature_map is of type int8, pad_value is an immediate of type int in the range of [-128, +127].
  • If feature_map is of type float16, pad_value is an immediate of type int or float in the range of [-65504, +65504].

init_l1out (Input)

A bool specifying whether to initialize dst. Defaults to True.

  • True: The dst initial matrix will be overwritten by the compute result.
  • False: The dst initial matrix stores the previous conv2d result and will be accumulated with the new conv2d result.
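
The accumulation semantics of init_l1out can be illustrated with a plain NumPy reference convolution (a minimal sketch; conv2d_ref, its loop layout, and the [Cin, H, W] format are illustrative only, not the TIK API or its data layout):

```python
import numpy as np

def conv2d_ref(fm, w, stride=(1, 1), dilation=(1, 1)):
    # Direct 2D convolution, no padding.
    # fm: [Cin, H, W]; w: [Cout, Cin, Kh, Kw]; returns [Cout, Ho, Wo].
    cin, h, wd = fm.shape
    cout, _, kh, kw = w.shape
    sh, sw = stride
    dh, dw = dilation
    ho = (h - (dh * (kh - 1) + 1)) // sh + 1
    wo = (wd - (dw * (kw - 1) + 1)) // sw + 1
    out = np.zeros((cout, ho, wo), dtype=np.int64)
    for co in range(cout):
        for i in range(ho):
            for j in range(wo):
                for ci in range(cin):
                    for p in range(kh):
                        for q in range(kw):
                            out[co, i, j] += (fm[ci, i * sh + p * dh, j * sw + q * dw]
                                              * w[co, ci, p, q])
    return out

fm = np.arange(16, dtype=np.int64).reshape(1, 4, 4)
w = np.ones((1, 1, 2, 2), dtype=np.int64)
dst = conv2d_ref(fm, w)    # init_l1out=True: dst is overwritten
dst += conv2d_ref(fm, w)   # init_l1out=False: new result accumulates into dst
```

With init_l1out=False, the hardware adds the new conv2d result to whatever dst already holds, which is how partial sums over C1 tiles are combined.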

Table 2 Data type combination of feature_map, weight, and dst

feature_map.dtype    weight.dtype    dst.dtype
int8                 int8            int32
float16              float16         float32

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • Single-step debugging takes a long time, and is therefore not recommended.
  • This instruction is mutually exclusive with Vector instructions.
  • This instruction must be used together with the fixpipe instruction.
  • This instruction does not support the scenario where W is equal to Kw and H is greater than Kh; this produces unexpected results.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.

Returns

None

Example

  • Example: feature_map:weight:dst of type int8:int8:int32
    from tbe import tik
    tik_instance = tik.Tik()
    # Define the tensors.
    feature_map_gm = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map_gm', scope=tik.scope_gm)
    weight_gm = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight_gm', scope=tik.scope_gm)
    dst_gm = tik_instance.Tensor("int32", [2, 9, 16], name='dst_gm', scope=tik.scope_gm)
    feature_map = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map', scope=tik.scope_cbuf)
    weight = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight', scope=tik.scope_cbuf)
    # dst has shape [2, 16, 16], where cout = 32. cout_blocks = 2, ho = 3, wo = 3, howo = 9. Therefore, round_howo = 16.
    dst = tik_instance.Tensor("int32", [2, 16, 16], name='dst', scope=tik.scope_cbuf_out)
    # Move data from the Global Memory to the source operand tensor.
    tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 16, 0, 0)
    tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
    # Perform convolution.
    tik_instance.conv2d(dst, feature_map, weight, [1, 4, 4, 32], [1, 2, 2, 32, 32], [1, 1], [0, 0, 0, 0], [1, 1], 0)
    # Move dst from L1OUT Buffer to the Global Memory by co-working with the fixpipe instruction.
    # cout_blocks = 2, cburst_num = 2, burst_len = howo * 16 * src_dtype_size/32 = 9 * 16 * 4/32 = 18
    tik_instance.fixpipe(dst_gm, dst, 2, 18, 0, 0, extend_params=None)
    tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])

    Result example:

    Input:
    feature_map_gm:
    [[[[2, 4, 2, 3, 2, ..., 3, 3, 0]]]]
    weight_gm:
    [[[[[-3, -5, -4, ..., -2, -4, -2]]]]]
    Output:
    dst_gm:
    [[[-230,  -11,  -83, -103, -123, ..., -174, -255]]]
  • Example: feature_map:weight:dst of type float16:float16:float32
    from tbe import tik
    tik_instance = tik.Tik()
    # Define the tensors.
    feature_map_gm = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map_gm', scope=tik.scope_gm)
    weight_gm = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight_gm', scope=tik.scope_gm)
    dst_gm = tik_instance.Tensor("float32", [1, 4, 16], name='dst_gm', scope=tik.scope_gm)
    feature_map = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map', scope=tik.scope_cbuf)
    weight = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight', scope=tik.scope_cbuf)
    # dst has shape [1, 16, 16], where cout = 16, cout_blocks = 1, ho = 2, wo = 2, howo = 4. Therefore, round_howo = 16.
    dst = tik_instance.Tensor("float32", [1, 16, 16], name='dst', scope=tik.scope_cbuf_out)
    # Move data from the Global Memory to the source operand tensor.
    tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 32, 0, 0)
    tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
    # Perform convolution.
    tik_instance.conv2d(dst, feature_map, weight, [2, 4, 4, 16], [2, 2, 2, 16, 16], [1, 1], [0, 0, 0, 0], [2, 2], 0)
    # Move dst from L1OUT Buffer to the Global Memory by co-working with the fixpipe instruction.
    # cout_blocks = 1, cburst_num = 1, burst_len = howo * 16 * src_dtype_size/32 = 4 * 16 * 4/32 = 8
    tik_instance.fixpipe(dst_gm, dst, 1, 8, 0, 0, extend_params=None)
    tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])

    Result example:

    Input:
    feature_map_gm:
    [[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 5.09, 5.1, 5.11]]]]
    weight_gm:
    [[[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 20.46, 20.47]]]]]
    
    Output:
    dst_gm:
    [[[3568.7373, 3612.8433, 3657.0618, 3701.162 , 3745.287 ,
       3789.4834, 3833.6282, 3877.876 , 3921.9812, 3966.0745,
       4010.311 , 4054.4119, 4098.5713, 4142.702 , 4186.8457,
       4231.0312],
      [3753.9888, 3801.3733, 3848.8735, 3896.2534, 3943.6558,
       3991.1353, 4038.5586, 4086.0913, 4133.4736, 4180.8457,
       4228.3643, 4275.745 , 4323.1826, 4370.5947, 4418.016 ,
       4465.4844],
      [4309.196 , 4366.4077, 4423.745 , 4480.9565, 4538.1816,
       4595.5054, 4652.755 , 4710.135 , 4767.34  , 4824.5405,
       4881.897 , 4939.1104, 4996.374 , 5053.6226, 5110.871 ,
       5168.179 ],
      [4494.4526, 4554.944 , 4615.564 , 4676.0557, 4736.5586,
       4797.166 , 4857.695 , 4918.3604, 4978.8433, 5039.323 ,
       5099.9624, 5160.456 , 5220.999 , 5281.5293, 5342.0566,
       5402.6475]]]
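
The burst_len values passed to the two fixpipe calls above follow the formula in the comments, burst_len = howo * 16 * dtype_size / 32 (lengths in 32-byte units). A small sketch of that arithmetic (fixpipe_burst_len is an illustrative name, not a TIK API):

```python
def fixpipe_burst_len(howo, dtype_size):
    # One burst carries 16 output channels per output point;
    # burst lengths are expressed in 32-byte units.
    return howo * 16 * dtype_size // 32
```

For the int8 example, howo = 9 and dst is int32 (4 bytes), giving 18; for the float16 example, howo = 4 and dst is float32 (4 bytes), giving 8.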