conv2d

Description

Computes the 2D convolution of an input tensor with a weight tensor and writes the result to a destination tensor.

The following data types are supported (feature_map:weight:dst):

  • int8:int8:int32
  • float16:float16:float32

Prototype

conv2d(dst, feature_map, weight, fm_shape, kernel_shape, stride, pad, dilation, pad_value=0, init_l1out=True)

Parameters

Table 1 Parameter description

dst (Output)

Start element of the destination operand. For details about the data type restrictions, see Table 2. The scope is the L1OUT Buffer.

Has format [Cout/16, Ho, Wo, 16], and size Cout * Ho * Wo, where Ho and Wo can be calculated as follows:

Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1)

Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1)

The hardware requires Ho * Wo to be a multiple of 16. When defining the dst tensor, the shape should be rounded up to a multiple of 16. The actual shape size should be Cout * round_howo:

round_howo = ceil(Ho * Wo/16) * 16

The invalid data introduced due to round-up will be removed in the subsequent fixpipe operation.
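
The output-shape arithmetic above can be checked with a short Python helper (a sketch; conv2d_out_shape is an illustrative name, not part of the TIK API):

```python
import math

def conv2d_out_shape(h, w, kh, kw, stride, pad, dilation):
    # Ho/Wo per the formulas above; round_howo is the 16-aligned Ho * Wo.
    sh, sw = stride
    pl, pr, pt, pb = pad
    dh, dw = dilation
    ho = (h + pt + pb - dh * (kh - 1) - 1) // sh + 1
    wo = (w + pl + pr - dw * (kw - 1) - 1) // sw + 1
    round_howo = math.ceil(ho * wo / 16) * 16
    return ho, wo, round_howo
```

With the int8 example below (4 x 4 input, 2 x 2 kernel, stride 1, no padding, no dilation) this gives Ho = Wo = 3 and round_howo = 16, matching the dst shape [2, 16, 16].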

feature_map (Input)

Input tensor. For the supported data types, see Table 2. The scope is the L1 Buffer.

weight (Input)

Convolution kernel (weight). For the supported data types, see Table 2. The scope is the L1 Buffer.

fm_shape (Input)

Shape of feature_map, in the format [C1, H, W, C0].

C1 * C0 indicates the number of input channels.

  • If feature_map is of type float16, C0 is 16. C0 is an immediate of type int.
  • If feature_map is of type int8, C0 is 32. C0 is an immediate of type int.
  • C1 is an immediate of type int in the range of [1, 256]. The number of input channels (C1 * C0) is in the range of [16, 4096] for float16 or [32, 4096] for int8.
  • For the first convolution layer of a network with float16 or int8 input, fm_shape of conv2d supports C0 = 4 and C1 = 1.

H is an immediate of type int, specifying the height. Must be in the range of [1, 4096].

W is an immediate of type int, specifying the width. Must be in the range of [1, 4096].

kernel_shape (Input)

Shape of weight, in the format [C1, Kh, Kw, Cout, C0].

C1 * C0 indicates the number of input channels.

  • If feature_map is of type float16, C0 is 16. C0 is an immediate of type int.
  • If feature_map is of type int8, C0 is 32. C0 is an immediate of type int.
  • C1 is an immediate of type int in the range of [1, 256]. The number of input channels (C1 * C0) is in the range of [16, 4096] for float16 or [32, 4096] for int8.
  • For the first convolution layer of a network with float16 or int8 input, kernel_shape of conv2d supports C0 = 4 and C1 = 1.
  • The number of input channels must be the same as that of fm_shape.

Cout is an immediate of type int specifying the number of convolution kernels. The value is a multiple of 16 in the range of [16, 4096].

Kh is an immediate of type int specifying the height of each convolution kernel. Must be in the range of [1, 255].

Kw is an immediate of type int specifying the width of each convolution kernel. Must be in the range of [1, 255].

stride (Input)

Convolution stride, in the format of [stride_h, stride_w].

stride_h: an immediate of type int specifying the height stride. Must be in the range of [1, 63].

stride_w: an immediate of type int specifying the width stride. Must be in the range of [1, 63].

pad (Input)

Padding factors, in the format of [pad_left, pad_right, pad_top, pad_bottom].

pad_left: an immediate of type int specifying the number of columns to be padded to the left of the feature_map. Must be in the range of [0, 255].

pad_right: an immediate of type int specifying the number of columns to be padded to the right of the feature_map. Must be in the range of [0, 255].

pad_top: an immediate of type int specifying the number of rows to be padded to the top of the feature_map. Must be in the range of [0, 255].

pad_bottom: an immediate of type int specifying the number of rows to be padded to the bottom of the feature_map. Must be in the range of [0, 255].

dilation (Input)

Convolution dilation factors, in the format of [dilation_h, dilation_w].

dilation_h: an immediate of type int specifying the height dilation factor. Must be in the range of [1, 255].

dilation_w: an immediate of type int specifying the width dilation factor. Must be in the range of [1, 255].

The width and height of the dilated convolution kernel are calculated as dilation_w * (Kw - 1) + 1 and dilation_h * (Kh - 1) + 1, respectively.
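
As a quick check of the formula above (dilated_extent is a hypothetical helper name): a 2 x 2 kernel with dilation 2 has an effective extent of 3 in each dimension, which is why the float16 example below produces a 2 x 2 output from a 4 x 4 input.

```python
def dilated_extent(k, d):
    # Effective kernel extent after dilation: d * (k - 1) + 1
    return d * (k - 1) + 1
```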

pad_value (Input)

Padding value, an immediate of type int or float. Defaults to 0.

  • If feature_map is of type int8, pad_value is an immediate of type int in the range of [-128, +127].
  • If feature_map is of type float16, pad_value is an immediate of type int or float in the range of [-65504, +65504].

init_l1out (Input)

A bool specifying whether to initialize dst. Defaults to True.

  • True: The dst initial matrix will be overwritten by the compute result.
  • False: The dst initial matrix stores the previous conv2d result and will be accumulated with the new conv2d result.
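
The accumulation semantics of init_l1out can be illustrated with a plain NumPy reference convolution (a minimal sketch; conv2d_ref, its loop layout, and the [Cin, H, W] format are illustrative only, not the TIK API or its data layout):

```python
import numpy as np

def conv2d_ref(fm, w, stride=(1, 1), dilation=(1, 1)):
    # Direct 2D convolution, no padding.
    # fm: [Cin, H, W]; w: [Cout, Cin, Kh, Kw]; returns [Cout, Ho, Wo].
    cin, h, wd = fm.shape
    cout, _, kh, kw = w.shape
    sh, sw = stride
    dh, dw = dilation
    ho = (h - (dh * (kh - 1) + 1)) // sh + 1
    wo = (wd - (dw * (kw - 1) + 1)) // sw + 1
    out = np.zeros((cout, ho, wo), dtype=np.int64)
    for co in range(cout):
        for i in range(ho):
            for j in range(wo):
                for ci in range(cin):
                    for p in range(kh):
                        for q in range(kw):
                            out[co, i, j] += (fm[ci, i * sh + p * dh, j * sw + q * dw]
                                              * w[co, ci, p, q])
    return out

fm = np.arange(16, dtype=np.int64).reshape(1, 4, 4)
w = np.ones((1, 1, 2, 2), dtype=np.int64)
dst = conv2d_ref(fm, w)    # init_l1out=True: dst is overwritten
dst += conv2d_ref(fm, w)   # init_l1out=False: new result accumulates into dst
```

With init_l1out=False, the hardware adds the new conv2d result to whatever dst already holds, which is how partial sums over C1 tiles are combined.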

Table 2 Data type combination of feature_map, weight, and dst

feature_map.dtype    weight.dtype    dst.dtype
int8                 int8            int32
float16              float16         float32

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • Single-step debugging takes a long time, and is therefore not recommended.
  • This instruction is mutually exclusive with Vector instructions.
  • This instruction must be used together with the fixpipe instruction.
  • This instruction does not support the scenario where W is equal to Kw and H is greater than Kh; this produces unexpected results.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.

Returns

None

Example

  • Example: feature_map:weight:dst of type int8:int8:int32
    from tbe import tik
    tik_instance = tik.Tik()
    # Define the tensors.
    feature_map_gm = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map_gm', scope=tik.scope_gm)
    weight_gm = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight_gm', scope=tik.scope_gm)
    dst_gm = tik_instance.Tensor("int32", [2, 9, 16], name='dst_gm', scope=tik.scope_gm)
    feature_map = tik_instance.Tensor("int8", [1, 4, 4, 32], name='feature_map', scope=tik.scope_cbuf)
    weight = tik_instance.Tensor("int8", [1, 2, 2, 32, 32], name='weight', scope=tik.scope_cbuf)
    # dst has shape [2, 16, 16], where cout = 32. cout_blocks = 2, ho = 3, wo = 3, howo = 9. Therefore, round_howo = 16.
    dst = tik_instance.Tensor("int32", [2, 16, 16], name='dst', scope=tik.scope_cbuf_out)
    # Move data from the Global Memory to the source operand tensor.
    tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 16, 0, 0)
    tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
    # Perform convolution.
    tik_instance.conv2d(dst, feature_map, weight, [1, 4, 4, 32], [1, 2, 2, 32, 32], [1, 1], [0, 0, 0, 0], [1, 1], 0)
    # Move dst from L1OUT Buffer to the Global Memory by co-working with the fixpipe instruction.
    # cout_blocks = 2, cburst_num = 2, burst_len = howo * 16 * src_dtype_size/32 = 9 * 16 * 4/32 = 18
    tik_instance.fixpipe(dst_gm, dst, 2, 18, 0, 0, extend_params=None)
    tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])

    Result example:

    Input:
    feature_map_gm:
    [[[[2, 4, 2, 3, 2, ..., 3, 3, 0]]]]
    weight_gm:
    [[[[[-3, -5, -4, ..., -2, -4, -2]]]]]
    Output:
    dst_gm:
    [[[-230,  -11,  -83, -103, -123, ..., -174, -255]]]
  • Example: feature_map:weight:dst of type float16:float16:float32
    from tbe import tik
    tik_instance = tik.Tik()
    # Define the tensors.
    feature_map_gm = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map_gm', scope=tik.scope_gm)
    weight_gm = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight_gm', scope=tik.scope_gm)
    dst_gm = tik_instance.Tensor("float32", [1, 4, 16], name='dst_gm', scope=tik.scope_gm)
    feature_map = tik_instance.Tensor("float16", [2, 4, 4, 16], name='feature_map', scope=tik.scope_cbuf)
    weight = tik_instance.Tensor("float16", [2, 2, 2, 16, 16], name='weight', scope=tik.scope_cbuf)
    # dst has shape [1, 16, 16], where cout = 16, cout_blocks = 1, ho = 2, wo = 2, howo = 4. Therefore, round_howo = 16.
    dst = tik_instance.Tensor("float32", [1, 16, 16], name='dst', scope=tik.scope_cbuf_out)
    # Move data from the Global Memory to the source operand tensor.
    tik_instance.data_move(feature_map, feature_map_gm, 0, 1, 32, 0, 0)
    tik_instance.data_move(weight, weight_gm, 0, 1, 128, 0, 0)
    # Perform convolution.
    tik_instance.conv2d(dst, feature_map, weight, [2, 4, 4, 16], [2, 2, 2, 16, 16], [1, 1], [0, 0, 0, 0], [2, 2], 0)
    # Move dst from L1OUT Buffer to the Global Memory by co-working with the fixpipe instruction.
    # cout_blocks = 1, cburst_num = 1, burst_len = howo * 16 * src_dtype_size/32 = 4 * 16 * 4/32 = 8
    tik_instance.fixpipe(dst_gm, dst, 1, 8, 0, 0, extend_params=None)
    tik_instance.BuildCCE(kernel_name="conv2d", inputs=[feature_map_gm, weight_gm], outputs=[dst_gm])

    Result example:

    Input:
    feature_map_gm:
    [[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 5.09, 5.1, 5.11]]]]
    weight_gm:
    [[[[[0.0, 0.01, 0.02, 0.03, 0.04, ..., 20.46, 20.47]]]]]
    
    Output:
    dst_gm:
    [[[3568.7373, 3612.8433, 3657.0618, 3701.162 , 3745.287 ,
       3789.4834, 3833.6282, 3877.876 , 3921.9812, 3966.0745,
       4010.311 , 4054.4119, 4098.5713, 4142.702 , 4186.8457,
       4231.0312],
      [3753.9888, 3801.3733, 3848.8735, 3896.2534, 3943.6558,
       3991.1353, 4038.5586, 4086.0913, 4133.4736, 4180.8457,
       4228.3643, 4275.745 , 4323.1826, 4370.5947, 4418.016 ,
       4465.4844],
      [4309.196 , 4366.4077, 4423.745 , 4480.9565, 4538.1816,
       4595.5054, 4652.755 , 4710.135 , 4767.34  , 4824.5405,
       4881.897 , 4939.1104, 4996.374 , 5053.6226, 5110.871 ,
       5168.179 ],
      [4494.4526, 4554.944 , 4615.564 , 4676.0557, 4736.5586,
       4797.166 , 4857.695 , 4918.3604, 4978.8433, 5039.323 ,
       5099.9624, 5160.456 , 5220.999 , 5281.5293, 5342.0566,
       5402.6475]]]
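
The burst_len values passed to the two fixpipe calls above follow the formula in the comments, burst_len = howo * 16 * dtype_size / 32 (lengths in 32-byte units). A small sketch of that arithmetic (fixpipe_burst_len is an illustrative name, not a TIK API):

```python
def fixpipe_burst_len(howo, dtype_size):
    # One burst carries 16 output channels per output point;
    # burst lengths are expressed in 32-byte units.
    return howo * 16 * dtype_size // 32
```

For the int8 example, howo = 9 and dst is int32 (4 bytes), giving 18; for the float16 example, howo = 4 and dst is float32 (4 bytes), giving 8.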