Conv2D (Deprecated)

Product Support

Product	Supported
Atlas A3 training products/Atlas A3 inference products	x
Atlas A2 training products/Atlas A2 inference products	x
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	√

Function

This API has been deprecated and will be removed in later versions. Do not use this API.

Performs 2D convolution on an input tensor with a weight tensor and outputs the result tensor. 2D convolution layers are mostly used in image recognition, where the filter (weight) extracts features from an image.

Prototype

template <typename T, typename U>
__aicore__ inline void Conv2D(const LocalTensor<T>& dst, const LocalTensor<U>& featureMap, const LocalTensor<U>& weight, Conv2dParams& conv2dParams, Conv2dTilling& tilling)

The tilling argument passed to Conv2D must be obtained through the following tiling computation API:

template <typename T>
__aicore__ inline Conv2dTilling GetConv2dTiling(Conv2dParams& conv2dParams)

Parameters

Table 1 Parameters

Parameter

Input/Output

Description

dst

Output

Destination operand.

For Atlas training products, the supported TPosition is CO1 or CO2.

For the Atlas inference product's AI Core, the supported TPosition is CO1 or CO2.

Has format [Cout/16, Ho, Wo, 16], and size Cout * Ho * Wo, where Ho and Wo can be calculated as follows:

Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1)

Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1)

The hardware requires Ho * Wo to be a multiple of 16. When defining the dst tensor, round Ho * Wo up to a multiple of 16. The actual size of dst is then Cout * round_howo, where:

round_howo = ceil(Ho * Wo/16) * 16

featureMap

Input

Input tensor. The TPosition of the tensor is A1.

Shape of feature_map, in the format [C1, H, W, C0].

C1 * C0 indicates the number of input channels.

If feature_map is of type half, C0 is 16.
If feature_map is of type int8_t, C0 is 32.
Value range of C1: [1, 4]. Value range of input channel: [16, 32, 64, 128].

H indicates the height. Value range: [1, 40].

W indicates the width. Value range: [1, 40].

weight

Input

Convolution kernel (weight) tensor. The TPosition of the tensor is B1.

Shape of weight, in the format [C1, Kh, Kw, Cout, C0].

C1 * C0 indicates the number of input channels.

If feature_map is of type half, C0 is 16.
If feature_map is of type int8_t, C0 is 32.
Value range of C1: [1, 4].
The number of input channels must be the same as that of feature_map.

Cout indicates the number of convolution kernels. The value range is [16, 32, 64, 128], which must be a multiple of 16.

Kh indicates the height of the convolution kernel. Value range: [1, 5].

Kw indicates the width of the convolution kernel. Value range: [1, 5].

conv2dParams

Input

Status parameters such as the input matrix shape. The type is Conv2dParams. The specific definition of the structure is as follows:

struct Conv2dParams {
    uint32_t imgShape[CONV2D_IMG_SIZE];       // [H, W]
    uint32_t kernelShapeIn[CONV2D_KERNEL_SIZE]; // [Kh, Kw]
    uint32_t stride[CONV2D_STRIDE];          // [stride_h, stride_w]
    uint32_t cin;                            // cin = C0 * C1;
    uint32_t cout;
    uint32_t padList[CONV2D_PAD];       // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[CONV2D_DILATION]; // [dilation_h, dilation_w]
    uint32_t initY;
    uint32_t partialSum;
};

tilling

Input

Fractal control parameter. The type is Conv2dTilling. The specific definition of the structure is as follows:

struct Conv2dTilling {
    const uint32_t blockSize = 16; // M block size is always 16
    LoopMode loopMode = LoopMode::MODE_NM;

    uint32_t c0Size = 32;
    uint32_t dTypeSize = 1;

    uint32_t strideH = 0;
    uint32_t strideW = 0;
    uint32_t dilationH = 0;
    uint32_t dilationW = 0;
    uint32_t hi = 0;
    uint32_t wi = 0;
    uint32_t ho = 0;
    uint32_t wo = 0;

    uint32_t height = 0;
    uint32_t width = 0;

    uint32_t howo = 0;

    uint32_t mNum = 0;
    uint32_t nNum = 0;
    uint32_t kNum = 0;

    uint32_t mBlockNum = 0;
    uint32_t kBlockNum = 0;
    uint32_t nBlockNum = 0;

    uint32_t roundM = 0;
    uint32_t roundN = 0;
    uint32_t roundK = 0;

    uint32_t mTileBlock = 0;
    uint32_t nTileBlock = 0;
    uint32_t kTileBlock = 0;

    uint32_t mIterNum = 0;
    uint32_t nIterNum = 0;
    uint32_t kIterNum = 0;

    uint32_t mTileNums = 0;

    bool mHasTail = false;
    bool nHasTail = false;
    bool kHasTail = false;

    uint32_t kTailBlock = 0;
    uint32_t mTailBlock = 0;
    uint32_t nTailBlock = 0;

    uint32_t mTailNums = 0;
};
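The Ho, Wo, and round_howo formulas given for dst in Table 1 can be checked on the host before a kernel is written. The following is a minimal sketch in plain C++, not Ascend C device code; ConvGeometry and the helper names are hypothetical and exist only for this illustration. It assumes the padded input is at least as large as the dilated kernel.

```cpp
#include <cstdint>

// Hypothetical host-side description of one convolution; not part of the Conv2D API.
struct ConvGeometry {
    uint32_t h, w;                                  // feature_map height and width
    uint32_t kh, kw;                                // kernel height and width
    uint32_t strideH, strideW;
    uint32_t padLeft, padRight, padTop, padBottom;  // same order as padList
    uint32_t dilationH, dilationW;
    uint32_t cout;
};

// Output length along one axis:
// floor((in + padBefore + padAfter - dilation * (k - 1) - 1) / stride + 1).
// Unsigned division floors; the numerator is assumed not to underflow.
constexpr uint32_t OutDim(uint32_t in, uint32_t padBefore, uint32_t padAfter,
                          uint32_t dilation, uint32_t k, uint32_t stride)
{
    return (in + padBefore + padAfter - dilation * (k - 1) - 1) / stride + 1;
}

constexpr uint32_t OutHeight(const ConvGeometry& g)
{
    return OutDim(g.h, g.padTop, g.padBottom, g.dilationH, g.kh, g.strideH);
}

constexpr uint32_t OutWidth(const ConvGeometry& g)
{
    return OutDim(g.w, g.padLeft, g.padRight, g.dilationW, g.kw, g.strideW);
}

// round_howo = ceil(Ho * Wo / 16) * 16
constexpr uint32_t RoundHowo(const ConvGeometry& g)
{
    return (OutHeight(g) * OutWidth(g) + 15) / 16 * 16;
}

// Number of elements to reserve for dst: Cout * round_howo.
constexpr uint32_t DstSize(const ConvGeometry& g)
{
    return g.cout * RoundHowo(g);
}
```

For a 4 x 4 input with a 2 x 2 kernel, stride 1, no padding, dilation 1, and Cout = 16, the sketch gives Ho = Wo = 3, so Ho * Wo = 9 is rounded up to 16 and dst needs 256 elements.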

Table 2 Parameters in the Conv2dParams structure
Parameter	Type	Meaning
imgShape	vector<int>	Shape of feature_map, in the format [H, W]. H indicates the height. The value range is [1, 40]. W indicates the width. The value range is [1, 40].
kernelShape	vector<int>	Shape of weight, in the format [Kh, Kw]. Kh indicates the height. The value range is [1, 5]. Kw indicates the width. The value range is [1, 5].
stride	vector<int>	Convolution stride, in the format [stride_h, stride_w]. stride_h: height stride, in the range [1, 4]. stride_w: width stride, in the range [1, 4].
cin	int	Fractal layout parameter. Cin = C1 * C0, where Cin is the number of input channels. The value range of C1 is [1, 4]. If feature_map is of type float, C0 is 8 and the value range of the input channel is [8, 16, 24, 32]. If feature_map is of type half, C0 is 16 and the value range of the input channel is [16, 32, 48, 64]. If feature_map is of type int8_t, C0 is 32 and the value range of the input channel is [32, 64, 96, 128].
cout	int	Cout indicates the number of convolution kernels. The value range is [16, 32, 64, 128], which must be a multiple of 16.
padList	vector<int>	Padding factors, in the format of [pad_left, pad_right, pad_top, pad_bottom]. pad_left: number of columns to be padded to the left of feature_map. Must be in the range of [0, 4]. pad_right: number of columns to be padded to the right of the feature_map. Must be in the range of [0, 4]. pad_top: number of rows to be padded to the top of the feature_map. Must be in the range of [0, 4]. pad_bottom: number of rows to be padded to the bottom of the feature_map. Must be in the range of [0, 4].
dilation	vector<int>	Convolution dilation factors, in the format [dilation_h, dilation_w]. dilation_h: height dilation factor. Must be in the range [1, 4]. dilation_w: width dilation factor. Must be in the range [1, 4]. The width of the dilated convolution kernel is dilation_w * (Kw - 1) + 1, and the height of the dilated convolution kernel is dilation_h * (Kh - 1) + 1.
initY	uint32_t	Indicates whether dst needs to be initialized. 0: bias is not used. L0C needs to be initialized. The dst initial matrix stores the previous conv2d result, and the new conv2d result is accumulated onto it. 1: bias is not used. L0C does not need to be initialized. The dst initial matrix is overwritten by the computation result.
partialSum	uint32_t	Takes effect when the TPosition of dst is CO2 and controls whether the computation result is moved out. 0: The computation result is moved out. 1: The computation result is not moved out and is used for subsequent computation.
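The value ranges listed in Table 2 lend themselves to a host-side sanity check before the parameters are handed to the tiling API. The sketch below is plain C++ under stated assumptions: ParamsSketch is a hypothetical mirror of the documented fields (the real Conv2dParams type comes from the Ascend C headers), and c0 is passed in by the caller as 16 for half or 32 for int8_t.

```cpp
#include <cstdint>

// Hypothetical mirror of the fields described in Table 2; not the real Conv2dParams.
struct ParamsSketch {
    uint32_t imgShape[2];    // [H, W]
    uint32_t kernelShape[2]; // [Kh, Kw]
    uint32_t stride[2];      // [stride_h, stride_w]
    uint32_t cin;            // C1 * C0
    uint32_t cout;
    uint32_t padList[4];     // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[2];    // [dilation_h, dilation_w]
};

constexpr bool InRange(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v >= lo && v <= hi;
}

// Size of the dilated kernel along one axis: dilation * (k - 1) + 1.
constexpr uint32_t DilatedSize(uint32_t dilation, uint32_t k)
{
    return dilation * (k - 1) + 1;
}

// Returns true when every field lies in the range documented in Table 2.
// c0 is the channel block size of the feature_map type (assumed supplied by the caller).
constexpr bool ParamsInRange(const ParamsSketch& p, uint32_t c0)
{
    return InRange(p.imgShape[0], 1, 40) && InRange(p.imgShape[1], 1, 40)
        && InRange(p.kernelShape[0], 1, 5) && InRange(p.kernelShape[1], 1, 5)
        && InRange(p.stride[0], 1, 4) && InRange(p.stride[1], 1, 4)
        && c0 != 0 && p.cin % c0 == 0 && InRange(p.cin / c0, 1, 4)
        && (p.cout == 16 || p.cout == 32 || p.cout == 64 || p.cout == 128)
        && InRange(p.padList[0], 0, 4) && InRange(p.padList[1], 0, 4)
        && InRange(p.padList[2], 0, 4) && InRange(p.padList[3], 0, 4)
        && InRange(p.dilation[0], 1, 4) && InRange(p.dilation[1], 1, 4);
}
```

For example, a 3 x 3 kernel with dilation 2 has a dilated size of 2 * (3 - 1) + 1 = 5 along each axis. The check covers only the per-field ranges; it does not replace the restrictions listed at the end of this topic.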

Table 3 Parameters in the Conv2dTilling structure

Parameter

Type

Meaning

blockSize

uint32_t

Number of elements stored in a dimension. The value is fixed at 16.

loopMode

LoopMode

Traversal mode. The structure is defined as follows:

enum class LoopMode {
    MODE_NM = 0,
    MODE_MN = 1,
    MODE_KM = 2,
    MODE_KN = 3
};

c0Size

uint32_t

Length of a block. The value can be 16 or 32.

dTypeSize

uint32_t

Length of the input data, in bytes. The value range is [1, 2].

strideH

uint32_t

Height of the convolution stride. The value range is [1, 4].

strideW

uint32_t

Width of the convolution stride. The value range is [1, 4].

dilationH

uint32_t

Height of the convolution dilation factor. The value range is [1, 4].

dilationW

uint32_t

Width of the convolution dilation factor. The value range is [1, 4].

hi

uint32_t

Height of the feature_map shape. The value range is [1, 40].

wi

uint32_t

Width of the feature_map shape. The value range is [1, 40].

ho

uint32_t

Height of the output shape (Ho of dst). The value range is [1, 40].

wo

uint32_t

Width of the output shape (Wo of dst). The value range is [1, 40].

height

uint32_t

Height of the weight shape. The value range is [1, 5].

width

uint32_t

Width of the weight shape. The value range is [1, 5].

howo

uint32_t

Size of the feature_map shape (ho * wo)

mNum

uint32_t

Equivalent data length of the M axis. The value range is [1, 4096].

nNum

uint32_t

Equivalent data length of the N axis. The value range is [1, 4096].

kNum

uint32_t

Equivalent data length of the K axis. The value range is [1, 4096].

roundM

uint32_t

Equivalent data length of the M axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundN

uint32_t

Equivalent data length of the N axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundK

uint32_t

Equivalent data length of the K axis. The value is rounded up to an integer multiple of c0Size. The value range is [1, 4096].

mBlockNum

uint32_t

Number of blocks on the M axis. mBlockNum = mNum/blockSize. The value range is [1, 4096].

nBlockNum

uint32_t

Number of blocks on the N axis. nBlockNum = nNum/blockSize. The value range is [1, 4096].

kBlockNum

uint32_t

Number of blocks on the K axis. kBlockNum = kNum/blockSize. The value range is [1, 4096].

mIterNum

uint32_t

Number of dimensions traversed on the M axis. The value range is [1, 4096].

nIterNum

uint32_t

Number of dimensions traversed on the N axis. The value range is [1, 4096].

kIterNum

uint32_t

Number of dimensions traversed on the K axis. The value range is [1, 4096].

mTileBlock

uint32_t

Number of split blocks on the M axis. The value range is [1, 4096].

nTileBlock

uint32_t

Number of split blocks on the N axis. The value range is [1, 4096].

kTileBlock

uint32_t

Number of split blocks on the K axis. The value range is [1, 4096].

kTailBlock

uint32_t

Number of tail blocks on the K axis. The value range is [1, 4096].

mTailBlock

uint32_t

Number of tail blocks on the M axis. The value range is [1, 4096].

nTailBlock

uint32_t

Number of tail blocks on the N axis. The value range is [1, 4096].

kHasTail

bool

Indicates whether a tail block exists on the K axis.

mHasTail

bool

Indicates whether a tail block exists on the M axis.

nHasTail

bool

Indicates whether a tail block exists on the N axis.

mTileNums

uint32_t

Length of split blocks on the M axis. The value range is [1, 4096].

mTailNums

uint32_t

Length of tail blocks on the M axis. The value range is [1, 4096].

Table 4 Data type combinations of feature_map, weight, and dst
feature_map.dtype	weight.dtype	dst.dtype
int8_t	int8_t	int32_t
half	half	float
half	half	half
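The combinations in Table 4, together with the C0 values stated for feature_map in Table 1, can be encoded as a compile-time lookup. This is a minimal sketch in plain C++: the DType enum is a hypothetical stand-in, because half is not a standard C++ type, and the function names are not part of the API.

```cpp
#include <cstdint>

// Hypothetical tags standing in for the element types named in Table 4.
enum class DType { Int8, Half, Float, Int32 };

// C0 for the feature_map and weight type: 16 for half, 32 for int8_t.
// Returns 0 for types that Table 4 does not list as inputs.
constexpr uint32_t C0Size(DType t)
{
    return t == DType::Half ? 16u : (t == DType::Int8 ? 32u : 0u);
}

// True only for the three rows of Table 4.
constexpr bool IsSupportedCombination(DType featureMap, DType weight, DType dst)
{
    return (featureMap == DType::Int8 && weight == DType::Int8 && dst == DType::Int32)
        || (featureMap == DType::Half && weight == DType::Half
            && (dst == DType::Float || dst == DType::Half));
}
```

A check such as IsSupportedCombination(DType::Half, DType::Half, DType::Float) evaluates to true, while an int8_t input paired with a half output is rejected.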

Restrictions

This instruction does not support the scenario where W is equal to Kw and H is greater than Kh. Using it in this scenario produces unexpected results.
For details about the operand address alignment requirements, see General Address Alignment Restrictions.
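The shape restriction above can be turned into a guard that a caller evaluates before invoking the instruction. The predicate below is a minimal sketch in plain C++ with a hypothetical name; it encodes only the W and Kw, H and Kh condition and says nothing about address alignment.

```cpp
#include <cstdint>

// True when the shape falls into the unsupported scenario:
// W equal to Kw while H is greater than Kh.
constexpr bool IsUnsupportedShape(uint32_t h, uint32_t w, uint32_t kh, uint32_t kw)
{
    return w == kw && h > kh;
}
```

For instance, an 8 x 3 feature map (H = 8, W = 3) with a 3 x 3 kernel triggers the guard, whereas a 3 x 3 feature map with the same kernel does not.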
