Conv2D

Function Usage

This API has been deprecated and will be removed in later versions. Do not use this API.

Performs 2D convolution on an input tensor and a weight tensor, and outputs the result tensor. Conv2D layers are mostly used for image recognition, where filters extract features from an image.

Prototype

template <typename dst_T, typename src_T>
__aicore__ inline void Conv2D(const LocalTensor<dst_T>& dstLocal, const LocalTensor<src_T>& featureMap, const LocalTensor<src_T>& weight, Conv2dParams& conv2dParams, Conv2dTilling& tilling)

The tiling structure passed as the tilling parameter must be obtained through the following tiling computation API:

template <typename T>
__aicore__ inline Conv2dTilling GetConv2dTiling(Conv2dParams& conv2dParams)

Parameters

Table 1 Parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand.

The Atlas Training Series Product supports QuePosition values CO1 and CO2.

Has format [Cout/16, Ho, Wo, 16], and size Cout * Ho * Wo, where Ho and Wo can be calculated as follows:

Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1)

Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1)

The hardware requires Ho * Wo to be a multiple of 16. When defining the dst tensor, round Ho * Wo up to a multiple of 16, so that the actual tensor size is Cout * round_howo:

round_howo = ceil(Ho * Wo/16) * 16

featureMap

Input

Input tensor. The QuePosition of the tensor is A1.

Shape of feature_map, in the format [C1, H, W, C0].

C1 * C0 equals the input channel count.

If feature_map is of type half, C0 is 16.
If feature_map is of type int8_t, C0 is 32.
Value range of C1: [1, 4]. Value range of input channel: [16, 32, 64, 128].

H indicates the height. The value range is [1, 40].

W indicates the width. The value range is [1, 40].

weight

Input

Convolution kernel (weight) tensor. The QuePosition of the tensor is B1.

Shape of weight, in the format [C1, Kh, Kw, Cout, C0].

C1 * C0 indicates the number of input channels.

If feature_map is of type half, C0 is 16.
If feature_map is of type int8_t, C0 is 32.
Value range of C1: [1, 4].
Must have the same number of input channels as feature_map.

Cout indicates the number of filters. The value range is [16, 32, 64, 128], which must be a multiple of 16.

Kh indicates the height of the filter. The value range is [1, 5].

Kw indicates the width of the filter. The value range is [1, 5].

conv2dParams

Input

Status parameters such as the input matrix shape. The type is Conv2dParams. The specific definition of the structure is as follows:

struct Conv2dParams {
    uint32_t imgShape[kConv2dImgSize];       // [H, W]
    uint32_t kernelShape[kConv2dkernelSize]; // [Kh, Kw]
    uint32_t stride[kConv2dStride];          // [stride_h, stride_w]
    uint32_t cin;                            // cin = C0 * C1;
    uint32_t cout;
    uint32_t padList[kConv2dPad];       // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[kConv2dDilation]; // [dilation_h, dilation_w]
    uint32_t initY;
    uint32_t partialSum;
};

tilling

Input

Fractal control parameter. The type is Conv2dTilling. The specific definition of the structure is as follows:

struct Conv2dTilling {
    const uint32_t blockSize = 16; // M block size is always 16
    LoopMode loopMode = LoopMode::MODE_NM;

    uint32_t c0Size = 32;
    uint32_t dTypeSize = 1;

    uint32_t strideH = 0;
    uint32_t strideW = 0;
    uint32_t dilationH = 0;
    uint32_t dilationW = 0;
    uint32_t hi = 0;
    uint32_t wi = 0;
    uint32_t ho = 0;
    uint32_t wo = 0;

    uint32_t height = 0;
    uint32_t width = 0;

    uint32_t howo = 0;

    uint32_t mNum = 0;
    uint32_t nNum = 0;
    uint32_t kNum = 0;

    uint32_t mBlockNum = 0;
    uint32_t kBlockNum = 0;
    uint32_t nBlockNum = 0;

    uint32_t roundM = 0;
    uint32_t roundN = 0;
    uint32_t roundK = 0;

    uint32_t mTileBlock = 0;
    uint32_t nTileBlock = 0;
    uint32_t kTileBlock = 0;

    uint32_t mIterNum = 0;
    uint32_t nIterNum = 0;
    uint32_t kIterNum = 0;

    uint32_t mTileNums = 0;

    bool mHasTail = false;
    bool nHasTail = false;
    bool kHasTail = false;

    uint32_t kTailBlock = 0;
    uint32_t mTailBlock = 0;
    uint32_t nTailBlock = 0;

    uint32_t mTailNums = 0;
};
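
The output-size formulas given for dstLocal in Table 1 can be sketched in plain C++ as follows. This is a minimal host-side illustration, not part of the Conv2D API: the helper names ConvOutDim, RoundHowo, and DstElementCount are hypothetical, and the code assumes the padded input is at least as large as the dilated kernel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; they are not part of the Conv2D API.

// Output size along one spatial axis:
// floor((in + padBefore + padAfter - dilation * (k - 1) - 1) / stride + 1)
inline uint32_t ConvOutDim(uint32_t in, uint32_t padBefore, uint32_t padAfter,
                           uint32_t k, uint32_t stride, uint32_t dilation)
{
    assert(stride > 0 && k > 0);
    const uint32_t dilatedK = dilation * (k - 1) + 1; // extent of the dilated kernel
    const uint32_t padded = in + padBefore + padAfter;
    assert(padded >= dilatedK);                       // assumed: the kernel fits
    return (padded - dilatedK) / stride + 1;          // integer division is the floor
}

// round_howo = ceil(Ho * Wo / 16) * 16
inline uint32_t RoundHowo(uint32_t ho, uint32_t wo)
{
    return (ho * wo + 15) / 16 * 16;
}

// Number of elements to reserve for dstLocal: Cout * round_howo
inline uint32_t DstElementCount(uint32_t cout, uint32_t ho, uint32_t wo)
{
    return cout * RoundHowo(ho, wo);
}
```

For example, an 8 x 8 input with a 3 x 3 kernel, stride 1, dilation 1, and no padding gives Ho = Wo = 6. Ho * Wo = 36 is then rounded up to 48, so an output with 16 filters needs 16 * 48 = 768 elements.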

Table 2 Parameters in the Conv2dParams structure
Parameter	Type	Meaning
imgShape	vector<int>	Shape of feature_map, in the format [H, W]. H indicates the height. The value range is [1, 40]. W indicates the width. The value range is [1, 40].
kernelShape	vector<int>	Shape of weight, in the format [Kh, Kw]. Kh indicates the height. The value range is [1, 5]. Kw indicates the width. The value range is [1, 5].
stride	vector<int>	Convolution stride, in the format of [stride_h, stride_w]. stride_h: height stride, within the range [1, 4]. stride_w: width stride, within the range of [1, 4].
cin	int	Fractal layout parameter. Cin = C1 * C0. Cin indicates the number of input channels. The value range of C1 is [1, 4]. If feature_map is of type float, C0 = 8. The value range of the input channel is [8, 16, 24, 32]. If feature_map is of type half, C0 is 16. The value range of the input channel is [16, 32, 48, 64]. If feature_map is of type int8_t, C0 is 32. The value range of channel is [32, 64, 96, 128].
cout	int	Cout indicates the number of filters. The value range is [16, 32, 64, 128], which must be a multiple of 16.
padList	vector<int>	Padding factors, in the format of [pad_left, pad_right, pad_top, pad_bottom]. pad_left: number of columns to be padded to the left of feature_map. Must be in the range of [0, 4]. pad_right: number of columns to be padded to the right of the feature_map. Must be in the range of [0, 4]. pad_top: number of rows to be padded to the top of the feature_map. Must be in the range of [0, 4]. pad_bottom: number of rows to be padded to the bottom of the feature_map. Must be in the range of [0, 4].
dilation	vector<int>	Convolution dilation factors, in the format of [dilation_h, dilation_w]. dilation_h: height dilation factor. Must be in the range of [1, 4]. dilation_w: width dilation factor. Must be in the range of [1, 4]. The width and height of the dilated convolution kernel are calculated as follows: dilation_w * (Kw - 1) + 1; dilation_h * (Kh - 1) + 1.
initY	uint32_t	Indicates whether dstLocal needs to be initialized. 0: Bias is not used. L0C needs to be initialized. The dstLocal initial matrix stores the previous conv2d result and will be added up with the new conv2d result. 1: Bias is not used. L0C does not need to be initialized. The dstLocal initial matrix will be overwritten by the compute result.
partialSum	uint32_t	When dstLocal is located at QuePosition CO2, this parameter controls whether the computation result is moved out. 0: The computation result is moved out. 1: The computation result is not moved out but is used for subsequent computation.
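
The value ranges in Table 2 can be checked on the host before a kernel is launched. The sketch below is only an illustration under stated assumptions: Conv2dParamsSketch is a hypothetical standalone mirror of Conv2dParams, its array lengths (2, 2, 2, 4, 2) are inferred from the field comments rather than from the real kConv2d* constants, and IsValidConv2dParams and MakeExampleParams are not part of the API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of Conv2dParams. Array lengths are assumptions inferred
// from the field comments; the real kConv2d* constants are not shown in this page.
struct Conv2dParamsSketch {
    uint32_t imgShape[2];    // [H, W]
    uint32_t kernelShape[2]; // [Kh, Kw]
    uint32_t stride[2];      // [stride_h, stride_w]
    uint32_t cin;            // cin = C0 * C1
    uint32_t cout;
    uint32_t padList[4];     // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[2];    // [dilation_h, dilation_w]
    uint32_t initY;
    uint32_t partialSum;
};

inline bool InRange(uint32_t v, uint32_t lo, uint32_t hi) { return v >= lo && v <= hi; }

// Returns true if every field lies in the range listed in Table 2.
// c0 is the block length for the feature_map type: 8 (float), 16 (half), 32 (int8_t).
inline bool IsValidConv2dParams(const Conv2dParamsSketch& p, uint32_t c0)
{
    assert(c0 == 8 || c0 == 16 || c0 == 32);
    for (uint32_t v : p.imgShape)    { if (!InRange(v, 1, 40)) return false; }
    for (uint32_t v : p.kernelShape) { if (!InRange(v, 1, 5))  return false; }
    for (uint32_t v : p.stride)      { if (!InRange(v, 1, 4))  return false; }
    for (uint32_t v : p.padList)     { if (!InRange(v, 0, 4))  return false; }
    for (uint32_t v : p.dilation)    { if (!InRange(v, 1, 4))  return false; }
    // cin = C1 * C0 with C1 in [1, 4]
    if (p.cin % c0 != 0 || !InRange(p.cin / c0, 1, 4)) return false;
    // cout must be one of 16, 32, 64, 128
    if (p.cout != 16 && p.cout != 32 && p.cout != 64 && p.cout != 128) return false;
    // initY and partialSum are 0/1 switches
    return p.initY <= 1 && p.partialSum <= 1;
}

// An arbitrary in-range configuration: 8 x 8 image, 3 x 3 kernel, half input with C1 = 2.
inline Conv2dParamsSketch MakeExampleParams()
{
    Conv2dParamsSketch p = {{8, 8}, {3, 3}, {1, 1}, 32, 16, {0, 0, 0, 0}, {1, 1}, 1, 0};
    return p;
}
```

A check like this only covers the per-field ranges; constraints that combine several fields are not verified here.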

Table 3 Parameters in the Conv2dTilling structure

Parameter

Type

Meaning

blockSize

uint32_t

Number of elements stored in a dimension. The value is fixed at 16.

loopMode

LoopMode

Traversal mode. The structure is defined as follows:

enum class LoopMode {
    MODE_NM = 0,
    MODE_MN = 1,
    MODE_KM = 2,
    MODE_KN = 3
};

c0Size

uint32_t

Length of a block. The value can be 16 or 32.

dTypeSize

uint32_t

Length of the input data, in bytes. The value range is [1, 2].

strideH

uint32_t

Height of the convolution stride. The value range is [1, 4].

strideW

uint32_t

Width of the convolution stride. The value range is [1, 4].

dilationH

uint32_t

Height of the convolution dilation factor. The value range is [1, 4].

dilationW

uint32_t

Width of the convolution dilation factor. The value range is [1, 4].

hi

uint32_t

Height of the feature_map shape. The value range is [1, 40].

wi

uint32_t

Width of the feature_map shape. The value range is [1, 40].

ho

uint32_t

Height of the feature_map shape. The value range is [1, 40].

wo

uint32_t

Width of the feature_map shape. The value range is [1, 40].

height

uint32_t

Height of the weight shape. The value range is [1, 5].

width

uint32_t

Width of the weight shape. The value range is [1, 5].

howo

uint32_t

Size of the feature_map shape (ho * wo).

mNum

uint32_t

Equivalent data length of the M axis. The value range is [1, 4096].

nNum

uint32_t

Equivalent data length of the N axis. The value range is [1, 4096].

kNum

uint32_t

Equivalent data length of the K axis. The value range is [1, 4096].

roundM

uint32_t

Equivalent data length of the M axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundN

uint32_t

Equivalent data length of the N axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundK

uint32_t

Equivalent data length of the K axis. The value is rounded up to an integer multiple of c0Size. The value range is [1, 4096].

mBlockNum

uint32_t

Number of blocks on the M axis. mBlockNum = mNum/blockSize. The value range is [1, 4096].

nBlockNum

uint32_t

Number of blocks on the N axis. nBlockNum = nNum/blockSize. The value range is [1, 4096].

kBlockNum

uint32_t

Number of blocks on the K axis. kBlockNum = kNum/blockSize. The value range is [1, 4096].

mIterNum

uint32_t

Number of dimensions traversed on the M axis. The value range is [1, 4096].

nIterNum

uint32_t

Number of dimensions traversed on the N axis. The value range is [1, 4096].

kIterNum

uint32_t

Number of dimensions traversed on the K axis. The value range is [1, 4096].

mTileBlock

uint32_t

Number of split blocks on the M axis. The value range is [1, 4096].

nTileBlock

uint32_t

Number of split blocks on the N axis. The value range is [1, 4096].

kTileBlock

uint32_t

Number of split blocks on the K axis. The value range is [1, 4096].

kTailBlock

uint32_t

Number of tail blocks on the K axis. The value range is [1, 4096].

mTailBlock

uint32_t

Number of tail blocks on the M axis. The value range is [1, 4096].

nTailBlock

uint32_t

Number of tail blocks on the N axis. The value range is [1, 4096].

kHasTail

bool

Indicates whether a tail block exists on the K axis.

mHasTail

bool

Indicates whether a tail block exists on the M axis.

nHasTail

bool

Indicates whether a tail block exists on the N axis.

mTileNums

uint32_t

Length of split blocks on the M axis. The value range is [1, 4096].

mTailNums

uint32_t

Length of tail blocks on the M axis. The value range is [1, 4096].
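
The rounding rules that Table 3 gives for roundM, roundN, and roundK can be illustrated with a short host-side sketch. In practice these fields are filled in by GetConv2dTiling; the helper names below (CeilDiv, RoundUp, AxisRounding, ComputeAxisRounding) are hypothetical and only show the arithmetic described in the table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of the Conv2D API.
inline uint32_t CeilDiv(uint32_t a, uint32_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// Rounds a up to the nearest integer multiple of b.
inline uint32_t RoundUp(uint32_t a, uint32_t b)
{
    return CeilDiv(a, b) * b;
}

struct AxisRounding {
    uint32_t roundM; // mNum rounded up to a multiple of blockSize
    uint32_t roundN; // nNum rounded up to a multiple of blockSize
    uint32_t roundK; // kNum rounded up to a multiple of c0Size
};

inline AxisRounding ComputeAxisRounding(uint32_t mNum, uint32_t nNum, uint32_t kNum,
                                        uint32_t blockSize, uint32_t c0Size)
{
    AxisRounding r;
    r.roundM = RoundUp(mNum, blockSize);
    r.roundN = RoundUp(nNum, blockSize);
    r.roundK = RoundUp(kNum, c0Size);
    return r;
}
```

For example, with blockSize = 16 and c0Size = 32, the lengths mNum = 36, nNum = 16, and kNum = 144 round to 48, 16, and 160 respectively.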

Table 4 Data types of featureMap, weight, and dstLocal
feature_map.dtype	weight.dtype	dst.dtype
int8_t	int8_t	int32_t
half	half	float
half	half	half
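
The combinations in Table 4 can be expressed as a compile-time check. The sketch below is an illustration only: HalfTag is a hypothetical 2-byte stand-in for the device half type so that the code compiles on a host, and IsSupportedConv2dTypes and C0Of are not part of the API. The C0 helper relies on the observation that the documented C0 values (32 for int8_t, 16 for half) equal 32 bytes divided by the element size; the page lists the values but does not state this as a rule.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical 2-byte stand-in for the device `half` type (assumption).
struct HalfTag { uint16_t bits; };
static_assert(sizeof(HalfTag) == 2, "stand-in must be 2 bytes wide");

// Primary template: any pair not listed in Table 4 is unsupported.
template <typename Src, typename Dst>
struct IsSupportedConv2dTypes : std::false_type {};

// One specialization per row of Table 4 (feature_map/weight type, dst type).
template <> struct IsSupportedConv2dTypes<int8_t, int32_t> : std::true_type {};
template <> struct IsSupportedConv2dTypes<HalfTag, float> : std::true_type {};
template <> struct IsSupportedConv2dTypes<HalfTag, HalfTag> : std::true_type {};

// C0 derived from the element size, assuming a 32-byte block.
template <typename Src>
constexpr uint32_t C0Of()
{
    return 32u / static_cast<uint32_t>(sizeof(Src));
}
```

A wrapper around Conv2D could use such a trait in a static_assert to reject unsupported type pairs at compile time.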

Availability

Atlas Training Series Product

Precautions

This instruction does not support the case where W is equal to Kw and H is greater than Kh. Such inputs produce unexpected results.
For details about the alignment requirements of the operand address offset, see General Restrictions.
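
The unsupported shape described in the first precaution can be detected before the call. This is a minimal sketch; the function name IsUnsupportedConv2dShape is hypothetical and not part of the API.

```cpp
#include <cstdint>

// Returns true for the documented unsupported case: W == Kw and H > Kh.
// Hypothetical helper for illustration only.
constexpr bool IsUnsupportedConv2dShape(uint32_t h, uint32_t w, uint32_t kh, uint32_t kw)
{
    return w == kw && h > kh;
}
```

For instance, an image with H = 8 and W = 3 combined with a 3 x 3 kernel falls into this case, whereas a 3 x 3 image with the same kernel does not.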
