Conv2D
Function Usage
This API has been deprecated and will be removed in later versions. Do not use this API.
Performs 2D convolution of an input tensor (feature map) with a weight tensor and writes the result to an output tensor. Conv2D layers are mostly used in image recognition, where filters extract features from an image.
Prototype
template <typename dst_T, typename src_T> __aicore__ inline void Conv2D(const LocalTensor<dst_T>& dstLocal, const LocalTensor<src_T>& featureMap, const LocalTensor<src_T>& weight, Conv2dParams& conv2dParams, Conv2dTilling& tilling)

template <typename T> __aicore__ inline Conv2dTilling GetConv2dTiling(Conv2dParams& conv2dParams)
Parameters
| Parameter | Input/Output | Meaning |
|---|---|---|
| dstLocal | Output | Destination operand. It has the format [Cout/16, Ho, Wo, 16] and size Cout * Ho * Wo, where Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1) and Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1). The hardware requires Ho * Wo to be a multiple of 16, so when defining the dst tensor, round Ho * Wo up to a multiple of 16. The actual size is Cout * round_howo, where round_howo = ceil(Ho * Wo / 16) * 16. |
| featureMap | Input | Input tensor. The QuePosition of the tensor is A1. The shape is in the format [C1, H, W, C0], where C1 * C0 is the number of input channels. H is the height, with value range [1, 40]. W is the width, with value range [1, 40]. |
| weight | Input | Convolution kernel (weight) tensor. The QuePosition of the tensor is B1. The shape is in the format [C1, Kh, Kw, Cout, C0], where C1 * C0 is the number of input channels. Cout is the number of filters; the value range is [16, 32, 64, 128], and the value must be a multiple of 16. Kh is the filter height, with value range [1, 5]. Kw is the filter width, with value range [1, 5]. |
| conv2dParams | Input | Parameters that describe the convolution, such as the input shape. The type is Conv2dParams; its fields are listed later in this section. |
| tilling | Input | Fractal control parameter. The type is Conv2dTilling; its fields are listed later in this section. |
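
The output shape formulas given for dstLocal can be checked on the host before a tensor is allocated. The following is a minimal sketch in plain C++, not part of the API: the `ConvShape` and `DstShape` structs and the `ComputeDstShape` function are hypothetical names introduced only for illustration, and the code assumes the padded input is at least as large as the dilated kernel so that integer division matches floor().

```cpp
#include <cstdint>

// Hypothetical input description (illustration only, not an API type).
struct ConvShape {
    int h, w;                  // feature map height and width
    int kh, kw;                // kernel height and width
    int strideH, strideW;      // convolution stride
    int dilationH, dilationW;  // convolution dilation
    int padLeft, padRight, padTop, padBottom;
    int cout;                  // number of filters
};

// Hypothetical result description (illustration only).
struct DstShape {
    int ho;         // output height
    int wo;         // output width
    int roundHowo;  // ho * wo rounded up to a multiple of 16
    int elements;   // cout * roundHowo
};

inline DstShape ComputeDstShape(const ConvShape& s) {
    DstShape d{};
    // Assumes non-negative numerators, so integer division equals floor().
    d.ho = (s.h + s.padTop + s.padBottom - s.dilationH * (s.kh - 1) - 1) / s.strideH + 1;
    d.wo = (s.w + s.padLeft + s.padRight - s.dilationW * (s.kw - 1) - 1) / s.strideW + 1;
    // round_howo = ceil(Ho * Wo / 16) * 16
    d.roundHowo = (d.ho * d.wo + 15) / 16 * 16;
    d.elements = s.cout * d.roundHowo;
    return d;
}
```

For example, a 4 x 4 feature map with a 2 x 2 kernel, unit stride and dilation, and no padding gives Ho = Wo = 3, so Ho * Wo = 9 is rounded up to 16 and a Cout of 16 needs 256 destination elements.
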

Conv2dParams structure parameters

| Parameter | Type | Meaning |
|---|---|---|
| imgShape | vector<int> | Shape of feature_map, in the format [H, W]. |
| kernelShape | vector<int> | Shape of weight, in the format [Kh, Kw]. |
| stride | vector<int> | Convolution stride, in the format [stride_h, stride_w]. |
| cin | int | Fractal layout parameter. Cin = C1 * C0, where Cin is the number of input channels. The value range of C1 is [1, 4]. |
| cout | int | Number of filters. The value range is [16, 32, 64, 128], and the value must be a multiple of 16. |
| padList | vector<int> | Padding factors, in the format [pad_left, pad_right, pad_top, pad_bottom]. |
| dilation | vector<int> | Convolution dilation factors, in the format [dilation_h, dilation_w]. The width and height of the dilated convolution kernel are dilation_w * (Kw - 1) + 1 and dilation_h * (Kh - 1) + 1, respectively. |
| initY | uint32_t | Indicates whether dstLocal needs to be initialized. |
| partialSum | uint32_t | When the QuePosition of dstLocal is CO2, this parameter controls whether the computation result is moved out. |
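
The value ranges documented for these fields lend themselves to a host-side sanity check. The sketch below is an illustration under stated assumptions, not the library's validation logic: `Conv2dParamsSketch`, `InRange`, `DilatedKernelExtent`, and `IsWithinDocumentedLimits` are hypothetical names, the Cout list [16, 32, 64, 128] is read as a set of allowed values, and the [1, 4] limits for stride and dilation are taken from the strideH, strideW, dilationH, and dilationW entries of Conv2dTilling.

```cpp
#include <vector>

// Hypothetical mirror of the documented Conv2dParams fields (illustration only).
struct Conv2dParamsSketch {
    std::vector<int> imgShape;     // [H, W]
    std::vector<int> kernelShape;  // [Kh, Kw]
    std::vector<int> stride;       // [stride_h, stride_w]
    int cin;                       // C1 * C0
    int cout;                      // number of filters
    std::vector<int> padList;      // [pad_left, pad_right, pad_top, pad_bottom]
    std::vector<int> dilation;     // [dilation_h, dilation_w]
};

inline bool InRange(int v, int lo, int hi) { return v >= lo && v <= hi; }

// Extent of a dilated kernel along one axis: dilation * (K - 1) + 1.
inline int DilatedKernelExtent(int dilation, int k) { return dilation * (k - 1) + 1; }

// Returns true when every field lies inside the documented limits.
// c0 is the channel block size used to derive C1 from cin.
inline bool IsWithinDocumentedLimits(const Conv2dParamsSketch& p, int c0) {
    if (p.imgShape.size() != 2 || p.kernelShape.size() != 2 || p.stride.size() != 2 ||
        p.padList.size() != 4 || p.dilation.size() != 2) {
        return false;
    }
    if (c0 <= 0 || p.cin <= 0 || p.cin % c0 != 0) {
        return false;
    }
    const int c1 = p.cin / c0;
    // Assumption: the documented list is the set of allowed values.
    const bool coutOk = p.cout == 16 || p.cout == 32 || p.cout == 64 || p.cout == 128;
    return InRange(p.imgShape[0], 1, 40) && InRange(p.imgShape[1], 1, 40) &&
           InRange(p.kernelShape[0], 1, 5) && InRange(p.kernelShape[1], 1, 5) &&
           InRange(p.stride[0], 1, 4) && InRange(p.stride[1], 1, 4) &&
           InRange(p.dilation[0], 1, 4) && InRange(p.dilation[1], 1, 4) &&
           InRange(c1, 1, 4) && coutOk;
}
```

Running such a check before building the tiling keeps out-of-range shapes from reaching the instruction, where the failure would be harder to diagnose.
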

Conv2dTilling structure parameters

| Parameter | Type | Meaning |
|---|---|---|
| blockSize | uint32_t | Number of elements stored in a dimension. The value is fixed at 16. |
| loopMode | LoopMode | Traversal mode, of type LoopMode. |
| c0Size | uint32_t | Length of a block. The value can be 16 or 32. |
| dtypeSize | uint32_t | Length of the input data, in bytes. The value range is [1, 2]. |
| strideH | uint32_t | Height of the convolution stride. The value range is [1, 4]. |
| strideW | uint32_t | Width of the convolution stride. The value range is [1, 4]. |
| dilationH | uint32_t | Height of the convolution dilation factor. The value range is [1, 4]. |
| dilationW | uint32_t | Width of the convolution dilation factor. The value range is [1, 4]. |
| hi | uint32_t | Height of the feature_map shape. The value range is [1, 40]. |
| wi | uint32_t | Width of the feature_map shape. The value range is [1, 40]. |
| ho | uint32_t | Height of the output (dst) shape. The value range is [1, 40]. |
| wo | uint32_t | Width of the output (dst) shape. The value range is [1, 40]. |
| height | uint32_t | Height of the weight shape. The value range is [1, 5]. |
| width | uint32_t | Width of the weight shape. The value range is [1, 5]. |
| howo | uint32_t | Size of the output (dst) shape (ho * wo). |
| mNum | uint32_t | Equivalent data length of the M axis. The value range is [1, 4096]. |
| nNum | uint32_t | Equivalent data length of the N axis. The value range is [1, 4096]. |
| kNum | uint32_t | Equivalent data length of the K axis. The value range is [1, 4096]. |
| roundM | uint32_t | Equivalent data length of the M axis, rounded up to an integer multiple of blockSize. The value range is [1, 4096]. |
| roundN | uint32_t | Equivalent data length of the N axis, rounded up to an integer multiple of blockSize. The value range is [1, 4096]. |
| roundK | uint32_t | Equivalent data length of the K axis, rounded up to an integer multiple of c0Size. The value range is [1, 4096]. |
| mBlockNum | uint32_t | Number of blocks on the M axis. mBlockNum = mNum/blockSize. The value range is [1, 4096]. |
| nBlockNum | uint32_t | Number of blocks on the N axis. nBlockNum = nNum/blockSize. The value range is [1, 4096]. |
| kBlockNum | uint32_t | Number of blocks on the K axis. kBlockNum = kNum/blockSize. The value range is [1, 4096]. |
| mIterNum | uint32_t | Number of dimensions traversed on the M axis. The value range is [1, 4096]. |
| nIterNum | uint32_t | Number of dimensions traversed on the N axis. The value range is [1, 4096]. |
| kIterNum | uint32_t | Number of dimensions traversed on the K axis. The value range is [1, 4096]. |
| mTileBlock | uint32_t | Number of split blocks on the M axis. The value range is [1, 4096]. |
| nTileBlock | uint32_t | Number of split blocks on the N axis. The value range is [1, 4096]. |
| kTileBlock | uint32_t | Number of split blocks on the K axis. The value range is [1, 4096]. |
| kTailBlock | uint32_t | Number of tail blocks on the K axis. The value range is [1, 4096]. |
| mTailBlock | uint32_t | Number of tail blocks on the M axis. The value range is [1, 4096]. |
| nTailBlock | uint32_t | Number of tail blocks on the N axis. The value range is [1, 4096]. |
| kHasTail | bool | Indicates whether a tail block exists on the K axis. |
| mHasTail | bool | Indicates whether a tail block exists on the M axis. |
| nHasTail | bool | Indicates whether a tail block exists on the N axis. |
| mTileNums | uint32_t | Length of split blocks on the M axis. The value range is [1, 4096]. |
| mTailNums | uint32_t | Length of tail blocks on the M axis. The value range is [1, 4096]. |
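
The rounding and block-count fields follow simple integer arithmetic, sketched below in plain C++. This is an illustration, not the implementation behind GetConv2dTiling: `RoundUp`, `BlockCount`, `AxisSplit`, and `SplitBlocks` are hypothetical names. Two assumptions are made. First, the block count is taken from the rounded-up length so that a partial block still counts. Second, the table does not state how the tile, iteration, and tail fields relate, so the split shown here (whole tiles plus a remainder) is only one plausible reading.

```cpp
#include <cstdint>

// Rounds value up to the next multiple of unit (unit must be non-zero).
inline std::uint32_t RoundUp(std::uint32_t value, std::uint32_t unit) {
    return (value + unit - 1) / unit * unit;
}

// Number of unit-sized blocks needed to hold num elements.
// Assumption: a partially filled block counts as a full block.
inline std::uint32_t BlockCount(std::uint32_t num, std::uint32_t unit) {
    return RoundUp(num, unit) / unit;
}

// Hypothetical result of splitting an axis into tiles (illustration only).
struct AxisSplit {
    std::uint32_t iterNum;    // number of whole tiles
    std::uint32_t tailBlock;  // blocks left over after the whole tiles
    bool hasTail;             // true when tailBlock is non-zero
};

// Assumed relationship: blockNum = iterNum * tileBlock + tailBlock.
inline AxisSplit SplitBlocks(std::uint32_t blockNum, std::uint32_t tileBlock) {
    AxisSplit s{};
    s.iterNum = blockNum / tileBlock;
    s.tailBlock = blockNum % tileBlock;
    s.hasTail = s.tailBlock != 0;
    return s;
}
```

With blockSize fixed at 16, an M-axis length of 40 rounds up to 48 and occupies 3 blocks under these assumptions.
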

Supported data type combinations

| feature_map.dtype | weight.dtype | dst.dtype |
|---|---|---|
| int8_t | int8_t | int32_t |
| half | half | float |
| half | half | half |
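
The three supported combinations can be encoded as a small lookup for host-side checks. The sketch below is an illustration only: the `DType` enum and `IsSupportedDtypeCombo` function are hypothetical names, and an enum stands in for the device types because `half` is not a standard C++ type.

```cpp
// Hypothetical tags for the data types named in the table (illustration only).
enum class DType { Int8, Int32, Half, Float };

// Returns true only for the documented (feature_map, weight, dst) combinations.
inline bool IsSupportedDtypeCombo(DType featureMap, DType weight, DType dst) {
    // featureMap and weight share one template type (src_T) in the prototype.
    if (featureMap != weight) {
        return false;
    }
    if (featureMap == DType::Int8) {
        return dst == DType::Int32;
    }
    if (featureMap == DType::Half) {
        return dst == DType::Float || dst == DType::Half;
    }
    return false;
}
```
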
Availability
Precautions
- This instruction does not support the case where W equals Kw and H is greater than Kh; calling it with such shapes produces unexpected results.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
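
The unsupported shape named in the first precaution is easy to screen for before the call. The helper below is a hypothetical host-side guard, not part of the API; `HitsUnsupportedShape` is a name introduced here for illustration.

```cpp
// Returns true when the shape matches the unsupported case:
// feature map width equal to kernel width while height exceeds kernel height.
inline bool HitsUnsupportedShape(int h, int w, int kh, int kw) {
    return w == kw && h > kh;
}
```
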