Conv2D

Function Usage

This API has been deprecated and will be removed in later versions. Do not use this API.

Performs 2D convolution on an input tensor (feature map) and a weight tensor, and writes the result to an output tensor. 2D convolution layers are widely used in image recognition, where filters extract features from an image.

Prototype

template <typename dst_T, typename src_T>
__aicore__ inline void Conv2D(const LocalTensor<dst_T>& dstLocal, const LocalTensor<src_T>& featureMap, const LocalTensor<src_T>& weight, Conv2dParams& conv2dParams, Conv2dTilling& tilling)
The tilling argument must be obtained by calling the following tiling compute API:
template <typename T>
__aicore__ inline Conv2dTilling GetConv2dTiling(Conv2dParams& conv2dParams)

Parameters

Table 1 Parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand.

The Atlas Training Series Product supports QuePosition values CO1 and CO2.

The format is [Cout/16, Ho, Wo, 16] and the size is Cout * Ho * Wo, where Ho and Wo are calculated as follows:

Ho = floor((H + pad_top + pad_bottom - dilation_h * (Kh - 1) - 1) / stride_h + 1)

Wo = floor((W + pad_left + pad_right - dilation_w * (Kw - 1) - 1) / stride_w + 1)

The hardware requires Ho * Wo to be a multiple of 16. When defining the dst tensor, round Ho * Wo up to a multiple of 16, so that the actual size is Cout * round_howo:

round_howo = ceil(Ho * Wo/16) * 16
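
The formulas above can be checked on the host before allocating dstLocal. The following is a minimal sketch, not part of the Conv2D API: the names OutputShape, OutputExtent, ComputeOutputShape, and DstElementCount are hypothetical helpers introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper types; not part of the Conv2D API.
struct OutputShape {
    uint32_t ho;        // output height Ho
    uint32_t wo;        // output width Wo
    uint32_t roundHowo; // Ho * Wo rounded up to a multiple of 16
};

// Output extent along one axis:
// floor((in + padA + padB - dilation * (k - 1) - 1) / stride + 1)
inline uint32_t OutputExtent(uint32_t in, uint32_t padA, uint32_t padB,
                             uint32_t k, uint32_t dilation, uint32_t stride)
{
    assert(k > 0 && stride > 0);
    uint32_t dilatedKernel = dilation * (k - 1) + 1;
    // The dilated kernel must fit inside the padded input.
    assert(in + padA + padB >= dilatedKernel);
    // Unsigned integer division already rounds down.
    return (in + padA + padB - dilatedKernel) / stride + 1;
}

inline OutputShape ComputeOutputShape(uint32_t h, uint32_t w,
                                      uint32_t kh, uint32_t kw,
                                      uint32_t strideH, uint32_t strideW,
                                      uint32_t dilationH, uint32_t dilationW,
                                      uint32_t padLeft, uint32_t padRight,
                                      uint32_t padTop, uint32_t padBottom)
{
    OutputShape s{};
    s.ho = OutputExtent(h, padTop, padBottom, kh, dilationH, strideH);
    s.wo = OutputExtent(w, padLeft, padRight, kw, dilationW, strideW);
    s.roundHowo = (s.ho * s.wo + 15) / 16 * 16;
    return s;
}

// Number of elements to reserve for the dst tensor: Cout * round_howo.
inline uint32_t DstElementCount(uint32_t cout, const OutputShape& s)
{
    return cout * s.roundHowo;
}
```

For example, an 8 x 8 input with a 3 x 3 kernel, stride 1, dilation 1, and no padding gives Ho = Wo = 6, so Ho * Wo = 36 is rounded up to 48, and Cout = 16 requires 768 elements.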

featureMap

Input

Input tensor. The QuePosition of the tensor is A1.

Shape of feature_map, in the format [C1, H, W, C0].

C1 * C0 equals the input channel count.

  • If feature_map is of type half, C0 is 16.
  • If feature_map is of type int8_t, C0 is 32.
  • Value range of C1: [1, 4]. Value range of input channel: [16, 32, 64, 128].

H indicates the height. The value range is [1, 40].

W indicates the width. The value range is [1, 40].
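
To make the [C1, H, W, C0] format concrete, the sketch below maps a logical (channel, row, column) coordinate to a linear element offset. It assumes a contiguous row-major layout with C0 as the innermost dimension, which the format suggests but this page does not state explicitly. FeatureMapOffset and FeatureMapElementCount are hypothetical host-side helpers, not API functions.

```cpp
#include <cassert>
#include <cstdint>

// Linear element offset of logical channel c at position (h, w) in a
// [C1, H, W, C0] tensor. Assumes row-major order with C0 innermost.
inline uint32_t FeatureMapOffset(uint32_t c, uint32_t h, uint32_t w,
                                 uint32_t height, uint32_t width, uint32_t c0)
{
    assert(c0 > 0 && h < height && w < width);
    uint32_t c1Idx = c / c0; // index along C1
    uint32_t c0Idx = c % c0; // index along C0
    return ((c1Idx * height + h) * width + w) * c0 + c0Idx;
}

// Total element count: C1 * H * W * C0, where C1 = cin / C0.
inline uint32_t FeatureMapElementCount(uint32_t cin, uint32_t height,
                                       uint32_t width, uint32_t c0)
{
    assert(c0 > 0 && cin % c0 == 0);
    return (cin / c0) * height * width * c0;
}
```

With C0 = 16 (half), channel 17 falls in the second C1 slice at C0 index 1, so neighbouring channels of the same pixel are adjacent in memory only within one C0 group.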

weight

Input

Convolution kernel (weight) tensor. The QuePosition of the tensor is B1.

Shape of weight, in the format [C1, Kh, Kw, Cout, C0].

C1 * C0 indicates the number of input channels.

  • If feature_map is of type half, C0 is 16.
  • If feature_map is of type int8_t, C0 is 32.
  • Value range of C1: [1, 4].
  • The number of input channels must be the same as that of feature_map.

Cout indicates the number of filters. The value range is [16, 32, 64, 128], which must be a multiple of 16.

Kh indicates the height of the filter. The value range is [1, 5].

Kw indicates the width of the filter. The value range is [1, 5].
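
The five-dimensional weight format [C1, Kh, Kw, Cout, C0] can be indexed in the same way as the feature map. This is a sketch under the assumption of a contiguous row-major layout with C0 innermost; WeightOffset and WeightElementCount are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Linear element offset of the weight for input channel cinIdx, kernel
// position (khIdx, kwIdx), and filter coutIdx in a [C1, Kh, Kw, Cout, C0]
// tensor. Assumes row-major order with C0 innermost.
inline uint32_t WeightOffset(uint32_t cinIdx, uint32_t khIdx, uint32_t kwIdx,
                             uint32_t coutIdx, uint32_t kh, uint32_t kw,
                             uint32_t cout, uint32_t c0)
{
    assert(c0 > 0 && khIdx < kh && kwIdx < kw && coutIdx < cout);
    uint32_t c1Idx = cinIdx / c0;
    uint32_t c0Idx = cinIdx % c0;
    return (((c1Idx * kh + khIdx) * kw + kwIdx) * cout + coutIdx) * c0 + c0Idx;
}

// Total element count: C1 * Kh * Kw * Cout * C0, where C1 = cin / C0.
inline uint32_t WeightElementCount(uint32_t cin, uint32_t cout,
                                   uint32_t kh, uint32_t kw, uint32_t c0)
{
    assert(c0 > 0 && cin % c0 == 0);
    return (cin / c0) * kh * kw * cout * c0;
}
```

For int8_t data (C0 = 32) with 64 input channels, 16 filters, and a 3 x 3 kernel, the weight tensor holds 2 * 3 * 3 * 16 * 32 = 9216 elements.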

conv2dParams

Input

Convolution parameters such as the input shape. The type is Conv2dParams. The structure is defined as follows:

struct Conv2dParams {
    uint32_t imgShape[kConv2dImgSize];       // [H, W]
    uint32_t kernelShape[kConv2dkernelSize]; // [Kh, Kw]
    uint32_t stride[kConv2dStride];          // [stride_h, stride_w]
    uint32_t cin;                            // cin = C0 * C1;
    uint32_t cout;
    uint32_t padList[kConv2dPad];       // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[kConv2dDilation]; // [dilation_h, dilation_w]
    uint32_t initY;
    uint32_t partialSum;
};

tilling

Input

Fractal control parameter. The type is Conv2dTilling. The specific definition of the structure is as follows:

struct Conv2dTilling {
    const uint32_t blockSize = 16; // M block size is always 16
    LoopMode loopMode = LoopMode::MODE_NM;

    uint32_t c0Size = 32;
    uint32_t dTypeSize = 1;

    uint32_t strideH = 0;
    uint32_t strideW = 0;
    uint32_t dilationH = 0;
    uint32_t dilationW = 0;
    uint32_t hi = 0;
    uint32_t wi = 0;
    uint32_t ho = 0;
    uint32_t wo = 0;

    uint32_t height = 0;
    uint32_t width = 0;

    uint32_t howo = 0;

    uint32_t mNum = 0;
    uint32_t nNum = 0;
    uint32_t kNum = 0;

    uint32_t mBlockNum = 0;
    uint32_t kBlockNum = 0;
    uint32_t nBlockNum = 0;

    uint32_t roundM = 0;
    uint32_t roundN = 0;
    uint32_t roundK = 0;

    uint32_t mTileBlock = 0;
    uint32_t nTileBlock = 0;
    uint32_t kTileBlock = 0;

    uint32_t mIterNum = 0;
    uint32_t nIterNum = 0;
    uint32_t kIterNum = 0;

    uint32_t mTileNums = 0;

    bool mHasTail = false;
    bool nHasTail = false;
    bool kHasTail = false;

    uint32_t kTailBlock = 0;
    uint32_t mTailBlock = 0;
    uint32_t nTailBlock = 0;

    uint32_t mTailNums = 0;
};
Table 2 Parameters in the Conv2dParams structure

Parameter

Type

Meaning

imgShape

vector<int>

Shape of feature_map, in the format [H, W].
  • H indicates the height. The value range is [1, 40].
  • W indicates the width. The value range is [1, 40].

kernelShape

vector<int>

Shape of weight, in the format [Kh, Kw].

  • Kh indicates the height. The value range is [1, 5].
  • Kw indicates the width. The value range is [1, 5].

stride

vector<int>

Convolution stride, in the format of [stride_h, stride_w].
  • stride_h: height stride, within the range [1, 4].
  • stride_w: width stride, within the range of [1, 4].

cin

int

Fractal layout parameter. Cin = C1 * C0. Cin indicates the number of input channels. The value range of C1 is [1, 4].

  • If feature_map is of type float, C0 = 8. The value range of the input channel is [8, 16, 24, 32].
  • If feature_map is of type half, C0 is 16. The value range of the input channel is [16, 32, 48, 64].
  • If feature_map is of type int8_t, C0 is 32. The value range of channel is [32, 64, 96, 128].
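
The three C0 values listed above are all consistent with C0 * sizeof(element) = 32 bytes. The sketch below uses that observation to derive C0 from the element size and to check cin against the C1 range [1, 4]. The 32-byte rule is an inference from the listed values, not something this page states, and kC0Bytes, C0ForElementSize, and IsValidCin are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one C0 group always spans 32 bytes, which matches the
// listed values (float: 8, half: 16, int8_t: 32).
constexpr uint32_t kC0Bytes = 32; // hypothetical constant

constexpr uint32_t C0ForElementSize(uint32_t elemBytes)
{
    return kC0Bytes / elemBytes;
}

// cin must be a whole number of C0 groups, with C1 = cin / C0 in [1, 4].
inline bool IsValidCin(uint32_t cin, uint32_t elemBytes)
{
    assert(elemBytes == 1 || elemBytes == 2 || elemBytes == 4);
    uint32_t c0 = C0ForElementSize(elemBytes);
    uint32_t c1 = cin / c0;
    return cin % c0 == 0 && c1 >= 1 && c1 <= 4;
}

static_assert(C0ForElementSize(4) == 8, "float: C0 = 8");
static_assert(C0ForElementSize(2) == 16, "half: C0 = 16");
static_assert(C0ForElementSize(1) == 32, "int8_t: C0 = 32");
```

Under these assumptions, IsValidCin(48, 2) accepts 48 half channels (C1 = 3), while IsValidCin(40, 2) rejects 40 because it is not a multiple of 16.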

cout

int

Cout indicates the number of filters. The value range is [16, 32, 64, 128], which must be a multiple of 16.

padList

vector<int>

Padding factors, in the format of [pad_left, pad_right, pad_top, pad_bottom].
  • pad_left: number of columns to be padded to the left of feature_map. Must be in the range of [0, 4].
  • pad_right: number of columns to be padded to the right of feature_map. Must be in the range of [0, 4].
  • pad_top: number of rows to be padded to the top of the feature_map. Must be in the range of [0, 4].
  • pad_bottom: number of rows to be padded to the bottom of the feature_map. Must be in the range of [0, 4].

dilation

vector<int>

Convolution dilation factors, in the format of [dilation_h, dilation_w].
  • dilation_h: height dilation factor. Must be in the range of [1, 4].
  • dilation_w: width dilation factor. Must be in the range of [1, 4].

The width and height of the dilated convolution kernel are calculated as follows: width = dilation_w * (Kw - 1) + 1; height = dilation_h * (Kh - 1) + 1.

initY

uint32_t

Indicates whether dstLocal needs to be initialized.

  • 0: Bias is not used. L0C needs to be initialized. The dstLocal initial matrix stores the previous conv2d result and will be added up with the new conv2d result.
  • 1: Bias is not used. L0C does not need to be initialized. The dstLocal initial matrix will be overwritten by the compute result.

partialSum

uint32_t

When the QuePosition of dstLocal is CO2, this parameter controls whether the computation result is moved out.
  • 0: The computation result is moved out.
  • 1: The computation result is not moved out but used for subsequent computation.
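
The value ranges in Table 2 can be checked on the host before calling the API. The following is a minimal sketch, not the library's own validation. Conv2dParamsSketch is a hypothetical mirror of Conv2dParams that assumes array lengths of 2, 2, 2, 4, and 2 for kConv2dImgSize, kConv2dkernelSize, kConv2dStride, kConv2dPad, and kConv2dDilation; those constants are not defined on this page. The check treats the listed cout values as an enumeration, omits cin because its range depends on the data type, and also rejects the unsupported case from Precautions (W equal to Kw with H greater than Kh).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of Conv2dParams with assumed array lengths.
struct Conv2dParamsSketch {
    uint32_t imgShape[2];    // [H, W]
    uint32_t kernelShape[2]; // [Kh, Kw]
    uint32_t stride[2];      // [stride_h, stride_w]
    uint32_t cin;            // cin = C0 * C1 (not checked here)
    uint32_t cout;
    uint32_t padList[4];     // [pad_left, pad_right, pad_top, pad_bottom]
    uint32_t dilation[2];    // [dilation_h, dilation_w]
    uint32_t initY;
    uint32_t partialSum;
};

inline bool InRange(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v >= lo && v <= hi;
}

// Returns true if the parameters fall inside the documented ranges.
inline bool IsSupported(const Conv2dParamsSketch& p)
{
    for (uint32_t i = 0; i < 2; ++i) {
        if (!InRange(p.imgShape[i], 1, 40)) return false;
        if (!InRange(p.kernelShape[i], 1, 5)) return false;
        if (!InRange(p.stride[i], 1, 4)) return false;
        if (!InRange(p.dilation[i], 1, 4)) return false;
    }
    for (uint32_t i = 0; i < 4; ++i) {
        if (p.padList[i] > 4) return false;
    }
    // Assumption: the listed cout values form an enumeration.
    if (p.cout != 16 && p.cout != 32 && p.cout != 64 && p.cout != 128) {
        return false;
    }
    if (p.initY > 1 || p.partialSum > 1) return false;
    // Precautions: W == Kw with H > Kh is not supported.
    if (p.imgShape[1] == p.kernelShape[1] &&
        p.imgShape[0] > p.kernelShape[0]) {
        return false;
    }
    return true;
}
```

Running such a check before GetConv2dTiling turns an out-of-range shape into an explicit host-side failure rather than an unexpected result on the device.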
Table 3 Parameters in the Conv2dTilling structure

Parameter

Type

Meaning

blockSize

uint32_t

Number of elements stored in a dimension. The value is fixed at 16.

loopMode

LoopMode

Traversal mode. The enumeration is defined as follows:

enum class LoopMode {
    MODE_NM = 0,
    MODE_MN = 1,
    MODE_KM = 2,
    MODE_KN = 3
};

c0Size

uint32_t

Length of a block. The value can be 16 or 32.

dTypeSize

uint32_t

Length of the input data, in bytes. The value range is [1, 2].

strideH

uint32_t

Height of the convolution stride. The value range is [1, 4].

strideW

uint32_t

Width of the convolution stride. The value range is [1, 4].

dilationH

uint32_t

Height of the convolution dilation factor. The value range is [1, 4].

dilationW

uint32_t

Width of the convolution dilation factor. The value range is [1, 4].

hi

uint32_t

Height of the feature_map shape. The value range is [1, 40].

wi

uint32_t

Width of the feature_map shape. The value range is [1, 40].

ho

uint32_t

Height of the output (dstLocal) shape, Ho. The value range is [1, 40].

wo

uint32_t

Width of the output (dstLocal) shape, Wo. The value range is [1, 40].

height

uint32_t

Height of the weight shape. The value range is [1, 5].

width

uint32_t

Width of the weight shape. The value range is [1, 5].

howo

uint32_t

Size of the output plane (ho * wo).

mNum

uint32_t

Equivalent data length of the M axis. The value range is [1, 4096].

nNum

uint32_t

Equivalent data length of the N axis. The value range is [1, 4096].

kNum

uint32_t

Equivalent data length of the K axis. The value range is [1, 4096].

roundM

uint32_t

Equivalent data length of the M axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundN

uint32_t

Equivalent data length of the N axis. The value is rounded up to an integer multiple of blockSize. The value range is [1, 4096].

roundK

uint32_t

Equivalent data length of the K axis. The value is rounded up to an integer multiple of c0Size. The value range is [1, 4096].

mBlockNum

uint32_t

Number of blocks on the M axis. mBlockNum = mNum/blockSize. The value range is [1, 4096].

nBlockNum

uint32_t

Number of blocks on the N axis. nBlockNum = nNum/blockSize. The value range is [1, 4096].

kBlockNum

uint32_t

Number of blocks on the K axis. kBlockNum = kNum/blockSize. The value range is [1, 4096].
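
The relationship between the length, rounded length, and block count fields can be illustrated as follows. This is a sketch under stated assumptions, not the behaviour of GetConv2dTiling: it assumes the usual im2col view of convolution as a matrix multiplication (M = Ho * Wo, N = Cout, K = Cin * Kh * Kw), which this page does not confirm, and it applies the block-count formulas to the rounded lengths. GemmDimsSketch, RoundUp, and DeriveGemmDims are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container; field names mirror those in Conv2dTilling.
struct GemmDimsSketch {
    uint32_t mNum, nNum, kNum;
    uint32_t roundM, roundN, roundK;
    uint32_t mBlockNum, nBlockNum, kBlockNum;
};

// Rounds value up to the nearest multiple of `multiple`.
inline uint32_t RoundUp(uint32_t value, uint32_t multiple)
{
    assert(multiple > 0);
    return (value + multiple - 1) / multiple * multiple;
}

// Assumption (im2col view): M = Ho * Wo, N = Cout, K = Cin * Kh * Kw.
inline GemmDimsSketch DeriveGemmDims(uint32_t howo, uint32_t cout,
                                     uint32_t cin, uint32_t kh, uint32_t kw,
                                     uint32_t blockSize, uint32_t c0Size)
{
    GemmDimsSketch d{};
    d.mNum = howo;
    d.nNum = cout;
    d.kNum = cin * kh * kw;
    d.roundM = RoundUp(d.mNum, blockSize);
    d.roundN = RoundUp(d.nNum, blockSize);
    d.roundK = RoundUp(d.kNum, c0Size);
    // Block counts follow the Table 3 formulas, applied to the rounded
    // lengths so that a partial block counts as one block (an assumption).
    d.mBlockNum = d.roundM / blockSize;
    d.nBlockNum = d.roundN / blockSize;
    d.kBlockNum = d.roundK / blockSize;
    return d;
}
```

For a half-precision case with Ho * Wo = 36, Cout = 16, Cin = 16, and a 3 x 3 kernel, this gives M = 36 rounded to 48 (3 blocks), N = 16 (1 block), and K = 144 (9 blocks of 16).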

mIterNum

uint32_t

Number of dimensions traversed on the M axis. The value range is [1, 4096].

nIterNum

uint32_t

Number of dimensions traversed on the N axis. The value range is [1, 4096].

kIterNum

uint32_t

Number of dimensions traversed on the K axis. The value range is [1, 4096].

mTileBlock

uint32_t

Number of split blocks on the M axis. The value range is [1, 4096].

nTileBlock

uint32_t

Number of split blocks on the N axis. The value range is [1, 4096].

kTileBlock

uint32_t

Number of split blocks on the K axis. The value range is [1, 4096].

kTailBlock

uint32_t

Number of tail blocks on the K axis. The value range is [1, 4096].

mTailBlock

uint32_t

Number of tail blocks on the M axis. The value range is [1, 4096].

nTailBlock

uint32_t

Number of tail blocks on the N axis. The value range is [1, 4096].

kHasTail

bool

Indicates whether a tail block exists on the K axis.

mHasTail

bool

Indicates whether a tail block exists on the M axis.

nHasTail

bool

Indicates whether a tail block exists on the N axis.

mTileNums

uint32_t

Length of split blocks on the M axis. The value range is [1, 4096].

mTailNums

uint32_t

Length of tail blocks on the M axis. The value range is [1, 4096].
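
The tile, iteration, and tail fields describe how the blocks on one axis are split into equal tiles plus an optional remainder. The sketch below shows one plausible reading, assuming that the iteration count includes the tail tile; the exact convention used by GetConv2dTiling is not documented on this page. AxisSplitSketch and SplitAxis are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of splitting one axis into tiles.
struct AxisSplitSketch {
    uint32_t iterNum;   // number of tiles traversed, including the tail
    uint32_t tailBlock; // blocks in the tail tile (0 if there is no tail)
    bool hasTail;       // whether a partial tail tile exists
};

// Splits blockNum blocks into tiles of tileBlock blocks each.
inline AxisSplitSketch SplitAxis(uint32_t blockNum, uint32_t tileBlock)
{
    assert(tileBlock > 0);
    AxisSplitSketch s{};
    s.tailBlock = blockNum % tileBlock;
    s.hasTail = s.tailBlock != 0;
    // Assumption: the tail tile counts as one extra iteration.
    s.iterNum = blockNum / tileBlock + (s.hasTail ? 1u : 0u);
    return s;
}
```

Splitting 9 blocks into tiles of 4 yields two full tiles and a tail of 1 block, so three iterations in total; splitting 8 blocks yields two iterations and no tail.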

Table 4 Supported data type combinations of feature_map, weight, and dstLocal

feature_map.dtype    weight.dtype    dst.dtype
int8_t               int8_t          int32_t
half                 half            float
half                 half            half

Availability

Atlas Training Series Product

Precautions

  • This instruction does not support the scenario where W is equal to Kw and H is greater than Kh. Using it in this scenario produces unexpected results.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.