Conv3D Instructions

Ascend C provides a set of high-level Conv3D APIs for quickly implementing the cube computation of a three-dimensional forward convolution. Figure 1 illustrates the 3D forward convolution. The calculation formula is as follows:

  Y = X * W + B

Here * denotes the 3D convolution operation, and:

  • X is the feature matrix input of Conv3D convolution.
  • W is the weight matrix of the Conv3D convolution.
  • B is the bias matrix of the Conv3D convolution.
  • Y is the result matrix output after completing the convolution and bias operations.
Figure 1 3D forward convolution diagram
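
To make the roles of X, W, B, and Y concrete, the following is a minimal host-side reference in plain C++, not Ascend C. It is a sketch under several assumptions that this section does not state: a single batch, dense row-major NCDHW and [Cout][Cin][Kd][Kh][Kw] layouts (not the NDC1HWC0 and FRACTAL_Z_3D formats used by the API), stride 1, no padding, no dilation, and the unflipped kernel convention common in deep learning. The names InputShape, KernelShape, and Conv3dReference are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape holders for this sketch; not part of the Conv3D API.
struct InputShape {
    std::size_t cin, din, hin, win;
};
struct KernelShape {
    std::size_t cout, kd, kh, kw;
};

// Naive reference for Y = X * W + B (stride 1, no padding, no dilation).
// x: [Cin][Din][Hin][Win], w: [Cout][Cin][Kd][Kh][Kw], b: [Cout], all row-major.
// Returns y: [Cout][Dout][Hout][Wout] with Dout = Din - Kd + 1, and likewise for H and W.
std::vector<float> Conv3dReference(const std::vector<float>& x, const InputShape& in,
                                   const std::vector<float>& w, const KernelShape& k,
                                   const std::vector<float>& b)
{
    assert(in.din >= k.kd && in.hin >= k.kh && in.win >= k.kw);
    assert(x.size() == in.cin * in.din * in.hin * in.win);
    assert(w.size() == k.cout * in.cin * k.kd * k.kh * k.kw);
    assert(b.size() == k.cout);

    const std::size_t dout = in.din - k.kd + 1;
    const std::size_t hout = in.hin - k.kh + 1;
    const std::size_t wout = in.win - k.kw + 1;
    std::vector<float> y(k.cout * dout * hout * wout, 0.0f);

    for (std::size_t co = 0; co < k.cout; ++co) {
        for (std::size_t od = 0; od < dout; ++od) {
            for (std::size_t oh = 0; oh < hout; ++oh) {
                for (std::size_t ow = 0; ow < wout; ++ow) {
                    float acc = b[co];  // start from the bias B
                    for (std::size_t ci = 0; ci < in.cin; ++ci) {
                        for (std::size_t zd = 0; zd < k.kd; ++zd) {
                            for (std::size_t zh = 0; zh < k.kh; ++zh) {
                                for (std::size_t zw = 0; zw < k.kw; ++zw) {
                                    const std::size_t xi =
                                        ((ci * in.din + (od + zd)) * in.hin + (oh + zh)) * in.win + (ow + zw);
                                    const std::size_t wi =
                                        (((co * in.cin + ci) * k.kd + zd) * k.kh + zh) * k.kw + zw;
                                    acc += x[xi] * w[wi];
                                }
                            }
                        }
                    }
                    y[((co * dout + od) * hout + oh) * wout + ow] = acc;
                }
            }
        }
    }
    return y;
}
```

For example, a 2x2x2 input of ones convolved with a single 2x2x2 kernel of ones and a bias of 0.5 yields one output element equal to 8.5.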

The dimensions are named as follows:

  • Cin is the number of input channels of the input matrix.
  • Din, Hin, and Win are the depth, height, and width of the input matrix.
  • Cout is the number of output channels of the weight and output matrices.
  • Dout, Hout, and Wout are the depth, height, and width of the output matrix.
  • M, as used in the rest of this section, is the vertical axis of the input matrix after it is expanded by the img2col operation. M equals Hout multiplied by Wout.

Channel, Depth, Height, and Width are abbreviated as C, D, H, and W, respectively.
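
The relationship between M and the output plane can be sketched in a few lines of plain C++. The helper names ComputeM and SplitM are hypothetical, and the row-major mapping m = h * Wout + w is an assumption made for illustration; this section only states that M equals Hout multiplied by Wout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of output coordinates in the H and W dimensions.
struct OutputHW {
    uint64_t h;
    uint64_t w;
};

// M is the product of the output height and the output width.
constexpr uint64_t ComputeM(uint64_t hout, uint64_t wout)
{
    return hout * wout;
}

// Assumption: positions along M are flattened row-major, so m = h * Wout + w.
constexpr OutputHW SplitM(uint64_t m, uint64_t wout)
{
    return OutputHW{m / wout, m % wout};
}

static_assert(ComputeM(4, 6) == 24, "a 4 x 6 output plane has M = 24");
static_assert(SplitM(13, 6).h == 2 && SplitM(13, 6).w == 1, "m = 13 maps to row 2, column 1");
```

Under that assumption, a start position along M (such as the mStartPos value passed in step 3 below) can be read as a row and column of the output plane.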

In addition to the preceding basic operations, the parameters Padding, Stride, and Dilation can be set in Conv3D computation. Their meanings are as follows:

  • Padding pads the input matrix with zeros along its three dimensions. See Figure 2.
  • Stride is the distance the convolution kernel slides along each of the three dimensions. See Figure 3.
  • Dilation is the spacing between adjacent elements of the convolution kernel along each of the three dimensions. See Figure 4.
Figure 2 3D convolution forward Padding diagram
Figure 3 3D convolution forward Stride diagram
Figure 4 3D convolution forward Dilation diagram
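
The effect of Padding, Stride, and Dilation on the output size can be checked with the standard convolution size arithmetic, applied independently to D, H, and W. The formula below is the commonly used one, out = floor((in + padFront + padBack - dilation * (kernel - 1) - 1) / stride) + 1; this section does not state it, so treat it as an assumption. ConvOutSize and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Output size of one convolution dimension (D, H, or W).
// Assumes stride > 0 and that the dilated kernel fits into the padded input,
// so the numerator is non-negative and integer division equals floor division.
constexpr int64_t ConvOutSize(int64_t in, int64_t kernel, int64_t padFront, int64_t padBack,
                              int64_t stride, int64_t dilation)
{
    return (in + padFront + padBack - dilation * (kernel - 1) - 1) / stride + 1;
}

// Padding of 1 on both sides keeps the size for a 3-wide kernel.
static_assert(ConvOutSize(8, 3, 1, 1, 1, 1) == 8, "padding example");
// A stride of 2 roughly halves the number of output positions.
static_assert(ConvOutSize(8, 3, 0, 0, 2, 1) == 3, "stride example");
// A dilation of 2 makes a 3-wide kernel cover 5 input elements.
static_assert(ConvOutSize(8, 3, 0, 0, 1, 2) == 4, "dilation example");
```

Calling the helper three times, once per dimension, gives Dout, Hout, and Wout, from which M = Hout * Wout follows.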

The procedure for implementing Conv3D computation on the kernel side is as follows:

  1. Create a Conv3D object.
  2. Perform the initialization operation.
  3. Set the 3D convolution Input, Weight, Bias, and Output.
  4. Perform the 3D convolution operation.
  5. Complete the 3D convolution operation.

To use the high-level Conv3D API to implement forward convolution, perform the following steps:

  1. Create a Conv3D object.
    #include "lib/conv/conv3d/conv3d_api.h"
    
    using inputType = ConvApi::ConvType<AscendC::TPosition::GM, ConvFormat::NDC1HWC0, bfloat16_t>;
    using weightType = ConvApi::ConvType<AscendC::TPosition::GM, ConvFormat::FRACTAL_Z_3D, bfloat16_t>;
    using outputType = ConvApi::ConvType<AscendC::TPosition::GM, ConvFormat::NDC1HWC0, bfloat16_t>;
    using biasType = ConvApi::ConvType<AscendC::TPosition::GM, ConvFormat::ND, float>; // Optional: omit if the convolution has no bias
    
    Conv3dApi::Conv3D<inputType, weightType, outputType, biasType> conv3dApi;
    

    When creating the object, pass the types of the Input, Weight, and Output parameters. The Bias type is optional: omit it if the convolution has no bias input. Each type is described by ConvType, which carries the logical memory location, the data format, and the data type.

    template <TPosition POSITION, ConvFormat FORMAT, typename TYPE>
    struct ConvType {
        constexpr static TPosition pos = POSITION;    // Position of the Conv3d input or output in memory
        constexpr static ConvFormat format = FORMAT;  // Conv3d input or output data format
        using T = TYPE;                               // Conv3d input or output data type
    };
    

    The following briefly describes the data structures behind object creation. This background is optional reading. The Conv3D type used to create the object is defined as follows:

    template <class INPUT_TYPE, class WEIGHT_TYPE, class OUTPUT_TYPE, class BIAS_TYPE = biasType, class CONV_CFG = Conv3dParam>
    using Conv3D = Conv3dIntfExt<Config<ConvApi::ConvDataType<INPUT_TYPE, WEIGHT_TYPE, OUTPUT_TYPE, BIAS_TYPE, CONV_CFG>>, Impl, Intf>;
    

    The Conv3dIntfExt and Conv3dParam data structures are defined as follows:

    template <class Conv3dCfg, template <typename, class, bool> class Impl = Conv3dApiImpl,
        template <class, template <typename, class, bool> class> class Intf = Conv3dIntf>
    struct Conv3dIntfExt : public Intf<Conv3dCfg, Impl> {
        __aicore__ inline Conv3dIntfExt()
        {}
    };
    struct Conv3dParam : public ConvApi::ConvParam {
        __aicore__ inline Conv3dParam(){};
    };
    

    Conv3dIntf is the base class of Conv3dIntfExt, and Conv3dCfg is the template argument passed to Conv3dIntf. They are defined as follows:

    template <class Config, template <typename, class, bool> class Impl>
    struct Conv3dIntf {
        using InputT = typename Config::SrcAT;
        using WeightT = typename Config::SrcBT;
        using OutputT = typename Config::DstT;
        using BiasT = typename Config::BiasT;
        using L0cT = typename Config::L0cT;
        using ConvParam = typename Config::ConvParam;
        __aicore__ inline Conv3dIntf()
        {}
    };
    template <class ConvDataType>
    struct Conv3dCfg : public ConvApi::ConvConfig<ConvDataType> {
    public:
        __aicore__ inline Conv3dCfg()
        {}
        using ContextData = struct _ : public ConvApi::ConvConfig<ConvDataType>::ContextData {
            __aicore__ inline _()
            {}
        };
    };
    
    Table 1 ConvType parameters

    Parameter  | Description
    TPosition  | Logical memory location. Can be set to TPosition::GM for the input matrix, the weight matrix, the bias matrix, and the output matrix.
    ConvFormat | Data format. The example in this step uses ConvFormat::NDC1HWC0 for the input and output matrices, ConvFormat::FRACTAL_Z_3D for the weight matrix, and ConvFormat::ND for the bias.
    TYPE       | Data type. Can be set to half or bfloat16_t for the input matrix, the weight matrix, and the output matrix, and to half or float for the bias.

    Note: The data types of the input and output matrices must match. For details about the supported data type combinations, see Table 2.

    Table 2 Combinations of Conv3D input and output data types

    Input Matrix | Weight Matrix | Bias  | Output Matrix | Supported Platform
    half         | half          | half  | half          | Atlas A3 training products / Atlas A3 inference products; Atlas A2 training products / Atlas A2 inference products
    bfloat16_t   | bfloat16_t    | float | bfloat16_t    | Atlas A3 training products / Atlas A3 inference products; Atlas A2 training products / Atlas A2 inference products
  2. Perform the initialization operation.
    Conv3dApi::Conv3D<inputType, weightType, outputType, biasType> conv3dApi;
    TPipe pipe;                                                        // Initialize TPipe.
    conv3dApi.Init(&tiling);                                           // Initialize conv3dApi.
    
  3. Set the 3D convolution Input, Weight, Bias, and Output.
    conv3dApi.SetWeight(weightGm);               // Set the GM address of the weight for the current core.
    if (biasFlag) {
        conv3dApi.SetBias(biasGm);               // Set the GM address of the bias for the current core.
    }
    // Set the start offset of the input in each dimension for the current core.
    conv3dApi.SetInputStartPosition(diStartPos, mStartPos);
    // Set the sizes of cout, dout, and m for the current core.
    conv3dApi.SetSingleOutputShape(singleCoreCout, singleCoreDout, singleCoreM);
    
    // Conv3D currently supports only single-batch convolution. For multiple batches,
    // loop over the batches and offset the GM address by one batch size per iteration.
    for (uint64_t batchIter = 0; batchIter < singleCoreBatch; ++batchIter) {
        conv3dApi.SetInput(inputGm[batchIter * inputOneBatchSize]);    // Set the GM address of the input for the current core.
    }
    
  4. Perform the 3D convolution operation.
    Call IterateAll to compute all data on a single core.
    for (uint64_t batchIter = 0; batchIter < singleCoreBatch; ++batchIter) {
        ...
        conv3dApi.IterateAll(outputGm[batchIter * outputOneBatchSize]);    // Call IterateAll to complete the Conv3D computation.
        ...
    }
    
  5. Complete the 3D convolution operation.
    for (uint64_t batchIter = 0; batchIter < singleCoreBatch; ++batchIter) {
        ...
        conv3dApi.End();    // Clear the EventID and release the temporarily allocated internal memory.
    }
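
Steps 3 to 5 each show a fragment of the same per-batch loop. The following host-side sketch in plain C++ puts the fragments together to show the call order and the offset arithmetic. MockConv3D is a hypothetical stand-in that only records calls and takes element offsets instead of GM tensors; it is not the Conv3D API. OneBatchSizeNdc1hwc0 assumes that one NDC1HWC0 batch holds D * C1 * H * W * C0 elements with C1 = ceil(C / C0), which is inferred from the format name; the value of C0 is not stated in this section and must be supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that records the calls it receives.
struct MockConv3D {
    std::vector<std::string> calls;
    void SetInput(uint64_t inputOffset) { calls.push_back("SetInput@" + std::to_string(inputOffset)); }
    void IterateAll(uint64_t outputOffset) { calls.push_back("IterateAll@" + std::to_string(outputOffset)); }
    void End() { calls.push_back("End"); }
};

// Assumed element count of one NDC1HWC0 batch: D * C1 * H * W * C0, C1 = ceil(C / C0).
constexpr uint64_t OneBatchSizeNdc1hwc0(uint64_t c, uint64_t d, uint64_t h, uint64_t w, uint64_t c0)
{
    return d * ((c + c0 - 1) / c0) * h * w * c0;
}

// For each batch: set the input, compute the whole single-core output, then end.
// Offsets advance by one batch size per iteration, as in steps 3 and 4.
inline std::vector<std::string> RunBatches(uint64_t singleCoreBatch, uint64_t inputOneBatchSize,
                                           uint64_t outputOneBatchSize)
{
    MockConv3D conv;
    for (uint64_t batchIter = 0; batchIter < singleCoreBatch; ++batchIter) {
        conv.SetInput(batchIter * inputOneBatchSize);
        conv.IterateAll(batchIter * outputOneBatchSize);
        conv.End();
    }
    return conv.calls;
}
```

With two batches, an input batch size of 1024, and an output batch size of 512, the recorded sequence is SetInput@0, IterateAll@0, End, SetInput@1024, IterateAll@512, End.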
    

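The ConvType pattern from step 1 and the combinations in Table 2 can be exercised off-device with a small compile-time check. Everything in this sketch is a stand-in so that it compiles with a host compiler: the TPosition and ConvFormat enums, the empty half and bfloat16_t structs, and the helper IsTable2Combination are hypothetical and are not part of the Conv3D API. Only the shape of ConvType follows the definition shown in step 1.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the device-side enums and 16-bit types.
enum class TPosition { GM };
enum class ConvFormat { NDC1HWC0, FRACTAL_Z_3D, ND };
struct half {};
struct bfloat16_t {};

// Same shape as the ConvType shown in step 1: position, format, and data type.
template <TPosition POSITION, ConvFormat FORMAT, typename TYPE>
struct ConvType {
    constexpr static TPosition pos = POSITION;
    constexpr static ConvFormat format = FORMAT;
    using T = TYPE;
};

template <class A, class B>
constexpr bool Same()
{
    return std::is_same<A, B>::value;
}

// True only for the two rows listed in Table 2.
template <class Input, class Weight, class Output, class Bias>
constexpr bool IsTable2Combination()
{
    using I = typename Input::T;
    using W = typename Weight::T;
    using O = typename Output::T;
    using Bs = typename Bias::T;
    return (Same<I, half>() && Same<W, half>() && Same<Bs, half>() && Same<O, half>()) ||
           (Same<I, bfloat16_t>() && Same<W, bfloat16_t>() && Same<Bs, float>() && Same<O, bfloat16_t>());
}

using inputType = ConvType<TPosition::GM, ConvFormat::NDC1HWC0, bfloat16_t>;
using weightType = ConvType<TPosition::GM, ConvFormat::FRACTAL_Z_3D, bfloat16_t>;
using outputType = ConvType<TPosition::GM, ConvFormat::NDC1HWC0, bfloat16_t>;
using biasType = ConvType<TPosition::GM, ConvFormat::ND, float>;
using halfBiasType = ConvType<TPosition::GM, ConvFormat::ND, half>;

static_assert(IsTable2Combination<inputType, weightType, outputType, biasType>(),
              "bfloat16_t input, weight, and output with float bias is listed in Table 2");
static_assert(!IsTable2Combination<inputType, weightType, outputType, halfBiasType>(),
              "bfloat16_t input with half bias is not listed in Table 2");
static_assert(inputType::format == ConvFormat::NDC1HWC0, "the format is readable at compile time");
```

A check of this kind fails at compile time when an unlisted type combination is used, rather than at run time on the device.
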
Header File to Be Included

#include "lib/conv/conv3d/conv3d_api.h"