LayerNormGrad Tiling

Function Usage

LayerNormGrad Tiling provides the following functions:

  • Obtain the maximum and minimum temporary space sizes reserved or allocated on the host.

    To perform LayerNormGrad computation in the kernel, developers need to reserve or allocate the temporary space. The GetLayerNormGradMaxMinTmpSize API is used to obtain the maximum and minimum sizes of the temporary space to be reserved or allocated on the host. Developers can select a proper size within this range as the tiling parameter and pass it to the kernel.

    • To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
    • Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage.
  • Use the GetLayerNormGradNDTilingInfo API to obtain the tiling parameter required by the LayerNormGrad kernel API, and pass the input shapes, remaining space for LayerNormGrad computation, and computation data types.

    The definition of the LayerNormGradTiling structure is as follows. You do not need to pay attention to the specific information of this tiling structure, but only need to pass it to the kernel and directly use it through LayerNormGrad high-level APIs.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    struct LayerNormGradTiling {
        uint32_t stackBufferSize = 0;
        uint32_t bLength = 0;
        uint32_t sLength = 0;
        uint32_t hLength = 0;
        uint32_t originalHLength = 0;
        uint32_t oneCalSize = 0;
        uint32_t nohCalSize = 0;
        uint32_t loopNum = 0;
        uint32_t tailSize = 0;
        uint32_t nohTailSize = 0;
        uint32_t tmpTensorBSHPos = 0;
        uint32_t tmpTensorBSHSize = 0;
        uint32_t pdVarTensorPos = 0;
        uint32_t pdVarTensorSize = 0;
        uint32_t pdMeanTensorPos = 0;
        uint32_t pdMeanTensorSize = 0;
        uint32_t x1TensorPos = 0;
        uint32_t x1TensorSize = 0;
        uint32_t x2TensorPos = 0;
        uint32_t x2TensorSize = 0;
        uint32_t x3TensorPos = 0;
        uint32_t x3TensorSize = 0;
        uint32_t tmpTensorPos = 0;
        uint32_t tmpTensorSize = 0;
        uint32_t tmpTensor1Pos = 0;
        uint32_t tmpTensor1Size = 0;
        uint32_t tmpTensor2Pos = 0;
        uint32_t tmpTensor2Size = 0;
        uint32_t lastDimValueBack = 0;
        uint32_t lastDimValueBackMulTwo = 0;
    };
    

Prototype

1
void GetLayerNormGradMaxMinTmpSize(const ge::Shape &srcShape, const uint32_t typeSize, const bool isReuseSource, uint32_t &maxValue, uint32_t &minValue)
1
void GetLayerNormGradNDTilingInfo(const ge::Shape srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, optiling::LayerNormGradTiling &tiling)

Parameters

Table 1 GetLayerNormGradMaxMinTmpSize API parameters

Parameter

Input/Output

Meaning

srcShape

Input

Shape {B, S, storageHLength, originHLength} of input data inputDy, including the shape information of the current inputDy and the original shape information before address alignment (if the H axis padding operation is performed).

In scenarios where the API is supported, the values of storageHLength and originHLength must be the same.

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the input data type of the operator is half, set this parameter to 2.

isReuseSource

Input

Whether to reuse the buffer space of the source operand, which must be the same as that of the LayerNorm API.

maxValue

Output

Maximum size of the temporary space required by LayerNormGrad computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage. If the maximum space size is 0, no temporary space is required.

NOTE:

maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer.

minValue

Output

Minimum size of the temporary space required by LayerNormGrad computation. To ensure correct functions, the size of the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter. If the minimum space size is 0, no temporary space is required.

Table 2 GetLayerNormGradNDTilingInfo API parameters

Parameter

Input/Output

Meaning

srcShape

Input

Shape of input data inputDy, including the current input shape information and the original shape information before address alignment.

stackBufferSize

Input

Size of the space that can be used by the API. The unit is the number of elements.

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the input data type of the operator is half, set this parameter to 2.

isReuseSource

Input

Whether the buffer space of inputX and inputDy can be reused.

tilling

Output

Tilling information of input data.

Returns

None

Example

The following example describes the process of obtaining the tiling parameter on the host and the method of using the parameter on the kernel when LayerNormGrad high-level APIs are used. In this example, the shape size of the input tensor is [2, 16, 64], and the input data type is half.

  1. Add the LayerNormGradTiling structure parameter to the TilingData structure to function as a field.
    1
    2
    3
    4
    5
    6
    BEGIN_TILING_DATA_DEF(TilingData)               // Register a tiling class and uses the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, totalLength); // Add the tiling field to compute the total data volume.
      TILING_DATA_FIELD_DEF(uint32_t, tileNum);     // Add the tiling field that specifies the total number of data blocks to be computed on each core.
      ...                                           // Add other tiling fields.
      TILING_DATA_FIELD_DEF_STRUCT(LayerNormGradTiling, layernormGradTilingData); // Add the LayerNormGradTiling structure parameter to the TilingData structure.
    END_TILING_DATA_DEF;
    
  2. The tiling implementation function first calls the GetLayerNormGradMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the LayerNormGrad API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then calls the GetLayerNormGradNDTillingInfo API to obtain the tiling parameter required by the LayerNormGradBeta kernel API based on the input shape and remaining size of computing space.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    namespace optiling {
    const uint32_t BLOCK_DIM = 8;
    const uint32_t TILE_NUM = 8;
    static ge::graphStatus TilingFunc(gert::TilingContext* context)
    {
        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        context->SetBlockDim(BLOCK_DIM);
        tiling.set_totalLength(totalLength);
        tiling.set_tileNum(TILE_NUM);
        // Set other tiling parameters.
        ...
        // {B, S, storageHLength, originHLength}
        std::vector<int64_t> shapeVec = {2, 16, 64, 64};
        ge::Shape srcShape(shapeVec);
        // This example is for reference only. Use GetLayerNormGradMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. Developers can pass a proper space size as required.
        uint32_t max;
        uint32_t min;
        AscendC::GetLayerNormGradMaxMinTmpSize(srcShape, sizeof(half), false, max, min);
        // Obtain the LayerNormGrad tiling parameter.
        AscendC::GetLayerNormGradNDTillingInfo(srcShape, min, sizeof(half), false, tiling.layernormGradTilingData); 
         ... // Other logic
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        context->SetTilingKey(1);
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. The kernel calls GET_TILING_DATA in the kernel function to obtain TilingData, and then passes the LayerNormGradTiling information in TilingData to the LayerNormGrad API for computation. For details about the complete example on the kernel, see LayerNorm.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    extern "C" __global__ __aicore__ void func_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling)
    {
        GET_TILING_DATA(tilingData, tiling);
        KernelFunc op;
        op.Init(x, y, z, tilingData.totalLength, tilingData.tileNum,tilingData.layernormGradTilingData);
        if (TILING_KEY_IS(1)) {
            op.Process();
        }
    }