DeepNorm Tiling

Function Usage

Ascend C provides the DeepNorm Tiling API for users to obtain the tiling parameter required for DeepNorm kernel computation.

To obtain the tiling parameter, perform the following two steps:

  1. Use the GetDeepNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required for DeepNorm computation.
    To perform DeepNorm computation in the kernel, developers need to reserve or allocate the temporary space. This API is used to obtain the maximum and minimum sizes of the temporary space to be reserved or allocated on the host. Developers can select a proper size within this range as the tiling parameter and pass it to the kernel.
    • To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
    • Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage.
  2. Use the GetDeepNormTilingInfo API to obtain the tiling parameter required by the DeepNorm kernel API.
    The definition of the DeepNormTiling structure is as follows. You do not need to pay attention to the specific information of this tiling structure, but only need to pass it to the kernel and directly use it through DeepNorm high-level APIs.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    struct DeepNormTiling {
        uint32_t bLength = 0;
        uint32_t sLength = 0;
        uint32_t hLength = 0;
        uint32_t originalHLength = 0;
        uint32_t inputXSize = 0;
        uint32_t meanVarSize = 0;
        uint32_t numberOfTmpBuf = 0;
        uint32_t meanTmpTensorPos = 0;
        uint32_t meanTmpTensorSize = 0;
        uint32_t varianceTmpTensorPos = 0;
        uint32_t varianceTmpTensorSize = 0;
        uint32_t tmpBufSize = 0;
        uint32_t oneTmpSize = 0;
        uint32_t firstTmpStartPos = 0;
        uint32_t secondTmpStartPos = 0;
        uint32_t thirdTmpStartPos = 0;
        uint32_t loopRound = 0;
        uint32_t inputRoundSize = 0;
        uint32_t inputTailSize = 0;
        uint32_t inputTailPos = 0;
        uint32_t meanVarRoundSize = 0;
        uint32_t meanVarTailSize = 0;
        uint32_t meanVarTailPos = 0;
        uint32_t bshCurLength = 0;
        uint32_t shCurLength = 0;
        float lastDimValueBack = 0;
    };
    

Prototype

1
bool GetDeepNormMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const bool isReuseSource, const bool isBasicBlock, uint32_t& maxValue, uint32_t& minValue)
1
bool GetDeepNormTilingInfo(const ge::Shape& srcShape, const ge::Shape& originSrcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, const bool isBasicBlock, optiling::DeepNormTiling& tiling)

Parameters

Table 1 GetDeepNormMaxMinTmpSize API parameters

Parameter

Input/Output

Description

srcShape

Input

Input shape.

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the input data type of the operator is half, set this parameter to 2.

isReuseSrc

Input

Whether to reuse the space of the source operand input, which must be the same as that of the DeepNorm API.

isBasicBlock

Input

Whether srcShape complies with the basic block definition: The length of the last axis (H axis) is a multiple of 64 (less than 2040), and the length of B*S is a multiple of 8.

maxValue

Output

Maximum size of the temporary space required by DeepNorm computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage. If the maximum space size is 0, no temporary space is required.

NOTE:

maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer.

minValue

Output

Minimum size of the temporary space required by DeepNorm computation. To ensure correct functions, the size of the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter. If the minimum space size is 0, no temporary space is required.

Table 2 GetDeepNormTilingInfo API parameters

Parameter

Input/Output

Description

srcShape

Input

Input shape [B, S, H].

originSrcShape

Input

Input shape [B, S, originH] before 32-byte alignment. The length of originH must be within the range of (0, H]. If isBasicBlock is set to true, originH must be the same as H.

stackBufferSize

Input

Buffer size of the temporary space. The unit is byte. Use the GetDeepNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes, so that developers can select a proper size within this range and pass it as stackBufferByteSize.

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the input data type of the operator is half, set this parameter to 2.

isReuseSrc

Input

Whether to reuse the space of the source operand input, which must be the same as that of the DeepNorm API.

isBasicBlock

Input

Whether srcShape complies with the basic block definition: The length of the last axis (H axis) is a multiple of 64 (less than 2040), and the length of B*S is a multiple of 8.

tiling

Output

Tiling information required for DeepNorm computation.

Returns

  • The return value of GetDeepNormMaxMinTmpSize is either true or false. true indicates that the maximum and minimum temporary space sizes required for DeepNorm internal computation are successfully obtained. false indicates that the sizes fail to be obtained.
  • The return value of GetDeepNormTilingInfo is either true or false. true indicates that the tiling parameter values of DeepNorm are successfully obtained. false indicates that the values fail to be obtained.

Example

  1. Add the DeepNormTiling structure parameter to the TilingData structure to function as a field.
    1
    2
    3
    4
    5
    6
    BEGIN_TILING_DATA_DEF(TilingData)               // Register a tiling class and uses the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, totalLength); // Add the tiling field to compute the total data volume.
      TILING_DATA_FIELD_DEF(uint32_t, tileNum);     // Add the tiling field that specifies the total number of data blocks to be computed on each core.
      ...                                           // Add other tiling fields.
      TILING_DATA_FIELD_DEF_STRUCT(DeepNormTiling, deepnormTilingData); // Add the DeepNormTiling structure parameter to the TilingData structure.
    END_TILING_DATA_DEF;
    
  2. The tiling implementation function first calls the GetDeepNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the DeepNorm API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then obtains the tiling parameter required by the DeepNorm kernel API based on the input shape and remaining size of computing space.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    namespace optiling {
    const uint32_t BLOCK_DIM = 8;
    const uint32_t TILE_NUM = 8;
    static ge::graphStatus TilingFunc(gert::TilingContext* context)
    {
        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        context->SetBlockDim(BLOCK_DIM);
        tiling.set_totalLength(totalLength);
        tiling.set_tileNum(TILE_NUM);
        // Set other tiling parameters.
        ... 
        std::vector<int64_t> shapeVec = {2, 16, 64};
        std::vector<int64_t> oriShapeVec = {2, 16, 64};
        ge::Shape srcShape(shapeVec);
        ge::Shape originSrcShape(oriShapeVec);
    
        // This example is for reference only. Use GetDeepNormMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. Developers can pass a proper space size as required.
        uint32_t minValue = 0;
        uint32_t maxValue = 0;
        AscendC::GetDeepNormMaxMinTmpSize(srcShape, sizeof(half), isReuseSrc, isBasicBlock, maxValue, minValue);
        // Obtain the DeepNorm tiling parameter.
        AscendC::GetDeepNormTilingInfo(srcShape, originSrcShape, minValue, sizeof(half), isReuseSrc, isBasicBlock, tiling.deepnormTilingData); 
        
         ... // Other logic
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        context->SetTilingKey(1);
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. The kernel calls GET_TILING_DATA in the kernel function to obtain TilingData, and then passes the DeepNorm Tiling information in TilingData to the DeepNorm API for computation. For details about the complete example on the kernel, see DeepNorm.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    extern "C" __global__ __aicore__ void deepnorm_custom(GM_ADDR inputX, GM_ADDR inputGx, GM_ADDR beta, GM_ADDR gamma, GM_ADDR output, GM_ADDR outputMean, GM_ADDR outputVariance, GM_ADDR tiling)
    {
        GET_TILING_DATA(tilingData, tiling);
        KernelDeepNorm op;
        op.Init(inputX, inputGx, beta, gamma, output, outputMean, outputVariance,  tilingData.totalLength, tilingData.tileNum, tilingData.deepnormTilingData);
        if (TILING_KEY_IS(1)) {
            op.Process();
        }
    }