Normalize Tiling

Function Usage

Ascend C provides the Normalize Tiling API for users to obtain the tiling parameter required for Normalize kernel computation.

Specifically, the GetNormalizeMaxMinTmpSize is used to obtain the maximum and minimum temporary space sizes required for Normalize computation.

To perform Normalize computation in the kernel, developers need to reserve or allocate the temporary space. GetNormalizeMaxMinTmpSize is used to obtain the maximum and minimum sizes of the temporary space to be reserved or allocated on the host. Developers can select a proper size within this range as the tiling parameter and pass it to the kernel.
  • To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
  • Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage.

Prototype

1
void GetNormalizeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSizeU, const uint32_t typeSizeT, const bool isReuseSource, const bool isComputeRstd, const bool isOnlyOutput, uint32_t& maxValue, uint32_t& minValue)

Parameters

Table 1 GetNormalizeMaxMinTmpSize API parameters

Parameter

Input/Output

Description

srcShape

Input

Shape information {A, R} of the input data inputX for Normalize.

typeSizeU

Input

Data type size of the input data gamma and beta. The unit is byte. For example, if the input data type of the operator is float, set this parameter to 4.

typeSizeT

Input

Data type size of the input data inputX. The unit is byte. For example, if the input data type of the operator is float, set this parameter to 4.

isReuseSource

Input

Whether to reuse the buffer space of the source operand, which must be the same as that of the Normalize API.

isComputeRstd

Input

Whether to compute rstd. The value of this parameter can only be true.

isOnlyOutput

Input

Whether to output only y but not the reciprocal rstd of the standard deviation. Currently, this parameter can only be set to false, indicating that all y and rstd results are output.

maxValue

Output

Tiling information (maximum temporary space size) required by the Normalize API.

Maximum size of the temporary space required by Normalize computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage.

NOTE:

maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer.

minValue

Output

Tiling information (minimum temporary space size) required by the Normalize API.

Minimum size of the temporary space required by Normalize computation. To ensure correct functions, the size of the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter.

Returns

None

Example

  1. Add the parameters required by the Normalize API to the TilingData structure as a field.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    BEGIN_TILING_DATA_DEF(NormalizeCustomTilingData)
      TILING_DATA_FIELD_DEF(float, epsilon);
      TILING_DATA_FIELD_DEF(uint32_t, isNoBeta);
      TILING_DATA_FIELD_DEF(uint32_t, isNoGamma);
      TILING_DATA_FIELD_DEF(uint32_t, isOnlyOutput);
      TILING_DATA_FIELD_DEF(uint32_t, aLength);
      TILING_DATA_FIELD_DEF(uint32_t, rLength);
      TILING_DATA_FIELD_DEF(uint32_t, rLengthWithPadding);
      ...                                           // Add other tiling fields.
    END_TILING_DATA_DEF;
    
  2. The tiling implementation function first calls the GetNormalizeMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the Normalize API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then obtains the tiling parameter required by the Normalize kernel API based on the input shape and remaining size of computing space.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    namespace optiling {
    static ge::graphStatus TilingFunc(gert::TilingContext *context)
    {
        NormalizeCustomTilingData tiling;
        const gert::RuntimeAttrs *attrs = context->GetAttrs();
        const float epsilon = *(attrs->GetAttrPointer<float>(0));
        const uint32_t isNoBeta = *(attrs->GetAttrPointer<uint32_t>(1));
        const uint32_t isNoGamma = *(attrs->GetAttrPointer<uint32_t>(2));
        const uint32_t isOnlyOutput = *(attrs->GetAttrPointer<uint32_t>(3));
        const gert::StorageShape* x1_shape = context->GetInputShape(0);
         ... // Other logic
        const gert::Shape shape = x1_shape->GetStorageShape();
        uint32_t aLength = shape.GetDim(0);
        uint32_t rLength = shape.GetDim(1);
        uint32_t rLengthWithPadding = (rLength + alignNum - 1) / alignNum * alignNum;
        std::vector<int64_t> srcDims = {aLength, rLength};
        ge::Shape srcShape(srcDims);
    
        uint32_t maxTmpsize = 0;
        uint32_t minTmpsize = 0;
        
        AscendC::GetNormalizeMaxMinTmpSize(srcShape, typeSizeU, typeSizeT, false, true, isOnlyOutput, maxTmpsize, minTmpsize);
    
         ... // Other logic
        context->SetTilingKey(1);
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        size_t *currentWorkspace = context->GetWorkspaceSizes(1);
        currentWorkspace[0] = 0;
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. The kernel calls GET_TILING_DATA in the kernel function to obtain TilingData, and then passes the Normalize Tiling information in TilingData to the Normalize API for computation. For details about the complete example on the kernel, see Normalize.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    extern "C" __global__ __aicore__ void normalize_custom(GM_ADDR x, GM_ADDR mean, GM_ADDR variance, GM_ADDR gamma, GM_ADDR beta, GM_ADDR rstd, GM_ADDR y, GM_ADDR workspace, GM_ADDR tiling) {
        GET_TILING_DATA(tilingData, tiling);
        float epsilon = tilingData.epsilon;
        NormalizePara para(tilingData.aLength, tilingData.rLength, tilingData.rLengthWithPadding);
        if (TILING_KEY_IS(1)) {
          if (!tilingData.isNoBeta && !tilingData.isNoGamma) {
              KernelNormalize<NLCFG_NORM> op;
              op.Init(x, mean, variance, gamma, beta, rstd, y, epsilon, para);
              op.Process();
          } else if (!tilingData.isNoBeta && tilingData.isNoGamma) {
              KernelNormalize<NLCFG_NOGAMMA> op;
              op.Init(x, mean, variance, gamma, beta, rstd, y, epsilon, para);
              op.Process();
          } else if (tilingData.isNoBeta && !tilingData.isNoGamma) {
              KernelNormalize<NLCFG_NOBETA> op;
              op.Init(x, mean, variance, gamma, beta, rstd, y, epsilon, para);
              op.Process();
          } else if (tilingData.isNoBeta && tilingData.isNoGamma) {
              KernelNormalize<NLCFG_NOOPT> op;
              op.Init(x, mean, variance, gamma, beta, rstd, y, epsilon, para);
              op.Process();
          }
        }
      }