WelfordUpdate Tiling

Function Usage

Ascend C provides the WelfordUpdate Tiling API to obtain the tiling parameter required for WelfordUpdate kernel computation.

To obtain tiling parameters, perform the following steps:

Specifically, use the GetWelfordUpdateMaxMinTmpSize to obtain the maximum and minimum temporary space sizes required for WelfordUpdate computation.

To perform WelfordUpdate computation on the kernel, you need to reserve or allocate the temporary space. GetWelfordUpdateMaxMinTmpSize is used to obtain the maximum and minimum sizes of the temporary space to be reserved or allocated on the host. You can select a proper size within this range as the tiling parameter and pass it to the kernel.
  • To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
  • Within the range between the minimum and maximum, as the temporary space increases, the API computing performance on the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage.

Prototype

1
void GetWelfordUpdateMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSizeT, const uint32_t typeSizeU, const bool isReuseSource, const bool isInplace, uint32_t& maxValue, uint32_t& minValue)

Parameters

Table 1 GetWelfordUpdateMaxMinTmpSize API parameters

Parameter

Input/Output

Description

srcShape

Input

Input shape information {rnLength, abLength}. The meanings of rnLength and abLength are the same as those of the WelfordUpdate API.

typeSizeT

Input

Data type size of the operator input inputX. The unit is byte. For example, if the input data type of the operator is half, set this parameter to 2.

typeSizeU

Input

Data type size of the mean and variance (outputMean, outputVariance, inputMean, and inputVariance). The unit is byte. For example, if the input data type of the operator is float, set this parameter to 4.

isReuseSource

Input

Whether the source operand can be modified. The value is the same as that of the WelfordUpdate API.

isInplace

Input

Whether the destination operand reuses the source operand. The value is the same as that of the WelfordUpdate API.

maxValue

Output

Maximum size of the temporary space required by WelfordUpdate computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance on the kernel can be optimized to some extent. To achieve better performance, reserve or allocate the space based on the actual buffer usage. If the maximum space size is 0, no temporary space is required.

NOTE:

max is for reference only and may be larger than the available space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer.

minValue

Output

Minimum size of the temporary space required by WelfordUpdate computation. To ensure correct functions, the size of the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter. If the minimum space size is 0, no temporary space is required.

Returns

  • The return value of GetWelfordUpdateMaxMinTmpSize is either true or false. true indicates that the maximum and minimum temporary space sizes required for WelfordUpdate internal computation are successfully obtained. false indicates that the sizes fail to be obtained. In this case, check whether the input shape meets the specified requirements.

Example

  1. Add the WelfordUpdate Tiling structure parameter to the TilingData structure to function as a field.
    1
    2
    3
    4
    5
    6
    7
    8
    BEGIN_TILING_DATA_DEF(WelfordUpdateCustomTilingData) // Register a tiling class and use the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, inplace); // Add the tiling field to specify whether to reuse input for output.
      TILING_DATA_FIELD_DEF(uint32_t, nLength);
      TILING_DATA_FIELD_DEF(uint32_t, rLength);
      TILING_DATA_FIELD_DEF(uint32_t, abComputeLength);
      TILING_DATA_FIELD_DEF(uint32_t, nRec);
    END_TILING_DATA_DEF;
      REGISTER_TILING_DATA_CLASS(WelfordUpdateCustom, WelfordUpdateCustomTilingData)// Add the WelfordUpdateCustomTilingData structure parameter to the TilingData structure.
    
  2. The tiling implementation function first calls the GetWelfordUpdateMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the WelfordUpdate API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then obtains the tiling parameter required by the WelfordUpdate kernel API based on the input shape and remaining size of computing space.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    namespace optiling {
    static ge::graphStatus TilingFunc(gert::TilingContext *context)
    {
        WelfordUpdateCustomTilingData tiling;
        const gert::RuntimeAttrs *attrs = context->GetAttrs();
        const uint32_t inplace = *(attrs->GetAttrPointer<uint32_t>(0));
        const uint32_t abComputeLength = *(attrs->GetAttrPointer<uint32_t>(1));
        const uint32_t sharedtmpbuffer = *(attrs->GetAttrPointer<uint32_t>(2));
    
        const gert::StorageShape *x1_shape = context->GetInputShape(1);
        const gert::Shape shape = x1_shape->GetStorageShape();
        auto nLength = shape.GetDim(0);
        auto rLength = shape.GetDim(1);
    
        std::vector<int64_t> srcDims = {nLength, rLength};
        ge::Shape srcShape(srcDims);
    
        uint32_t maxTmpsize = 0;
        uint32_t minTmpsize = 0;
        // This example is only for reference. Use GetWelfordUpdateMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. Pass a proper space size as required.
        AscendC::GetWelfordUpdateMaxMinTmpSize(srcShape, 4, 4, false, false, maxTmpsize, minTmpsize);
        
    
        ... // Other logic
        context->SetTilingKey(1);
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        size_t *currentWorkspace = context->GetWorkspaceSizes(1);
        currentWorkspace[0] = 0;
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. The kernel calls GET_TILING_DATA in the kernel function to obtain TilingData, and then passes the WelfordUpdate Tiling information in TilingData to the WelfordUpdate API for computation. For details about the complete example on the kernel, see WelfordUpdate.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    extern "C" __global__ __aicore__ void
    welford_update_custom(
        GM_ADDR inputX_gm, GM_ADDR mean_gm, GM_ADDR var_gm, GM_ADDR outputMean_gm, GM_ADDR outputVariance_gm, GM_ADDR workspace, GM_ADDR tiling)
    {
        GET_TILING_DATA(tilingData, tiling);
        if (TILING_KEY_IS(1))
        {
            if (tilingData.inplace)
            {
                KernelWelfordUpdate<DTYPE_INPUTX, DTYPE_U, true> op;
                
                op.Init(inputX_gm, mean_gm, var_gm, outputMean_gm, outputVariance_gm, tilingData.nLength, tilingData.rLength, tilingData.abComputeLength);
                op.Process();
            }
            else
            {
                KernelWelfordUpdate<DTYPE_INPUTX, DTYPE_U, false> op;
                
                op.Init(inputX_gm, mean_gm, var_gm, outputMean_gm, outputVariance_gm, tilingData.nLength, tilingData.rLength, tilingData.abComputeLength);
                op.Process();
            }
        }
    }