WelfordFinalize Tiling
Function
Ascend C provides the WelfordFinalize tiling APIs for users to obtain the tiling parameters required for WelfordFinalize kernel computation.
To obtain the tiling parameters, perform the following steps:
Specifically, use GetWelfordFinalizeMaxMinTmpSize to obtain the maximum and minimum temporary space sizes required for WelfordFinalize computation.
- To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
- Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. For better performance, reserve or allocate the temporary space based on the actual memory usage.
Prototype
1 | void GetWelfordFinalizeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const bool isReuseSource, uint32_t& maxValue, uint32_t& minValue) |
Parameters
Parameter |
Input/Output |
Description |
|---|---|---|
srcShape |
Input |
Shape information {abLength} of the input inputMean or inputVariance. |
typeSize |
Input |
Data type size of the input inputMean/inputVariance. The unit is byte. For example, if the input data type is float, set this parameter to 4. |
isReuseSource |
Input |
Whether the source operand can be modified. The value of this parameter is the same as that of the WelfordFinalize API. |
maxValue |
Output |
Maximum size of the temporary space required by WelfordFinalize computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. For better performance, reserve or allocate the temporary space based on the actual memory usage. If the maximum space size is 0, no temporary space is required. NOTE:
maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer. |
minValue |
Output |
Minimum size of the temporary space required by WelfordFinalize computation. To ensure correct functions, the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter. If the minimum space size is 0, no temporary space is required. |
Returns
None
Restrictions
None
Example
- Add the WelfordFinalizeTiling struct parameter to the TilingData struct to function as a field.
1 2 3 4 5 6 7 8 9 10 11
BEGIN_TILING_DATA_DEF(WelfordFinalizeCustomTilingData) // Register a tiling class and use the tiling name as the input parameter. TILING_DATA_FIELD_DEF(uint32_t, isCounts); // Add the tiling field. TILING_DATA_FIELD_DEF(uint32_t, rnLength); TILING_DATA_FIELD_DEF(uint32_t, abLength); TILING_DATA_FIELD_DEF(uint32_t, rLength); TILING_DATA_FIELD_DEF(uint32_t, head); TILING_DATA_FIELD_DEF(uint32_t, headLength); TILING_DATA_FIELD_DEF(uint32_t, tail); TILING_DATA_FIELD_DEF(uint32_t, tailLength); END_TILING_DATA_DEF; REGISTER_TILING_DATA_CLASS(WelfordFinalizeCustom, WelfordFinalizeCustomTilingData) // Add the WelfordFinalizeCustomTilingData structure parameter to the TilingData structure.
- The tiling implementation function first calls the GetWelfordFinalizeMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the WelfordFinalize API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then obtains the tiling parameters required by the WelfordFinalize kernel API based on the input shape and remaining size of computing space.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
namespace optiling { static ge::graphStatus TilingFunc(gert::TilingContext *context) { WelfordFinalizeCustomTilingData tiling; const gert::RuntimeAttrs *attrs = context->GetAttrs(); const uint32_t isCounts = *(attrs->GetAttrPointer<uint32_t>(0)); const uint32_t rnLength = *(attrs->GetAttrPointer<uint32_t>(1)); const uint32_t abLength = *(attrs->GetAttrPointer<uint32_t>(2)); const uint32_t rLength = *(attrs->GetAttrPointer<uint32_t>(3)); const uint32_t head = *(attrs->GetAttrPointer<uint32_t>(4)); const uint32_t headLength = *(attrs->GetAttrPointer<uint32_t>(5)); const uint32_t tail = *(attrs->GetAttrPointer<uint32_t>(6)); const uint32_t tailLength = *(attrs->GetAttrPointer<uint32_t>(7)); std::vector<int64_t> srcDims = {abLength}; ge::Shape srcShape(srcDims); // This example is only for reference. Use GetWelfordFinalizeMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. You can pass a proper space size as required. uint32_t maxTmpsize = 0; uint32_t minTmpsize = 0; AscendC::GetWelfordFinalizeMaxMinTmpSize(srcShape, 4, false, maxTmpsize, minTmpsize); ... // Other logic context->SetTilingKey(1); tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity()); context->GetRawTilingData()->SetDataSize(tiling.GetDataSize()); size_t *currentWorkspace = context->GetWorkspaceSizes(1); currentWorkspace[0] = 0; return ge::GRAPH_SUCCESS; } } // namespace optiling
- The kernel calls GET_TILING_DATA in the kernel function to obtain tilingData, and then passes the WelfordFinalize tiling information in tilingData to the WelfordFinalize API for computation. For details about the complete example in the kernel, see WelfordFinalize.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
extern "C" __global__ __aicore__ void welford_finalize_custom( GM_ADDR inputX_gm, GM_ADDR mean_gm, GM_ADDR var_gm, GM_ADDR outputMean_gm, GM_ADDR outputVariance_gm, GM_ADDR workspace, GM_ADDR tiling) { GET_TILING_DATA(tilingData, tiling); if (TILING_KEY_IS(1)) { if (tilingData.isCounts) { KernelWelfordFinalize<int32_t, true> op; op.Init(inputX_gm, mean_gm, var_gm, outputMean_gm, outputVariance_gm, tilingData.rnLength, tilingData.abLength, tilingData.rLength, tilingData.head, tilingData.headLength, tilingData.tail, tilingData.tailLength); op.Process(); } else { KernelWelfordFinalize<int32_t, false> op; op.Init(inputX_gm, mean_gm, var_gm, outputMean_gm, outputVariance_gm, tilingData.rnLength, tilingData.abLength, tilingData.rLength, tilingData.head, tilingData.headLength, tilingData.tail, tilingData.tailLength); op.Process(); } } }