LayerNorm Tiling
Function
Ascend C provides a group of LayerNorm tiling APIs for users to obtain the tiling parameters required for LayerNorm kernel computation.
To obtain the tiling parameters, perform the following two steps:
- Use the GetLayerNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required for LayerNorm computation. This is to ensure the reasonable allocation of computing space.To perform LayerNorm computation in the kernel, you need to reserve or allocate the temporary space. The GetLayerNormMaxMinTmpSize API is used to obtain the maximum and minimum temporary space sizes to be reserved or allocated on the host. You can select proper sizes within this range as the tiling parameters and pass them to the kernel.
- To ensure correct functions, the temporary space to be reserved or allocated cannot be less than the minimum temporary space.
- Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. For better performance, reserve or allocate the temporary space based on the actual memory usage.
- Use the GetLayerNormNDTilingInfo API to obtain the tiling parameters required by the LayerNorm kernel API, and pass the input shape, available space for LayerNorm computation, and computation data types.
Below is the definition of the LayerNorm tiling structure. You do not need to pay attention to the specific information of this tiling structure. They only need to pass it to the kernel and directly use it through LayerNorm high-level APIs.
- Tiling structure required by the LayerNorm APIs for outputting the normalization result, mean, and variance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
struct LayerNormTiling { uint32_t bLength = 0; uint32_t sLength = 0; uint32_t hLength = 0; uint32_t originalHLength = 0; uint32_t inputXSize = 0; uint32_t meanVarSize = 0; uint32_t numberOfTmpBuf = 0; uint32_t meanTmpTensorPos = 0; uint32_t meanTmpTensorSize = 0; uint32_t varianceTmpTensorPos = 0; uint32_t varianceTmpTensorSize = 0; uint32_t tmpBufSize = 0; uint32_t oneTmpSize = 0; uint32_t firstTmpStartPos = 0; uint32_t secondTmpStartPos = 0; uint32_t thirdTmpStartPos = 0; uint32_t loopRound = 0; uint32_t inputRoundSize = 0; uint32_t inputTailSize = 0; uint32_t inputTailPos = 0; uint32_t meanVarRoundSize = 0; uint32_t meanVarTailSize = 0; uint32_t meanVarTailPos = 0; uint32_t bshCurLength = 0; uint32_t bsCurLength = 0; float lastDimValueBack = 0.0; };
- Tiling structure required by the LayerNorm APIs for outputting the normalization result, mean, and reciprocal of the standard deviation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
struct LayerNormSeparateTiling{ uint32_t aLength = 0; uint32_t rLength = 0; uint32_t halfAddRepeatTimes = 0; uint32_t rHeadLength = 0; float k2Rec = 0; float k2RRec = 0; uint32_t inputXSize = 0; uint32_t meanVarSize = 0; uint32_t numberOfTmpBuf = 0; uint32_t varianceTmpTensorPos = 0; uint32_t varianceTmpTensorSize = 0; uint32_t tmpBufSize = 0; uint32_t oneTmpSize = 0; uint32_t firstTmpStartPos = 0; uint32_t secondTmpStartPos = 0; uint32_t thirdTmpStartPos = 0; uint32_t loopRound = 0; uint32_t inputRoundSize = 0; uint32_t inputTailSize = 0; uint32_t inputTailPos = 0; uint32_t meanVarRoundSize = 0; uint32_t meanVarTailSize = 0; uint32_t meanVarTailPos = 0; uint32_t arCurLength = 0; uint32_t aCurLength = 0; float rValueBack = 0; };
- Tiling structure required by the LayerNorm APIs for outputting the normalization result, mean, and variance
Prototype
The GetLayerNormNDTillingInfo API has been deprecated and will be removed in later versions. Do not use this API. Use the GetLayerNormNDTilingInfo API instead.
- GetLayerNormMaxMinTmpSize
- Temporary space required by the LayerNorm API for outputting the normalization result, mean, and variance
1void GetLayerNormMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const bool isReuseSource, uint32_t& maxValue, uint32_t& minValue)
- Temporary space required by the LayerNorm API for outputting the normalization result, mean, and reciprocal of the standard deviation
1void GetLayerNormMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const bool isReuseSource, const bool isComputeRstd, const bool isOnlyOutput, uint32_t& maxValue, uint32_t& minValue)
- Temporary space required by the LayerNorm API for outputting the normalization result, mean, and variance
- GetLayerNormNDTilingInfo or GetLayerNormNDTillingInfo
- Tiling parameters required by the LayerNorm APIs for outputting the normalization result, mean, and variance
1void GetLayerNormNDTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, optiling::LayerNormTiling& tiling)
1void GetLayerNormNDTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, AscendC::tiling::LayerNormTiling& tiling)
- (Not recommended) Tiling parameters required by the LayerNorm APIs for outputting the normalization result, mean, and variance
1void GetLayerNormNDTillingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, optiling::LayerNormTiling& tilling)
- Tiling parameters required by the LayerNorm APIs for outputting the normalization result, mean, and reciprocal of the standard deviation
1void GetLayerNormNDTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, const bool isComputeRstd, optiling::LayerNormSeparateTiling& tiling)
1void GetLayerNormNDTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const bool isReuseSource, const bool isComputeRstd, AscendC::tiling::LayerNormSeparateTiling& tiling)
- Tiling parameters required by the LayerNorm APIs for outputting the normalization result, mean, and variance
Parameters
Parameter |
Input/Output |
Description |
|---|---|---|
srcShape |
Input |
|
typeSize |
Input |
Data type size of the input data inputX. The unit is byte. For example, if the input data type is half, set this parameter to 2. |
isReuseSource |
Input |
Whether to reuse the buffer space of the source operand, which must be the same as that of the LayerNorm API. |
isComputeRstd |
Input |
Whether to calculate the reciprocal rstd of the standard deviation. This parameter is used to distinguish the selected LayerNorm API in tiling. |
isOnlyOutput |
Input |
Whether to output only y but not the mean and the reciprocal rstd of standard deviation. Currently, this parameter can only be set to false. The y, mean, and rstd results are all output. |
maxValue |
Output |
Tiling information (maximum temporary space size) required by the LayerNorm API. Maximum size of the temporary space required by LayerNorm computation. Any space exceeding this value will not be utilized by the API. Within the range between the minimum and maximum, as the temporary space increases, the API computing performance in the kernel can be optimized to some extent. For better performance, reserve or allocate the temporary space based on the actual memory usage. NOTE:
maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer. |
minValue |
Output |
Tiling information (minimum temporary space size) required by the LayerNorm API. Minimum size of the temporary space required by LayerNorm computation. To ensure correct functions, the temporary space to be reserved or allocated during API computation cannot be less than the value of this parameter. |
Returns
None
Restrictions
None
Example
The following example describes the process of obtaining the tiling parameters on the host and the method of using the parameter in the kernel when LayerNorm high-level API for output variance is used. In this example, the shape size of the input tensor is [2, 16, 64], and the input data type is half.
- Add the LayerNormTiling structure parameter to the TilingData structure to function as a field.
1 2 3 4 5 6
BEGIN_TILING_DATA_DEF(TilingData) // Register a tiling class and use the tiling name as the input parameter. TILING_DATA_FIELD_DEF(uint32_t, totalLength); // Add the tiling field to compute the total size of data. TILING_DATA_FIELD_DEF(uint32_t, tileNum); // Add the tiling field that specifies the total number of data blocks to be computed on each core. ... // Add other tiling fields. TILING_DATA_FIELD_DEF_STRUCT(LayerNormTiling, layernormTilingData); // Add the LayerNormTiling struct parameter to the TilingData struct. END_TILING_DATA_DEF;
- The tiling implementation function first calls the GetLayerNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the LayerNorm API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then calls the GetLayerNormNDTilingInfo API to obtain the tiling parameters required by the LayerNorm kernel API based on the input shape and available size of computing space.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
namespace optiling { const uint32_t BLOCK_DIM = 8; const uint32_t TILE_NUM = 8; static ge::graphStatus TilingFunc(gert::TilingContext* context) { TilingData tiling; uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize(); context->SetBlockDim(BLOCK_DIM); tiling.set_totalLength(totalLength); tiling.set_tileNum(TILE_NUM); // Set other tiling parameters. ... // {B, S, storageHLength, originHLength} std::vector<int64_t> shapeVec = {2, 16, 64, 64}; ge::Shape srcShape(shapeVec); // This example is for reference only. Use GetLayerNormMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. You can pass a proper space size as required. uint32_t max; uint32_t min; AscendC::GetLayerNormMaxMinTmpSize(srcShape, sizeof(half), false, max, min); // Obtain the LayerNorm tiling parameters. AscendC::GetLayerNormNDTilingInfo(srcShape, min, sizeof(half), false, tiling.layernormTilingData); ... // Other logic tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity()); context->GetRawTilingData()->SetDataSize(tiling.GetDataSize()); context->SetTilingKey(1); return ge::GRAPH_SUCCESS; } } // namespace optiling
- The kernel calls GET_TILING_DATA in the kernel function to obtain tilingData, and then passes the LayerNorm tiling information in tilingData to the LayerNorm APIs for computation. For details about the complete kernel sample, see LayerNorm.
1 2 3 4 5 6 7 8 9
extern "C" __global__ __aicore__ void func_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling) { GET_TILING_DATA(tilingData, tiling); KernelFunc op; op.Init(x, y, z, tilingData.totalLength, tilingData.tileNum,tilingData.layernormTilingData); if (TILING_KEY_IS(1)) { op.Process(); } }
The following example describes the process of obtaining the tiling parameters on the host and the method of using the parameter in the kernel when LayerNorm high-level APIs for the reciprocal of the output standard deviation is used. In this example, the shape size of the input tensor is [2, 64], and the input data type is half.
- Add the LayerNormTiling structure parameter to the TilingData structure to function as a field.
1 2 3 4 5 6
BEGIN_TILING_DATA_DEF(TilingData) // Register a tiling class and use the tiling name as the input parameter. TILING_DATA_FIELD_DEF(uint32_t, aLength); // Add the tiling field to specify the length of the a axis. TILING_DATA_FIELD_DEF(uint32_t, rLengthWithPadding); // Add the tiling field to specify the length of the r axis after 32-byte alignment. ... // Add other tiling fields. TILING_DATA_FIELD_DEF_STRUCT(LayerNormSeparateTiling, layernormTilingData); // Add the LayerNormSeparateTiling struct parameter to the TilingData struct. END_TILING_DATA_DEF;
- The tiling implementation function first calls the GetLayerNormMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the LayerNorm API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then calls the GetLayerNormNDTilingInfo API to obtain the tiling parameters required by the LayerNorm kernel API based on the input shape and available size of computing space.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
namespace optiling { const uint32_t BLOCK_DIM = 1; const uint32_t TILE_NUM = 8; static ge::graphStatus TilingFunc(gert::TilingContext* context) { TilingData tiling; uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize(); context->SetBlockDim(BLOCK_DIM); tiling.set_totalLength(totalLength); tiling.set_tileNum(TILE_NUM); // Set other tiling parameters. ... // {A, R} std::vector<int64_t> shapeVec = {2, 64}; ge::Shape srcShape(shapeVec); // This example is for reference only. Use GetLayerNormMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. You can pass a proper space size as required. uint32_t max; uint32_t min; AscendC::GetLayerNormMaxMinTmpSize(srcShape, sizeof(half), false, true, false, max, min); // Obtain the LayerNorm tiling parameters. AscendC::GetLayerNormNDTilingInfo(srcShape, min, sizeof(half), false, true, tiling.layernormTilingData); ... // Other logic tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity()); context->GetRawTilingData()->SetDataSize(tiling.GetDataSize()); context->SetTilingKey(1); return ge::GRAPH_SUCCESS; } } // namespace optiling
- The kernel calls GET_TILING_DATA in the kernel function to obtain tilingData, and then passes the LayerNorm tiling information in tilingData to the LayerNorm APIs for computation. For details about the complete kernel sample, see LayerNorm.
1 2 3 4 5 6 7 8 9 10 11
extern "C" __global__ __aicore__ void func_custom(GM_ADDR x, GM_ADDR gamma, GM_ADDR beta, GM_ADDR mean, GM_ADDR rstd, GM_ADDR y, GM_ADDR workspace, GM_ADDR tiling) { GET_TILING_DATA(tilingData, tiling); float epsilon = tilingData.epsilon; AscendC::LayerNormPara para(tilingData.aLength, tilingData.rLengthWithPadding); KernelFunc op; op.Init(x, gamma, beta, mean, rstd, y, epsilon, para, tilingData.layernormTilingData); if (TILING_KEY_IS(1)) { op.Process(); } }