Transpose Tiling
Function
Obtains the tiling parameters required by the Transpose API.
Prototype
The GetConfusionTransposeMaxMinTmpSize, GetConfusionTransposeTilingInfo, and GetConfusionTransposeOnlyTilingInfo APIs have been deprecated and will be removed in later versions. Do not use these APIs. Use the GetTransposeMaxMinTmpSize and GetTransposeTilingInfo APIs instead.
- Obtain the maximum and minimum temporary space sizes.
    void GetTransposeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const uint32_t transposeTypeIn, uint32_t& maxValue, uint32_t& minValue)
    void GetConfusionTransposeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const uint32_t transposeTypeIn, uint32_t& maxValue, uint32_t& minValue)
- Obtain the Transpose tiling parameters.
    void GetTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, optiling::ConfusionTransposeTiling& tiling)
    void GetTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, AscendC::tiling::ConfusionTransposeTiling& tiling)
    void GetConfusionTransposeOnlyTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, optiling::ConfusionTransposeTiling& tiling)
    void GetConfusionTransposeOnlyTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, AscendC::tiling::ConfusionTransposeTiling& tiling)
    void GetConfusionTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, optiling::ConfusionTransposeTiling& tiling)
    void GetConfusionTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, AscendC::tiling::ConfusionTransposeTiling& tiling)
Parameters
Parameters of GetTransposeMaxMinTmpSize:

| Parameter | Input/Output | Description |
|---|---|---|
| srcShape | Input | Shape of the input tensor. The format of srcShape depends on the scenario. Scenarios 1 to 6: [B, N, S, H/N]. Scenario 7: [H, W]. |
| typeSize | Input | Size of the input data type, in bytes. For example, if the input data type is half, set this parameter to 2. |
| transposeTypeIn | Input | Scenario number that selects the data layout and reshape type. The value range is [1, 7]. 1: scenario 1 (NZ2ND, axis 1 and axis 2 interchanged). 2: scenario 2 (NZ2NZ, axis 1 and axis 2 interchanged). 3: scenario 3 (NZ2NZ, split of the last axis). 4: scenario 4 (NZ2ND, split of the last axis). 5: scenario 5 (NZ2ND, merge of the last axis). 6: scenario 6 (NZ2NZ, merge of the last axis). 7: scenario 7 (2D tensor transpose). |
| maxValue | Output | Maximum size of the temporary space that the Transpose computation can use. Space beyond this value is not used by the API. Between the minimum and the maximum, a larger temporary space can improve the API's computing performance in the kernel to some extent. For better performance, reserve or allocate the space based on the actual buffer usage. NOTE: maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In that case, select a temporary space size that fits the remaining space of the Unified Buffer. |
| minValue | Output | Minimum size of the temporary space required by the Transpose computation. To ensure correct results, the temporary space reserved or allocated for the API must not be smaller than this value. |
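
The relationship between minValue, maxValue, and the remaining Unified Buffer space can be illustrated with a small host-side helper. This is a minimal sketch, not part of the Ascend C API: the function name ChooseTmpBufferSize and the ubRemaining parameter are hypothetical, and how the remaining Unified Buffer space is determined is left to the caller.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): picks a temporary space size
// from the [minValue, maxValue] range, limited by the free Unified Buffer
// space (ubRemaining, an assumed input supplied by the caller).
// Returns false when even minValue does not fit; chosen is left unchanged.
bool ChooseTmpBufferSize(uint32_t minValue, uint32_t maxValue,
                         uint32_t ubRemaining, uint32_t& chosen)
{
    if (ubRemaining < minValue) {
        return false; // Below minValue the computation is not guaranteed to be correct.
    }
    // Space above maxValue would not be used, so cap the request there.
    // The outer std::max guards against an unexpected maxValue < minValue.
    chosen = std::max(minValue, std::min(maxValue, ubRemaining));
    return true;
}
```

The chosen value could then be passed as stackBufferSize when obtaining the tiling parameters. Passing minValue directly is always functionally sufficient; a larger value is only a possible performance improvement.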

Parameters of GetTransposeTilingInfo:

| Parameter | Input/Output | Description |
|---|---|---|
| srcShape | Input | Shape of the input tensor. The format of srcShape depends on the scenario. Scenarios 1 to 6: [B, N, S, H/N]. Scenario 7: [H, W]. |
| stackBufferSize | Input | Size of the temporary space used for Transpose computation, in bytes. |
| typeSize | Input | Size of the input data type, in bytes. For example, if the input data type is half, set this parameter to 2. |
| transposeTypeIn | Input | Scenario number that selects the data layout and reshape type. The value range is [1, 7]. 1: scenario 1 (NZ2ND, axis 1 and axis 2 interchanged). 2: scenario 2 (NZ2NZ, axis 1 and axis 2 interchanged). 3: scenario 3 (NZ2NZ, split of the last axis). 4: scenario 4 (NZ2ND, split of the last axis). 5: scenario 5 (NZ2ND, merge of the last axis). 6: scenario 6 (NZ2NZ, merge of the last axis). 7: scenario 7 (2D tensor transpose). |
| tiling | Output | Tiling information of the input data. |
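
The constraints in the parameter descriptions (transposeTypeIn in [1, 7], a 4D srcShape for scenarios 1 to 6, and a 2D srcShape for scenario 7) could be checked on the host before the tiling is requested. The following is a minimal sketch under those stated constraints. IsValidTransposeInput is a hypothetical helper, not an Ascend C API, and the requirement that every dimension be positive is an added assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pre-check (not an Ascend C API) that mirrors the parameter
// descriptions: the scenario number must be in [1, 7], scenarios 1 to 6
// take a 4D shape [B, N, S, H/N], and scenario 7 takes a 2D shape [H, W].
bool IsValidTransposeInput(const std::vector<int64_t>& srcDims, uint32_t transposeTypeIn)
{
    constexpr uint32_t kMinScenario = 1;
    constexpr uint32_t kMaxScenario = 7;
    constexpr uint32_t k2DTransposeScenario = 7;

    if (transposeTypeIn < kMinScenario || transposeTypeIn > kMaxScenario) {
        return false;
    }
    const std::size_t expectedRank = (transposeTypeIn == k2DTransposeScenario) ? 2 : 4;
    if (srcDims.size() != expectedRank) {
        return false;
    }
    // Assumption: every dimension must be positive.
    for (int64_t dim : srcDims) {
        if (dim <= 0) {
            return false;
        }
    }
    return true;
}
```

For the shape used in the example below, IsValidTransposeInput({1, 2, 64, 32}, 1) returns true, while the same shape with scenario 7 is rejected because a 2D shape is expected.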
Returns
None
Restrictions
None
Example
The following example shows how to obtain the tiling parameters on the host and how to use them in the kernel when the Transpose high-level API is used. The example uses scenario 1. The shape of the input tensor is [1, 2, 64, 32], and the input data type is half.
- Add the ConfusionTransposeTiling structure to the TilingData structure as a field.
    BEGIN_TILING_DATA_DEF(TilingData) // Register a tiling class and use the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, tileNum); // Add the tiling field that specifies the total number of data blocks to be computed on each core.
      ... // Add other tiling fields.
      TILING_DATA_FIELD_DEF_STRUCT(ConfusionTransposeTiling, confusionTransposeTilingData); // Add the ConfusionTransposeTiling structure parameter to the TilingData structure.
    END_TILING_DATA_DEF;
- In the tiling implementation function, obtain the tiling parameters required by the Transpose kernel API based on the input shape and the temporary space size (stackBufferSize) used for computation.
    namespace optiling {
    const uint32_t BLOCK_DIM = 8;
    const uint32_t TILE_NUM = 8;
    static ge::graphStatus TilingFunc(gert::TilingContext* context)
    {
        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        context->SetBlockDim(BLOCK_DIM);
        tiling.set_tileNum(TILE_NUM);
        // Set other Tiling parameters.
        ...
        std::vector<int64_t> shapeVec = {1, 2, 64, 32};
        ge::Shape srcShape(shapeVec);
        uint32_t transposeTypeIn = 1;
        uint32_t maxValue = 0;
        uint32_t minValue = 0;
        AscendC::GetTransposeMaxMinTmpSize(srcShape, sizeof(half), transposeTypeIn, maxValue, minValue);
        // This example is for reference only. Obtain the minimum value and pass it to ensure correct functionality. You can pass a proper space size as required.
        const uint32_t stackBufferSize = minValue;
        // Obtain the Transpose Tiling parameters.
        AscendC::GetTransposeTilingInfo(srcShape, stackBufferSize, sizeof(half), transposeTypeIn, tiling.confusionTransposeTilingData);
        ... // Other logic
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        context->SetTilingKey(1);
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
- In the kernel function, call GET_TILING_DATA to obtain TilingData, and then pass the ConfusionTransposeTiling information in TilingData to the Transpose API for computation. For the complete kernel example, see Transpose.
    extern "C" __global__ __aicore__ void func_custom(GM_ADDR src_gm, GM_ADDR dst_gm, GM_ADDR workspace, GM_ADDR tiling)
    {
        GET_TILING_DATA(TilingData, tiling);
        KernelTranspose<half> op;
        op.Init(src_gm, dst_gm, TilingData.confusionTransposeTilingData);
        op.Process();
    }
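
To make the example's numbers concrete, the arithmetic for the input shape [1, 2, 64, 32] with the half data type can be worked through in plain C++. This is a minimal sketch with hypothetical helper names (ElementCount, SwapAxes1And2). It assumes that "axis 1 and axis 2 interchanged" turns the logical shape [B, N, S, H/N] into [B, S, N, H/N], and it does not model the NZ to ND layout conversion or the temporary space reported by the tiling API.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: number of elements in a tensor with the given dimensions.
int64_t ElementCount(const std::vector<int64_t>& dims)
{
    int64_t count = 1;
    for (int64_t dim : dims) {
        count *= dim;
    }
    return count;
}

// Hypothetical helper: logical shape after interchanging axis 1 and axis 2,
// assumed here to map [B, N, S, H/N] to [B, S, N, H/N]. Expects a 4D shape.
std::vector<int64_t> SwapAxes1And2(std::vector<int64_t> dims)
{
    std::swap(dims[1], dims[2]);
    return dims;
}
```

For the example shape, ElementCount({1, 2, 64, 32}) is 1 × 2 × 64 × 32 = 4096 elements. With typeSize = 2 for half, the input occupies 4096 × 2 = 8192 bytes, and under the assumption above the logical output shape is [1, 64, 2, 32].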