Transpose Tiling

Function

Obtains the tiling parameters required by the Transpose API.

Prototype

The GetConfusionTransposeMaxMinTmpSize, GetConfusionTransposeTilingInfo, and GetConfusionTransposeOnlyTilingInfo APIs are deprecated and will be removed in a later version. Do not use them. Use GetTransposeMaxMinTmpSize and GetTransposeTilingInfo instead.

  • Obtain the maximum and minimum temporary space sizes.
    void GetTransposeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const uint32_t transposeTypeIn, uint32_t& maxValue, uint32_t& minValue)
    
    void GetConfusionTransposeMaxMinTmpSize(const ge::Shape& srcShape, const uint32_t typeSize, const uint32_t transposeTypeIn, uint32_t& maxValue, uint32_t& minValue)
    
  • Obtain Transpose Tiling.
    void GetTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, optiling::ConfusionTransposeTiling& tiling)
    
    void GetTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, AscendC::tiling::ConfusionTransposeTiling& tiling)
    
    void GetConfusionTransposeOnlyTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, optiling::ConfusionTransposeTiling& tiling)
    
    void GetConfusionTransposeOnlyTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, AscendC::tiling::ConfusionTransposeTiling& tiling)
    
    void GetConfusionTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, optiling::ConfusionTransposeTiling& tiling)
    
    void GetConfusionTransposeTilingInfo(const ge::Shape& srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, AscendC::tiling::ConfusionTransposeTiling& tiling)
    

Parameters

Table 1 GetTransposeMaxMinTmpSize API parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| srcShape | Input | Shape of the input tensor. The required format depends on the scenario. Scenarios 1 to 6: [B, N, S, H/N]. Scenario 7: [H, W]. |
| typeSize | Input | Size of the input data type, in bytes. For example, set this parameter to 2 if the input data type is half. |
| transposeTypeIn | Input | Number that selects the data layout and reshape type (the scenario). The value range is [1, 7]. 1: scenario 1 (NZ2ND, axis 1 and axis 2 interchanged). 2: scenario 2 (NZ2NZ, axis 1 and axis 2 interchanged). 3: scenario 3 (NZ2NZ, split of the last axis). 4: scenario 4 (NZ2ND, split of the last axis). 5: scenario 5 (NZ2ND, merge of the last axis). 6: scenario 6 (NZ2NZ, merge of the last axis). 7: scenario 7 (2D tensor transpose). |
| maxValue | Output | Maximum size of the temporary space that the Transpose computation can use. The API does not use any space beyond this value. Between minValue and maxValue, a larger temporary space can improve the computing performance of the API in the kernel to some extent. For better performance, reserve or allocate the space based on the actual buffer usage. NOTE: maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer. |
| minValue | Output | Minimum size of the temporary space required by the Transpose computation. To ensure functional correctness, the temporary space reserved or allocated for the computation must not be smaller than this value. |
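
The relationship between minValue, maxValue, and the remaining Unified Buffer space can be sketched as a small host-side helper. ChooseTmpSize and freeUbBytes are hypothetical names introduced here for illustration; they are not part of the Ascend C API, and the policy shown (take as much as is useful, never less than the minimum) is only one reasonable choice.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of the Ascend C API.
// minValue / maxValue: outputs of GetTransposeMaxMinTmpSize.
// freeUbBytes: remaining Unified Buffer space, in bytes (assumed to be known by the caller).
// Returns false if the free space cannot hold the minimum required temporary space.
bool ChooseTmpSize(uint32_t minValue, uint32_t maxValue, uint32_t freeUbBytes, uint32_t& chosen)
{
    if (freeUbBytes < minValue) {
        return false;  // Below minValue the computation is not guaranteed to be correct.
    }
    // Space beyond maxValue is never used, so cap the request there,
    // but never drop below minValue.
    chosen = std::max(minValue, std::min(maxValue, freeUbBytes));
    return true;
}
```

The chosen value is what would then be passed as stackBufferSize when obtaining the tiling information.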

Table 2 GetTransposeTilingInfo API parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| srcShape | Input | Shape of the input tensor. The required format depends on the scenario. Scenarios 1 to 6: [B, N, S, H/N]. Scenario 7: [H, W]. |
| stackBufferSize | Input | Size of the temporary space used for the Transpose computation, in bytes. |
| typeSize | Input | Size of the input data type, in bytes. For example, set this parameter to 2 if the input data type is half. |
| transposeTypeIn | Input | Number that selects the data layout and reshape type (the scenario). The value range is [1, 7]. 1: scenario 1 (NZ2ND, axis 1 and axis 2 interchanged). 2: scenario 2 (NZ2NZ, axis 1 and axis 2 interchanged). 3: scenario 3 (NZ2NZ, split of the last axis). 4: scenario 4 (NZ2ND, split of the last axis). 5: scenario 5 (NZ2ND, merge of the last axis). 6: scenario 6 (NZ2NZ, merge of the last axis). 7: scenario 7 (2D tensor transpose). |
| tiling | Output | Tiling information of the input data. |
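
Because transposeTypeIn is a bare integer, a host-side check before calling the tiling APIs can catch a mismatch between the scenario number and the rank of srcShape. The sketch below is an illustration only: the enumerator names, ExpectedSrcRank, and IsValidTransposeInput are hypothetical and do not exist in the Ascend C API. Only the values 1 to 7 and the shape formats come from the tables above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical readable names for the documented transposeTypeIn values.
enum class TransposeScenario : uint32_t {
    NZ2ND_SWAP_AXES_1_2 = 1,
    NZ2NZ_SWAP_AXES_1_2 = 2,
    NZ2NZ_SPLIT_LAST_AXIS = 3,
    NZ2ND_SPLIT_LAST_AXIS = 4,
    NZ2ND_MERGE_LAST_AXIS = 5,
    NZ2NZ_MERGE_LAST_AXIS = 6,
    TRANSPOSE_2D = 7,
};

// Number of dimensions srcShape is documented to have for a scenario:
// 4 for [B, N, S, H/N] (scenarios 1 to 6), 2 for [H, W] (scenario 7).
// Returns 0 when transposeTypeIn is outside [1, 7].
inline size_t ExpectedSrcRank(uint32_t transposeTypeIn)
{
    if (transposeTypeIn >= 1 && transposeTypeIn <= 6) {
        return 4;
    }
    if (transposeTypeIn == 7) {
        return 2;
    }
    return 0;
}

// True when the scenario number is in range and the shape rank matches it.
inline bool IsValidTransposeInput(size_t srcRank, uint32_t transposeTypeIn)
{
    const size_t expected = ExpectedSrcRank(transposeTypeIn);
    return expected != 0 && srcRank == expected;
}
```

A caller could pass static_cast<uint32_t>(TransposeScenario::NZ2ND_SWAP_AXES_1_2) wherever the API expects transposeTypeIn, which keeps the scenario choice readable at the call site.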

Returns

None

Restrictions

None

Example

The following example shows how to obtain the tiling parameters on the host and how to use them in the kernel when calling the Transpose high-level API. The example uses scenario 1. The shape of the input tensor is [1, 2, 64, 32], and the input data type is half.

  1. Add the ConfusionTransposeTiling structure to the TilingData structure as a field.
    BEGIN_TILING_DATA_DEF(TilingData)               // Register a tiling class and use the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, tileNum);     // Add the tiling field that specifies the total number of data blocks to be computed on each core.
      ...                                           // Add other tiling fields.
      TILING_DATA_FIELD_DEF_STRUCT(ConfusionTransposeTiling, confusionTransposeTilingData); // Add the ConfusionTransposeTiling structure to the TilingData structure.
    END_TILING_DATA_DEF;
    
  2. In the Tiling implementation function, obtain the tiling parameters required by the Transpose kernel API based on the input shape and the size of the temporary space used for computation (stackBufferSize).
    namespace optiling {
    const uint32_t BLOCK_DIM = 8;
    const uint32_t TILE_NUM = 8;
    static ge::graphStatus TilingFunc(gert::TilingContext* context)
    {
        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        context->SetBlockDim(BLOCK_DIM);
        tiling.set_tileNum(TILE_NUM);
        // Set other Tiling parameters.
        ...
        std::vector<int64_t> shapeVec = {1, 2, 64, 32};
        ge::Shape srcShape(shapeVec);

        uint32_t transposeTypeIn = 1;  // Scenario 1: NZ2ND, axis 1 and axis 2 interchanged.
        uint32_t maxValue = 0;
        uint32_t minValue = 0;
        // Obtain the maximum and minimum temporary space sizes.
        AscendC::GetTransposeMaxMinTmpSize(srcShape, sizeof(half), transposeTypeIn, maxValue, minValue);
        // For reference only: this example passes the minimum value, which is enough to ensure correct functionality.
        // You can pass a larger size (up to maxValue) as required.
        const uint32_t stackBufferSize = minValue;
        // Obtain the Transpose Tiling parameters.
        AscendC::GetTransposeTilingInfo(srcShape, stackBufferSize, sizeof(half), transposeTypeIn, tiling.confusionTransposeTilingData);
        ... // Other logic
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        context->SetTilingKey(1);
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. In the kernel function, call GET_TILING_DATA to obtain the TilingData, and then pass the ConfusionTransposeTiling information in it to the Transpose API for computation. For details about the complete example in the kernel, see Transpose.
    extern "C" __global__ __aicore__ void func_custom(GM_ADDR src_gm, GM_ADDR dst_gm, GM_ADDR workspace, GM_ADDR tiling)
    {
        // Parse the tiling argument into a local variable named tilingData.
        GET_TILING_DATA(tilingData, tiling);
        KernelTranspose<half> op;
        // Pass the ConfusionTransposeTiling field filled in on the host.
        op.Init(src_gm, dst_gm, tilingData.confusionTransposeTilingData);
        op.Process();
    }
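
When validating kernel output for scenario 1, a plain host-side reference for the logical axis interchange can be useful. The sketch below is a hypothetical helper, not the Transpose kernel implementation. It assumes that axes are zero-indexed, so that "axis 1 and axis 2 interchanged" maps a logical [B, N, S, H/N] tensor to [B, S, N, H/N], and it works on row-major ND data only: it deliberately ignores the NZ storage layout of the actual input.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of the Ascend C API.
// src holds a row-major tensor of logical shape [B, N, S, D], where D stands for H/N.
// Returns the row-major tensor of shape [B, S, N, D] with axes 1 and 2 interchanged,
// or an empty vector if src does not match the given shape.
template <typename T>
std::vector<T> SwapAxes12Reference(const std::vector<T>& src, const std::array<size_t, 4>& shape)
{
    const size_t B = shape[0];
    const size_t N = shape[1];
    const size_t S = shape[2];
    const size_t D = shape[3];
    if (src.size() != B * N * S * D) {
        return {};
    }
    std::vector<T> dst(src.size());
    for (size_t b = 0; b < B; ++b) {
        for (size_t n = 0; n < N; ++n) {
            for (size_t s = 0; s < S; ++s) {
                for (size_t d = 0; d < D; ++d) {
                    // Element (b, n, s, d) of src becomes element (b, s, n, d) of dst.
                    dst[((b * S + s) * N + n) * D + d] = src[((b * N + n) * S + s) * D + d];
                }
            }
        }
    }
    return dst;
}
```

For the shape used in this example, the call would take the array {1, 2, 64, 32} and return 4096 elements arranged as [1, 64, 2, 32].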