ConfusionTranspose Tiling

Function Usage

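Obtains the tiling parameters required by ConfusionTranspose. GetConfusionTransposeMaxMinTmpSize returns the maximum and minimum sizes of the temporary space, and GetConfusionTransposeTilingInfo fills in the ConfusionTransposeTiling structure.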

Prototype

void GetConfusionTransposeMaxMinTmpSize(const ge::Shape &srcShape, const uint32_t typeSize, const uint32_t transposeTypeIn, uint32_t &maxValue, uint32_t &minValue)
void GetConfusionTransposeTilingInfo(const ge::Shape &srcShape, const uint32_t stackBufferSize, const uint32_t typeSize, const uint32_t transposeTypeIn, optiling::ConfusionTransposeTiling &tiling)

Parameters

Table 1 GetConfusionTransposeMaxMinTmpSize API parameters

Parameter

Input/Output

Meaning

srcShape

Input

Shape of the input tensor. The expected format of srcShape in each scenario is as follows:

Scenario 1: [B, N, S, H/N]

Scenario 2: [B, N, S, H/N]

Scenario 3: [B, N, S, H/N]

Scenario 4: [B, N, S, H/N]

Scenario 5: [B, N, S, H/N]

Scenario 6: [B, N, S, H/N]

Scenario 7: [H, W]

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the data type of operator inputs is half, set this parameter to 2.

transposeTypeIn

Input

Data layout and reshape type. The value range is [1, 7], and value n selects scenario n.

maxValue

Output

Maximum size of the temporary space required by ConfusionTranspose computation, in bytes. Any space beyond this value is not used by the API. Within the range from minValue to maxValue, a larger temporary space generally improves the performance of the API in the kernel to some extent. For better performance, reserve or allocate the space based on the actual buffer usage.

NOTE:

maxValue is for reference only and may be larger than the available space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer.

minValue

Output

Minimum size of the temporary space required by ConfusionTranspose computation, in bytes. To ensure functional correctness, the temporary space reserved or allocated for the API computation must not be smaller than this value.
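
The relationship between minValue, maxValue, and the remaining Unified Buffer space can be sketched as follows. This is only a minimal illustration: ChooseTmpSize and freeUbBytes are hypothetical names that are not part of the Ascend C API, and the way the remaining Unified Buffer space is obtained is left to the operator implementation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of the Ascend C API.
// Picks a temporary space size that is at least minValue, at most maxValue,
// and no larger than the free Unified Buffer space (freeUbBytes, an assumed input).
// Returns false when even minValue does not fit, because the API cannot
// compute correctly with less than minValue bytes.
bool ChooseTmpSize(uint32_t minValue, uint32_t maxValue, uint32_t freeUbBytes, uint32_t &tmpSize)
{
    if (freeUbBytes < minValue) {
        return false;
    }
    // Space above maxValue is never used by the API, so cap the request there.
    tmpSize = std::max(minValue, std::min(maxValue, freeUbBytes));
    return true;
}
```

The selected size would then be passed as stackBufferSize to GetConfusionTransposeTilingInfo.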

Table 2 GetConfusionTransposeTilingInfo API parameters

Parameter

Input/Output

Meaning

srcShape

Input

Shape of the input tensor. The expected format of srcShape in each scenario is as follows:

Scenario 1: [B, N, S, H/N]

Scenario 2: [B, N, S, H/N]

Scenario 3: [B, N, S, H/N]

Scenario 4: [B, N, S, H/N]

Scenario 5: [B, N, S, H/N]

Scenario 6: [B, N, S, H/N]

Scenario 7: [H, W]

stackBufferSize

Input

Size of the temporary space available for ConfusionTranspose computation, in bytes. The value must not be smaller than the minValue returned by GetConfusionTransposeMaxMinTmpSize.

typeSize

Input

Data type size of operator inputs. The unit is byte. For example, if the data type of operator inputs is half, set this parameter to 2.

transposeTypeIn

Input

Data layout and reshape type. The value selects the scenario and must be in the range [1, 7].

Scenario 1: 1

Scenario 2: 2

Scenario 3: 3

Scenario 4: 4

Scenario 5: 5

Scenario 6: 6

Scenario 7: 7

tiling

Output

Tiling information computed for the input data.
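
Because the expected rank of srcShape depends on the scenario (four dimensions for scenarios 1 to 6, two dimensions for scenario 7), a host-side pre-check before calling the tiling APIs may help catch mismatched arguments early. The sketch below is an assumption for illustration: IsValidTransposeInput is a hypothetical helper, not an Ascend C API, and the requirement that every dimension be positive is an extra assumption not stated in the tables above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pre-check, not part of the Ascend C API.
// srcDims holds the dimensions that would be wrapped in ge::Shape.
bool IsValidTransposeInput(const std::vector<int64_t> &srcDims, uint32_t transposeTypeIn)
{
    // transposeTypeIn must select one of scenarios 1 to 7.
    if (transposeTypeIn < 1 || transposeTypeIn > 7) {
        return false;
    }
    // Scenarios 1 to 6 take [B, N, S, H/N]; scenario 7 takes [H, W].
    const std::size_t expectedRank = (transposeTypeIn == 7) ? 2 : 4;
    if (srcDims.size() != expectedRank) {
        return false;
    }
    // Assumption: every dimension must be positive.
    for (int64_t dim : srcDims) {
        if (dim <= 0) {
            return false;
        }
    }
    return true;
}
```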

Returns

None

Examples

The following example shows how to obtain the tiling parameters on the host and how to use them in the kernel when the ConfusionTranspose high-level API is used. The example uses scenario 1. The shape of the input tensor is [1, 2, 64, 32], and the input data type is half.

  1. Add the ConfusionTransposeTiling structure to the TilingData structure as a field.
    BEGIN_TILING_DATA_DEF(TilingData)               // Register a tiling class, using the tiling name as the input parameter.
      TILING_DATA_FIELD_DEF(uint32_t, tileNum);     // Add the tiling field that specifies the total number of data blocks to be computed on each core.
      ...                                           // Add other tiling fields.
      TILING_DATA_FIELD_DEF_STRUCT(ConfusionTransposeTiling, confusionTransposeTilingData); // Add the ConfusionTransposeTiling structure to the TilingData structure.
    END_TILING_DATA_DEF;
    
  2. In the tiling implementation function, obtain the tiling parameter required by the ConfusionTranspose kernel API based on the input shape and the space size (stackBufferSize) required for computation.
    namespace optiling {
    const uint32_t BLOCK_DIM = 8;
    const uint32_t TILE_NUM = 8;
    static ge::graphStatus TilingFunc(gert::TilingContext* context)
    {
        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        context->SetBlockDim(BLOCK_DIM);
        tiling.set_tileNum(TILE_NUM);
        // Set other tiling parameters.
        ... 
        std::vector<int64_t> shapeVec = {1, 2, 64, 32};
        ge::Shape srcShape(shapeVec);
    
        uint32_t maxValue = 0;
        uint32_t minValue = 0;
        // transposeTypeIn is 1 because this example uses scenario 1.
        AscendC::GetConfusionTransposeMaxMinTmpSize(srcShape, sizeof(half), 1, maxValue, minValue);
        // For illustration only: pass the minimum value, which is enough to guarantee correct results. Developers can pass a larger size as required.
        const uint32_t stackBufferSize = minValue;
        // Obtain the ConfusionTranspose tiling parameter.
        AscendC::GetConfusionTransposeTilingInfo(srcShape, stackBufferSize, sizeof(half), 1, tiling.confusionTransposeTilingData);
        ... // Other logic
        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
        context->SetTilingKey(1);
        return ge::GRAPH_SUCCESS;
    }
    } // namespace optiling
    
  3. The kernel function calls GET_TILING_DATA to obtain TilingData, and then passes the ConfusionTransposeTiling information in TilingData to the ConfusionTranspose API for computation. For the complete kernel-side example, see ConfusionTranspose.
    extern "C" __global__ __aicore__ void func_custom(GM_ADDR src_gm, GM_ADDR dst_gm, GM_ADDR workspace, GM_ADDR tiling)
    {
        GET_TILING_DATA(tilingData, tiling);    // Unpack the tiling buffer into a local variable named tilingData.
        KernelConfusionTranspose<half> op;
        op.Init(src_gm, dst_gm, tilingData.confusionTransposeTilingData);
        op.Process();
    }
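
As a quick sanity check on the numbers in this example, the input size in bytes follows from the shape and typeSize. The sketch below assumes a hypothetical helper, ShapeBytes, which is not part of the Ascend C API. For the shape [1, 2, 64, 32] and the half data type (typeSize 2), it yields 4096 elements and 8192 bytes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the Ascend C API.
// Returns the number of bytes occupied by a tensor with the given dimensions
// and per-element size. Returns 0 for an empty shape or a non-positive dimension.
uint64_t ShapeBytes(const std::vector<int64_t> &dims, uint32_t typeSize)
{
    if (dims.empty()) {
        return 0;
    }
    uint64_t elements = 1;
    for (int64_t dim : dims) {
        if (dim <= 0) {
            return 0;
        }
        elements *= static_cast<uint64_t>(dim);
    }
    return elements * typeSize;
}
```

Such a value is useful only as a reference point when budgeting the Unified Buffer; the temporary space itself should still be sized from minValue and maxValue.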