SoftMax Tiling
Ascend C provides a group of SoftMax tiling APIs for users to obtain tiling parameters required for SoftMax kernel computation. Before reading this section, refer to Tiling Implementation to learn the basic tiling process.
To obtain tiling parameters, perform the following two steps:
- Obtain the minimum and maximum temporary space sizes required for SoftMax API computation. Note that this step is not mandatory and only serves as a reference for appropriately allocating computing space.
- Obtain the tiling parameters required by the SoftMax kernel APIs by passing the input shape, the remaining space available for SoftMax computation, and the computation data type.
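The first step only reports a range, so the caller still has to decide how much space to hand to the second step. A minimal sketch of that decision follows. `ChooseSoftMaxTmpSize` is a hypothetical helper, not an Ascend C API, and it assumes that space beyond the reported maximum brings no benefit and that anything below the reported minimum is unusable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Ascend C): choose a temporary-space size
// for SoftMax from the [minSize, maxSize] range reported on the host and the
// number of bytes still free in the local buffer.
// Returns 0 when even the minimum does not fit, so the caller can react.
uint32_t ChooseSoftMaxTmpSize(uint32_t minSize, uint32_t maxSize, uint32_t freeBytes)
{
    assert(minSize <= maxSize);
    if (freeBytes < minSize) {
        return 0; // below the minimum, SoftMax cannot be expected to work
    }
    // Assumption: space above maxSize is not needed, so cap at maxSize.
    return std::min(freeBytes, maxSize);
}
```

The returned value would then be passed as the remaining-space argument when obtaining the tiling parameters.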
The SoftMaxTiling structure is defined as follows. Developers do not need to understand the individual fields of this structure. They only need to pass it to the kernel, where the SoftMax high-level APIs use it directly.
struct SoftMaxTiling {
    uint32_t srcM = 0;
    uint32_t srcK = 0;
    uint32_t srcSize = 0;
    uint32_t outMaxM = 0;
    uint32_t outMaxK = 0;
    uint32_t outMaxSize = 0;
    uint32_t splitM = 0;
    uint32_t splitK = 0;
    uint32_t splitSize = 0;
    uint32_t reduceM = 0;
    uint32_t reduceK = 0;
    uint32_t reduceSize = 0;
    uint32_t rangeM = 0;
    uint32_t tailM = 0;
    uint32_t tailSplitSize = 0;
    uint32_t tailReduceSize = 0;
};
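Because the structure consists only of `uint32_t` fields, it can be treated as an opaque block of bytes when it travels from host to kernel. The sketch below illustrates that idea with a local copy of the definition above. `PackTiling` and `UnpackTiling` are hypothetical helpers for illustration only; in a real operator the framework's tiling-data macros handle this transfer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Local copy of the structure shown above, for illustration only.
struct SoftMaxTiling {
    uint32_t srcM = 0, srcK = 0, srcSize = 0;
    uint32_t outMaxM = 0, outMaxK = 0, outMaxSize = 0;
    uint32_t splitM = 0, splitK = 0, splitSize = 0;
    uint32_t reduceM = 0, reduceK = 0, reduceSize = 0;
    uint32_t rangeM = 0, tailM = 0, tailSplitSize = 0, tailReduceSize = 0;
};

// Sixteen 32-bit fields, no padding: the structure can be copied byte-wise.
static_assert(std::is_trivially_copyable<SoftMaxTiling>::value, "plain data expected");
static_assert(sizeof(SoftMaxTiling) == 16 * sizeof(uint32_t), "unexpected padding");

// Hypothetical helper: copy the structure into a raw byte buffer.
std::vector<uint8_t> PackTiling(const SoftMaxTiling& tiling)
{
    std::vector<uint8_t> buf(sizeof(SoftMaxTiling));
    std::memcpy(buf.data(), &tiling, sizeof(SoftMaxTiling));
    return buf;
}

// Hypothetical helper: rebuild the structure from a raw byte buffer.
SoftMaxTiling UnpackTiling(const std::vector<uint8_t>& buf)
{
    assert(buf.size() >= sizeof(SoftMaxTiling));
    SoftMaxTiling tiling;
    std::memcpy(&tiling, buf.data(), sizeof(SoftMaxTiling));
    return tiling;
}
```

Neither helper reads an individual field, which mirrors how the structure is meant to be handled: filled in by the tiling API on the host and consumed by the SoftMax API in the kernel.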
For details about SoftMax/SimpleSoftMax, see SoftMax/SimpleSoftMax Tiling.
For details about SoftmaxFlash, see SoftmaxFlash.
For details about SoftmaxGrad, see SoftmaxGrad Tiling.
For details about SoftmaxFlashV2, see SoftmaxFlashV2 Tiling.
For details about how to determine whether SoftMaxTiling is a basic block tiling, see IsBasicBlockInSoftMax.
Example
The following example shows how to obtain the tiling parameters on the host and how to use them in the kernel when the SoftMax high-level APIs are used. In this example, the shape of the input tensor is [320, 64], and the input data type is half.
- Add a SoftMaxTiling structure to the TilingData structure as a field.
BEGIN_TILING_DATA_DEF(TilingData) // Register a tiling class, using the tiling name as the input parameter.
    TILING_DATA_FIELD_DEF(uint32_t, totalLength); // Add a tiling field for the total data volume to compute.
    TILING_DATA_FIELD_DEF(uint32_t, tileNum); // Add a tiling field for the total number of data blocks to compute on each core.
    ... // Add other tiling fields.
    TILING_DATA_FIELD_DEF_STRUCT(SoftMaxTiling, softmaxTilingData); // Add the SoftMaxTiling structure to the TilingData structure.
END_TILING_DATA_DEF;
- The tiling implementation function first calls GetSoftMaxMaxTmpSize/GetSoftMaxMinTmpSize to obtain the maximum and minimum temporary space sizes that the SoftMax API requires to complete computation. It then sets an appropriate space size based on this range and the actual memory usage. Finally, it calls SoftMaxTilingFunc to obtain the tiling parameters required by the SoftMax kernel API, based on the input shape and the remaining computing space.
namespace optiling {
const uint32_t BLOCK_DIM = 8;
const uint32_t TILE_NUM = 8;
static ge::graphStatus TilingFunc(gert::TilingContext* context)
{
    TilingData tiling;
    uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
    context->SetBlockDim(BLOCK_DIM);
    tiling.set_totalLength(totalLength);
    tiling.set_tileNum(TILE_NUM);
    // Set other tiling parameters.
    ...
    std::vector<int64_t> shapeVec = {320, 64};
    ge::Shape srcShape(shapeVec);
    // This example is for reference only. It passes the minimum value from GetSoftMaxMinTmpSize to ensure correct functionality. Developers can pass a suitable space size as required.
    const uint32_t localWorkSpaceSize = AscendC::GetSoftMaxMinTmpSize(srcShape, sizeof(half), false);
    // Obtain SoftMax tiling parameters.
    AscendC::SoftMaxTilingFunc(srcShape, sizeof(half), localWorkSpaceSize, tiling.softmaxTilingData);
    ... // Other logic
    tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
    context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
    context->SetTilingKey(1);
    return ge::GRAPH_SUCCESS;
}
} // namespace optiling
- The kernel function calls GET_TILING_DATA to obtain TilingData, and then passes the SoftMax tiling information in TilingData to the SoftMax API for computation. For details about the complete example in the kernel, see Example.
extern "C" __global__ __aicore__ void func_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    KernelFunc op;
    // Pass the SoftMaxTiling field by the name it was given in TilingData (softmaxTilingData).
    op.Init(x, y, z, tilingData.totalLength, tilingData.tileNum, tilingData.softmaxTilingData);
    if (TILING_KEY_IS(1)) {
        op.Process();
    }
}
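To give a sense of the sizes involved in this example, the sketch below works through the arithmetic for a [320, 64] half input with the BLOCK_DIM and TILE_NUM values used on the host. The helper names are hypothetical, and the even split across cores and tiles is an assumption: the source does not show how the kernel actually uses totalLength and tileNum.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; none of these are Ascend C APIs.

// Number of elements in a 2-D tensor.
uint32_t TotalElements(uint32_t rows, uint32_t cols)
{
    return rows * cols;
}

// Size in bytes of `elements` values of a type that is `typeBytes` wide.
uint32_t SizeInBytes(uint32_t elements, uint32_t typeBytes)
{
    return elements * typeBytes;
}

// Even split of `length` into `parts` pieces (assumes it divides exactly).
uint32_t EvenSplit(uint32_t length, uint32_t parts)
{
    assert(parts != 0 && length % parts == 0);
    return length / parts;
}
```

With these helpers, `TotalElements(320, 64)` gives 20480 elements, or 40960 bytes at 2 bytes per half value. Under the even-split assumption, `EvenSplit(20480, 8)` gives 2560 elements per core, and `EvenSplit(2560, 8)` gives 320 elements (640 bytes) per tile.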