TopK Tiling
Function Description
Obtains the TopK tiling parameter.
Ascend C provides the TopK Tiling API for users to obtain the tiling parameter required for TopK kernel computation. Before reading this section, refer to Tiling Implementation to learn the basic tiling process.
To obtain the tiling parameter, perform the following two steps:
- Obtain the minimum and maximum temporary space sizes required for TopK API computation. Note that this step is not mandatory and only serves as a reference for appropriately allocating computing space.
- Obtain the tiling parameter required by the TopK kernel API.
The definition of the TopKTiling structure is as follows. You do not need to pay attention to the specific information of this tiling structure, but only need to pass it to the kernel and directly use it through TopK high-level APIs.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
struct TopkTiling { int32_t tmpLocalSize = 0; int32_t allDataSize = 0; int32_t innerDataSize = 0; uint32_t sortRepeat = 0; int32_t mrgSortRepeat = 0; int32_t kAlignFourBytes = 0; int32_t kAlignTwoBytes = 0; int32_t maskOffset = 0; int32_t maskVreducev2FourBytes = 0; int32_t maskVreducev2TwoBytes = 0; int32_t mrgSortSrc1offset = 0; int32_t mrgSortSrc2offset = 0; int32_t mrgSortSrc3offset = 0; int32_t mrgSortTwoQueueSrc1Offset = 0; int32_t mrgFourQueueTailPara1 = 0; int32_t mrgFourQueueTailPara2 = 0; int32_t srcIndexOffset = 0; int32_t topkNSmallTmpLocalSize = 0; int32_t topkMrgSrc1MaskSizeOffset = 0; int32_t topkNSmallSrcIndexOffset = 0; uint32_t vreduceValMask0 = 0; uint32_t vreduceValMask1 = 0; uint32_t vreduceIdxMask0 = 0; uint32_t vreduceIdxMask1 = 0; uint16_t vreducehalfValMask0 = 0; uint16_t vreducehalfValMask1 = 0; uint16_t vreducehalfValMask2 = 0; uint16_t vreducehalfValMask3 = 0; uint16_t vreducehalfValMask4 = 0; uint16_t vreducehalfValMask5 = 0; uint16_t vreducehalfValMask6 = 0; uint16_t vreducehalfValMask7 = 0; uint16_t copyUbToUbBlockCount = 0; };
Prototype
1
|
bool GetTopKMaxMinTmpSize(const platform_ascendc::PlatformAscendC &ascendcPlatform, const int32_t inner, const int32_t outter, const bool isReuseSource, const bool isInitIndex, enum TopKMode mode, const bool isLargest, const uint32_t dataTypeSize, uint32_t &maxValue, uint32_t &minValue) |
1
|
bool TopKTilingFunc(const platform_ascendc::PlatformAscendC &ascendcPlatform, const int32_t inner, const int32_t outter, const int32_t k, const uint32_t dataTypeSize, const bool isInitIndex, enum TopKMode mode, const bool isLargest, optiling::TopkTiling &topKTiling) |
Parameters
|
API |
Input/Output |
Function |
|---|---|---|
|
ascendcPlatform |
Input |
Hardware platform information that is passed. For details about the definition of PlatformAscendC, see Overview. |
|
inner |
Input |
Inner axis length of srcLocal. |
|
outter |
Input |
Outer axis length of srcLocal. |
|
isReuseSource |
Input |
Whether intermediate variables can reuse the input buffer. The value must be the same as the value of isReuseSrc in the kernel. |
|
isInitIndex |
Input |
Whether to pass the indexes corresponding to input data. The value must be consistent with that of kernel APIs. |
|
mode |
Input |
Selection between TopKMode::TOPK_NORMAL or TopKMode::TOPK_NSMALL. The value must be consistent with that of kernel APIs. |
|
isLargest |
Input |
Descending order (true) or ascending order (false). The value must be consistent with that of kernel APIs. |
|
dataTypeSize |
Input |
Size of the srcLocal data type involved in computation, for example, half=2 and float=4. |
|
maxValue |
Output |
Maximum temporary space size (in byte) required to complete internal computation of the TopK API.
NOTE:
maxValue is for reference only and may be larger than the remaining space of the Unified Buffer. In this case, select a proper temporary space size based on the remaining space of the Unified Buffer. |
|
minValue |
Output |
Minimum temporary space size (in byte) required to complete internal computation of the TopK API. |
|
API |
Input/Output |
Function |
|---|---|---|
|
ascendcPlatform |
Input |
Hardware platform information that is passed. For details about the definition of PlatformAscendC, see Overview. |
|
inner |
Input |
Inner axis length of srcLocal. |
|
outter |
Input |
Outer axis length of srcLocal. |
|
k |
Input |
Obtains the first k maximum or minimum values and their corresponding indexes. |
|
dataTypeSize |
Input |
Size of the srcLocal data type involved in computation, for example, half=2 and float=4. |
|
isInitIndex |
Input |
Whether to pass the indexes corresponding to input data. The value must be consistent with that of kernel APIs. |
|
mode |
Input |
Selection between TopKMode::TOPK_NORMAL or TopKMode::TOPK_NSMALL. The value must be consistent with that of kernel APIs. |
|
isLargest |
Input |
Descending order (true) or ascending order (false). The value must be consistent with that of kernel APIs. |
|
topKTiling |
Output |
Tiling information required by the TopK API. |
Returns
The return value of GetTopKMaxMinTmpSize is either true or false. true indicates that the maximum and minimum temporary space sizes required for TopK internal computation are successfully obtained. false indicates that the sizes fail to be obtained.
The return value of TopKTilingFunc is either true or false. true indicates that the tiling parameter values of TopK are successfully obtained. false indicates that the values fail to be obtained.
Example
The following example describes the process of obtaining the tiling parameter on the host and the method of using the parameter in the kernel when TopK high-level APIs are used.
- Add the TopKTiling structure parameter to the TilingData structure to function as a field.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
namespace optiling { BEGIN_TILING_DATA_DEF(TilingData) TILING_DATA_FIELD_DEF(uint32_t, totalLength); TILING_DATA_FIELD_DEF(uint32_t, tilenum); // Add other tiling fields. ... TILING_DATA_FIELD_DEF(int32_t, k); TILING_DATA_FIELD_DEF(bool, islargest); TILING_DATA_FIELD_DEF(bool, isinitindex); TILING_DATA_FIELD_DEF(bool, ishasfinish); TILING_DATA_FIELD_DEF(uint32_t, tmpsize); TILING_DATA_FIELD_DEF(int32_t, outter); TILING_DATA_FIELD_DEF(int32_t, inner); TILING_DATA_FIELD_DEF(int32_t, n); TILING_DATA_FIELD_DEF(int32_t, order); TILING_DATA_FIELD_DEF(int32_t, sorted); TILING_DATA_FIELD_DEF_STRUCT(TopkTiling, topkTilingData); END_TILING_DATA_DEF; REGISTER_TILING_DATA_CLASS(TopkCustom, TilingData) }
- The tiling implementation function first calls the GetTopKMaxMinTmpSize API to obtain the maximum and minimum temporary space sizes required by the TopK API to complete computation, sets an appropriate space size based on this range and the actual buffer usage, and then obtains the tiling parameter required by the TopK kernel API based on the input shape and other information.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
namespace optiling { const uint32_t BLOCK_DIM = 8; const uint32_t TILE_NUM = 8; const int32_t OUTTER = 2; const int32_t INNER = 32; const int32_t N = 32; const int32_t K = 8; const bool IS_LARGEST = true; const bool IS_INITINDEX = true; const bool IS_REUSESOURCE = false; static ge::graphStatus TilingFunc(gert::TilingContext* context) { TilingData tiling; uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize(); context->SetBlockDim(BLOCK_DIM); tiling.set_totalLength(totalLength); tiling.set_tileNum(TILE_NUM); tiling.set_k(K); tiling.set_outter(OUTTER); tiling.set_inner(INNER); tiling.set_n(N); tiling.set_islargest(IS_LARGEST); tiling.set_isinitindex(IS_INITINDEX); // Set other tiling parameters. ... // This example is for reference only. Use GetTopKMaxMinTmpSize to obtain the minimum value and pass it to ensure correct functionality. Developers can pass a proper space size as required. uint32_t maxsize = 0; uint32_t minsize = 0; uint32_t dtypesize = 4; // float type auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo()); AscendC::TopKTilingFunc(ascendcPlatform, tiling.inner, tiling.outter, tiling.k, dtypesize, tiling.isinitindex, AscendC::TopKMode::TOPK_NSMALL, tiling.islargest, tiling.topkTilingData); AscendC::GetTopKMaxMinTmpSize(ascendcPlatform, tiling.inner, tiling.outter, IS_REUSESOURCE, tiling.isinitindex, AscendC::TopKMode::TOPK_NSMALL, tiling.islargest, dtypesize, maxsize, minsize); tiling.set_tmpsize(minsize); ... // Other logic tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity()); context->GetRawTilingData()->SetDataSize(tiling.GetDataSize()); size_t *currentWorkspace = context->GetWorkspaceSizes(1); currentWorkspace[0] = 0; return ge::GRAPH_SUCCESS; } } // namespace optiling
- The kernel calls GET_TILING_DATA in the kernel function to obtain TilingData, and then passes the TopK Tiling information in TilingData to the TopK API for computation. For details about the complete example in the kernel, see Example.
1 2 3 4 5 6 7
extern "C" __global__ __aicore__ void topk_custom(GM_ADDR srcVal, GM_ADDR srcIdx, GM_ADDR finishLocal, GM_ADDR dstVal, GM_ADDR dstIdx, GM_ADDR tiling) { { GET_TILING_DATA(tilingData, tiling); KernelTopK<float, true, true, false, false, AscendC::TopKMode::TOPK_NSMALL> op; op.Init(srcVal, srcIdx, finishLocal, dstVal, dstIdx, tilingData.k, tilingData.islargest, tilingData.tmpsize, tilingData.outter, tilingData.inner, tilingData.n,tilingData.topkTilingData); op.Process(); }