SoftmaxFlashV2 Tiling

Function Usage

Obtains the tiling parameters required by SoftmaxFlashV2.

Prototype

uint32_t GetSoftMaxFlashV2MinTmpSize(const ge::Shape& srcShape, const uint32_t dataTypeSize1, const uint32_t dataTypeSize2, const bool isUpdate, const bool isBasicBlock = false, const bool isFlashOutputBrc = false)

uint32_t GetSoftMaxFlashV2MaxTmpSize(const ge::Shape& srcShape, const uint32_t dataTypeSize1, const uint32_t dataTypeSize2, const bool isUpdate, const bool isBasicBlock = false, const bool isFlashOutputBrc = false)

void SoftMaxFlashV2TilingFunc(const ge::Shape& srcShape, const uint32_t dataTypeSize1, const uint32_t dataTypeSize2, const uint32_t localWorkSpaceSize, optiling::SoftMaxTiling& softmaxFlashTiling, const bool isUpdate, const bool isBasicBlock = false, const bool isFlashOutputBrc = false)

Parameters

Table 1 GetSoftMaxFlashV2MinTmpSize/GetSoftMaxFlashV2MaxTmpSize API parameters

| Parameter | Input/Output | Description |
|---|---|---|
| srcShape | Input | Shape of the input srcTensor. |
| dataTypeSize1 | Input | Size, in bytes, of the data type of the source data to be computed. For example, 2 for half. |
| dataTypeSize2 | Input | Size, in bytes, of the data type of maxTensor and sumTensor used in the computation. For example, 2 for half. |
| isUpdate | Input | Whether to enable the refresh function. The value must match the one used by the SoftmaxFlashV2 API in the kernel. |
| isBasicBlock | Input | Whether to enable basic block computation. The value can be obtained through the isBasicBlockInSoftmax API and must match the corresponding template parameter of the kernel-side API. The default value is false. If the kernel-side API enables the SoftmaxConfig template parameter (that is, a constant shape is used), the value must be obtained through the isBasicBlockInSoftmax API. |
| isFlashOutputBrc | Input | Whether to enable the non-extended mode of the output shape. In non-extended mode, the output data is not broadcast and the output shape is (m, 1). false (default): non-extended mode is disabled; the output shape is (m, 8) when the output data type is float and (m, 16) when it is half. true: non-extended mode is enabled; the output shape is (m, 1), and mode in SoftmaxConfig of the kernel API must be set to SoftmaxMode::SOFTMAX_OUTPUT_WITHOUT_BRC. |

Table 2 SoftMaxFlashV2TilingFunc API parameters

| Parameter | Input/Output | Description |
|---|---|---|
| srcShape | Input | Shape of the input srcTensor. |
| localWorkSpaceSize | Input | Size of the remaining space available for the SoftmaxFlashV2 computation. The value must be greater than the minimum temporary space size returned by GetSoftMaxFlashV2MinTmpSize. |
| dataTypeSize1 | Input | Size, in bytes, of the data type of the source data to be computed. For example, 2 for half. |
| dataTypeSize2 | Input | Size, in bytes, of the data type of maxTensor and sumTensor used in the computation. For example, 2 for half. |
| isUpdate | Input | Whether to enable the refresh function. The value must match the one used by the SoftmaxFlashV2 API in the kernel. |
| isBasicBlock | Input | Whether to enable basic block computation. The value can be obtained through the isBasicBlockInSoftmax API and must match the corresponding template parameter of the kernel-side API. The default value is false. If the kernel-side API enables the SoftmaxConfig template parameter (that is, a constant shape is used), the value must be obtained through the isBasicBlockInSoftmax API. |
| isFlashOutputBrc | Input | Whether to enable the non-extended mode of the output shape. In non-extended mode, the output data is not broadcast and the output shape is (m, 1). false (default): non-extended mode is disabled; the output shape is (m, 8) when the output data type is float and (m, 16) when it is half. true: non-extended mode is enabled; the output shape is (m, 1), and mode in SoftmaxConfig of the kernel API must be set to SoftmaxMode::SOFTMAX_OUTPUT_WITHOUT_BRC. |
| softmaxFlashTiling | Output | Tiling information required by SoftmaxFlashV2. |
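The output shapes listed for isFlashOutputBrc can be summarized in a small lookup. The following is a minimal sketch only: OutputLastAxisLen is a hypothetical helper written for illustration and is not part of the SoftmaxFlashV2 tiling API. It assumes the data type is identified by its size in bytes (4 for float, 2 for half) and returns 0 for any size the description does not cover.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the SoftmaxFlashV2 tiling API).
// Returns the last-axis length n of the (m, n) output shape described
// for isFlashOutputBrc.
// dataTypeSize: size of the output data type in bytes (4 = float, 2 = half).
inline uint32_t OutputLastAxisLen(uint32_t dataTypeSize, bool isFlashOutputBrc)
{
    if (isFlashOutputBrc) {
        return 1;  // non-extended mode: output shape is (m, 1)
    }
    switch (dataTypeSize) {
        case 4:
            return 8;   // float: output shape is (m, 8)
        case 2:
            return 16;  // half: output shape is (m, 16)
        default:
            return 0;   // not covered by the description
    }
}
```

For example, OutputLastAxisLen(2, false) yields 16, matching the (m, 16) shape given for half when non-extended mode is disabled.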

Returns

GetSoftMaxFlashV2MinTmpSize returns the minimum size, in bytes, of the temporary space required for the SoftmaxFlashV2 computation.

GetSoftMaxFlashV2MaxTmpSize returns the maximum size, in bytes, of the temporary space required for the SoftmaxFlashV2 computation.

SoftMaxFlashV2TilingFunc has no return value; the tiling information is written to softmaxFlashTiling.
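One way to combine the two size queries when choosing localWorkSpaceSize is sketched below. ChooseLocalWorkSpaceSize is a hypothetical helper, not part of the tiling API, and availableSize is an assumed name for the free local memory the caller can spare. The sketch assumes that a size equal to the minimum is sufficient; the parameter description says "greater than", so tighten the comparison if a strict inequality is required. It also assumes that space beyond the reported maximum brings no benefit, since the maximum is described as the largest temporary space the computation requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the SoftmaxFlashV2 tiling API).
// minTmpSize:    value returned by GetSoftMaxFlashV2MinTmpSize, in bytes.
// maxTmpSize:    value returned by GetSoftMaxFlashV2MaxTmpSize, in bytes.
// availableSize: free local memory the caller can spare, in bytes (assumed name).
// On success, writes the size to pass as localWorkSpaceSize and returns true.
// Returns false if the available space is below the minimum requirement.
inline bool ChooseLocalWorkSpaceSize(uint32_t minTmpSize, uint32_t maxTmpSize,
                                     uint32_t availableSize,
                                     uint32_t& localWorkSpaceSize)
{
    // Assumption: a size equal to the minimum is treated as sufficient.
    if (availableSize < minTmpSize) {
        return false;
    }
    // Cap the request at the maximum, but never drop below the minimum.
    localWorkSpaceSize = std::max(minTmpSize, std::min(availableSize, maxTmpSize));
    return true;
}
```

In host-side tiling code, the two inputs would come from GetSoftMaxFlashV2MinTmpSize and GetSoftMaxFlashV2MaxTmpSize, and the chosen value would then be passed as localWorkSpaceSize to SoftMaxFlashV2TilingFunc with the same srcShape, dataTypeSize1, dataTypeSize2, isUpdate, isBasicBlock, and isFlashOutputBrc arguments used for the size queries.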