SimpleSoftMax

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Treat the product of the non-last axis lengths of the input tensor [m₀, m₁, ..., m_t, n] (t ≥ 0) as m, so that the input tensor has shape [m, n]. The following computation is performed on this [m, n] tensor row by row. Unlike the SoftMax API, this API does not perform the reductions that compute sum and max. Instead, it takes precomputed sum and max data as inputs and uses them to perform the SoftMax computation on the input tensor. The formula is as follows:

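$$\text{dst}_{i,j} = \frac{\exp\left(\text{src}_{i,j} - \text{max}_{i}\right)}{\text{sum}_{i}}, \quad i = 0, \dots, m-1,\; j = 0, \dots, n-1$$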

For ease of understanding, the formula is expressed below as a Python script, where src, max, and sum are the source operands (inputs) and dst is the destination operand (output).

    import numpy as np

    def simple_softmax(src, max, sum):
        # max and sum hold one value per row (e.g. shape [m, 1]), so NumPy broadcasts them across each row of src ([m, n]).
        dst = np.exp(src - max) / sum
        return dst
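The script above takes max and sum as given. The plain-Python sketch below (lists of lists, no NumPy) shows where such inputs would typically come from. `row_max_and_sum` performs the per-row reduction that the SoftMax API does and SimpleSoftMax skips, and `simple_softmax_rows` then applies the formula. `collapse_to_2d` shows the [m₀, ..., m_t, n] to [m, n] view. All three helper names are hypothetical illustrations, not Ascend C APIs.

```python
import math


def collapse_to_2d(shape):
    """View [m0, m1, ..., mt, n] (t >= 0) as [m, n], with m the product of the non-last axes."""
    if len(shape) < 2:
        raise ValueError("expected at least two axes")
    return math.prod(shape[:-1]), shape[-1]


def row_max_and_sum(src):
    """The reduction SimpleSoftMax skips: per-row max and per-row sum of exp(x - max)."""
    maxes, sums = [], []
    for row in src:
        row_max = max(row)
        maxes.append(row_max)
        sums.append(sum(math.exp(x - row_max) for x in row))
    return maxes, sums


def simple_softmax_rows(src, row_max, row_sum):
    """dst[i][j] = exp(src[i][j] - max[i]) / sum[i], using precomputed per-row values."""
    return [
        [math.exp(x - mx) / s for x in row]
        for row, mx, s in zip(src, row_max, row_sum)
    ]


src = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
maxes, sums = row_max_and_sum(src)
dst = simple_softmax_rows(src, maxes, sums)
print(collapse_to_2d([2, 4, 16]))        # (8, 16)
print([round(sum(r), 6) for r in dst])   # [1.0, 1.0]
```

Because each row's max and sum are derived from that same row here, every output row sums to 1, which matches a full SoftMax.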

Principles

The following figure shows the internal algorithm of the SimpleSoftMax high-level API, using a float input tensor in ND format with shape [m, k] as an example (k is the last axis length, written n above).

Figure 1 Diagram of the SimpleSoftMax algorithm

The computation consists of the following steps, all of which are vector computations:

1. sub: Subtract each row's max value from every element of input x in that row.

2. exp: Compute the exponential of every element produced by sub.

3. div: Divide every element produced by exp by the sum of its row to obtain the final result.
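As a hedged illustration, the three steps can be written as separate passes over an [m, k] input in plain Python. On the device each step is a vector computation; here each is a list comprehension, and the function names are hypothetical.

```python
import math


def sub_step(x, row_max):
    # Step 1 (sub): subtract each row's max from every element in that row.
    return [[v - mx for v in row] for row, mx in zip(x, row_max)]


def exp_step(x):
    # Step 2 (exp): element-wise exponential of the sub result.
    return [[math.exp(v) for v in row] for row in x]


def div_step(x, row_sum):
    # Step 3 (div): divide every element by its row's sum.
    return [[v / s for v in row] for row, s in zip(x, row_sum)]


def simple_softmax_three_steps(x, row_max, row_sum):
    return div_step(exp_step(sub_step(x, row_max)), row_sum)


x = [[1.0, 3.0], [2.0, 2.0]]
row_max = [3.0, 2.0]
row_sum = [math.exp(-2.0) + 1.0, 2.0]  # sum of exp(x - max) for each row
out = simple_softmax_three_steps(x, row_max, row_sum)
print(out[1])  # [0.5, 0.5]
```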

Prototype

Temporary space allocated through the API framework

All LocalTensor operands have the same data type.

    template <typename T, bool isReuseSource = false, bool isBasicBlock = false, bool isDataFormatNZ = false, const SoftmaxConfig& config = SOFTMAX_DEFAULT_CFG>
    __aicore__ inline void SimpleSoftMax(const LocalTensor<T>& dstTensor, const LocalTensor<T>& inSumTensor, const LocalTensor<T>& inMaxTensor, const LocalTensor<T>& srcTensor, const SoftMaxTiling& tiling, const SoftMaxShapeInfo& softmaxShapeInfo = {})

The LocalTensor operands have different data types: dstTensor and srcTensor are half, while inSumTensor and inMaxTensor are float.

    template <typename T, bool isReuseSource = false, bool isBasicBlock = false, bool isDataFormatNZ = false, const SoftmaxConfig& config = SOFTMAX_DEFAULT_CFG>
    __aicore__ inline void SimpleSoftMax(const LocalTensor<half>& dstTensor, const LocalTensor<float>& inSumTensor, const LocalTensor<float>& inMaxTensor, const LocalTensor<half>& srcTensor, const SoftMaxTiling& tiling, const SoftMaxShapeInfo& softmaxShapeInfo = {})

Temporary space passed through the sharedTmpBuffer input parameter

All LocalTensor operands have the same data type.

    template <typename T, bool isReuseSource = false, bool isBasicBlock = false, bool isDataFormatNZ = false, const SoftmaxConfig& config = SOFTMAX_DEFAULT_CFG>
    __aicore__ inline void SimpleSoftMax(const LocalTensor<T>& dstTensor, const LocalTensor<T>& inSumTensor, const LocalTensor<T>& inMaxTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const SoftMaxTiling& tiling, const SoftMaxShapeInfo& softmaxShapeInfo = {})

The LocalTensor operands have different data types: dstTensor and srcTensor are half, while inSumTensor and inMaxTensor are float.

    template <typename T, bool isReuseSource = false, bool isBasicBlock = false, bool isDataFormatNZ = false, const SoftmaxConfig& config = SOFTMAX_DEFAULT_CFG>
    __aicore__ inline void SimpleSoftMax(const LocalTensor<half>& dstTensor, const LocalTensor<float>& inSumTensor, const LocalTensor<float>& inMaxTensor, const LocalTensor<half>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const SoftMaxTiling& tiling, const SoftMaxShapeInfo& softmaxShapeInfo = {})

The internal implementation of this API involves complex computation, so extra temporary space is required to store intermediate variables. The temporary space can either be allocated by the API framework or passed in by the developer through the sharedTmpBuffer input parameter.

When the API framework allocates the temporary space, developers do not need to allocate it themselves, but they must reserve enough space for it.

When the temporary space is passed through sharedTmpBuffer, that tensor serves as the temporary space and the API framework performs no allocation. Developers manage the sharedTmpBuffer space themselves and can reuse the buffer after the API call, which avoids repeated allocation and deallocation and improves flexibility and buffer utilization.

In both cases the size of the temporary space (BufferSize) must be known: with the API framework, developers reserve that much space; with sharedTmpBuffer, developers allocate that much space for the tensor. Use the GetSoftMaxMaxTmpSize/GetSoftMaxMinTmpSize APIs provided in SoftMax/SimpleSoftMax Tiling to obtain the maximum and minimum temporary space sizes. The minimum size guarantees correct functionality, while the maximum size is used to improve performance.
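The sketch below shows one way a caller might choose a temporary buffer size from the minimum and maximum values described above. `choose_tmp_buffer_size` is a hypothetical host-side helper, and the integers passed to it stand in for whatever GetSoftMaxMinTmpSize/GetSoftMaxMaxTmpSize report. It assumes that any size between the minimum and maximum is acceptable and that space beyond the maximum brings no further benefit.

```python
def choose_tmp_buffer_size(min_size, max_size, available):
    """Pick a temporary buffer size in bytes (hypothetical helper).

    min_size and max_size stand in for the values reported by GetSoftMaxMinTmpSize and
    GetSoftMaxMaxTmpSize; available is the space the caller can spare.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if available < min_size:
        # Below the minimum, correct results are not guaranteed.
        raise ValueError("available space is smaller than the minimum temporary size")
    # Assumption: any size in [min_size, max_size] is valid; space beyond max_size brings no benefit.
    return min(available, max_size)


print(choose_tmp_buffer_size(1024, 4096, 8192))  # 4096
print(choose_tmp_buffer_size(1024, 4096, 2048))  # 2048
```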

Parameters

Table 1 Template parameters

Parameter

Description

T

Data type of the operand.

For the Atlas A3 training products / Atlas A3 inference products, the supported data types are half and float.

For the Atlas A2 training products / Atlas A2 inference products, the supported data types are half and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

For the Atlas 200I/500 A2 inference products, the supported data types are half and float.

isReuseSource

This parameter is reserved. Pass the default value false.

isBasicBlock

If the shape information and tiling strategy of srcTensor and dstTensor meet the basic block requirements, enable this parameter to improve performance. It is disabled by default. Use either of the following methods to determine whether the basic block requirements are met:

Method 1: The shape [m, n] of srcTensor and dstTensor meets the following requirements:
- The last axis length n is less than 2048, greater than or equal to 256/sizeof(T) (that is, at least 128 for half and at least 64 for float), and a multiple of 64.
- The product m of the non-last axis lengths is a multiple of 8.

Method 2: Call IsBasicBlockInSoftMax to check whether the tiling strategy meets the basic block tiling requirements.

For the Atlas 200I/500 A2 inference products, this parameter is reserved for future use. Retain the default value.

isDataFormatNZ

Whether the input and output data are in NZ format. The default value is false, meaning the data is in ND format.

For the Atlas 200I/500 A2 inference products, the NZ format is not supported.

config

(Optional) Structure template parameter of type SoftmaxConfig, defined as follows:

    struct SoftmaxConfig {
        bool isCheckTiling = true; // Whether to check that the shape and tiling are consistent. If they are not, the API recomputes the required tiling from the shape. The default value true means the API performs this check internally.
        uint32_t oriSrcM = 0; // Product of the original non-last axis lengths. When set, the shape becomes a constant that is used at compile time.
        uint32_t oriSrcK = 0; // Original last axis length. When set, the shape becomes a constant that is used at compile time.
    };

A configuration example is as follows:

    constexpr SoftmaxConfig SOFTMAX_DEFAULT_CFG = {true, 0, 0};

This parameter is used together with the tiling computation API on the kernel side.

Note: isBasicBlock takes precedence over config. If isBasicBlock is enabled, the API partitions the data into basic blocks for optimization, and the constant shape specified in config does not take effect.

For the Atlas A3 training products / Atlas A3 inference products, this parameter is supported.

For the Atlas A2 training products / Atlas A2 inference products, this parameter is supported.

For the Atlas inference product's AI Core, this parameter is supported.

For the Atlas 200I/500 A2 inference products, this parameter is reserved for future use. Retain the default value.
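Two of the rules in Table 1 can be expressed as small checks. The plain-Python sketch below is illustrative only. `meets_basic_block_shape` encodes the shape conditions listed for isBasicBlock; it does not replace IsBasicBlockInSoftMax, which checks the tiling. `SoftmaxConfigModel` and `shape_used` are hypothetical mirrors of SoftmaxConfig that model the precedence of isBasicBlock over config. Treating the constant shape as active only when both oriSrcM and oriSrcK are non-zero is an assumption.

```python
from dataclasses import dataclass


def meets_basic_block_shape(m, n, dtype_size):
    """Shape conditions for isBasicBlock; dtype_size is sizeof(T) (2 for half, 4 for float)."""
    min_n = 256 // dtype_size  # 128 for half, 64 for float
    return min_n <= n < 2048 and n % 64 == 0 and m % 8 == 0


@dataclass(frozen=True)
class SoftmaxConfigModel:
    """Hypothetical Python mirror of the SoftmaxConfig fields (not an Ascend C type)."""
    isCheckTiling: bool = True
    oriSrcM: int = 0
    oriSrcK: int = 0


SOFTMAX_DEFAULT_CFG_MODEL = SoftmaxConfigModel(True, 0, 0)


def shape_used(config, runtime_m, runtime_k, is_basic_block):
    """Which [m, k] shape the precedence note implies is used."""
    if is_basic_block:
        # isBasicBlock takes precedence: any constant shape in config is ignored.
        return ("runtime", runtime_m, runtime_k)
    if config.oriSrcM and config.oriSrcK:
        # Assumption: the constant shape applies only when both fields are non-zero.
        return ("constant", config.oriSrcM, config.oriSrcK)
    return ("runtime", runtime_m, runtime_k)


print(meets_basic_block_shape(8, 128, 2))  # True
print(meets_basic_block_shape(8, 64, 2))   # False: n < 128 for half
print(shape_used(SoftmaxConfigModel(True, 64, 128), 32, 256, False))  # ('constant', 64, 128)
print(shape_used(SoftmaxConfigModel(True, 64, 128), 32, 256, True))   # ('runtime', 32, 256)
```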

Table 2 API parameters

Parameter

Input/Output

Description

dstTensor

Output

Destination operand.