AdjustSoftMaxRes

Function Usage

Adjusts the SoftMax computation result to a specified value; it is mainly used to postprocess SoftMax results. If the specified value exists in the input maxTensor, the corresponding rows of softMaxRes are set to a user-defined value. The adjustment is performed row by row: when the max value of a row equals the specified value, every element of the corresponding row of softMaxRes is set to the user-defined value.

For ease of understanding, the formula is expressed as the following Python script, where res is both an input and an output, and max, from, to, and res_shape are inputs.

def adjust_softmax_res(res, max, from_val, to, res_shape):  # 'from' in the API; renamed because 'from' is a Python keyword
    found = False
    for i in range(res_shape[0]):
        if max[i] == from_val:
            found = True
            for j in range(res_shape[1]):
                res[i][j] = to
    return found  # mirrors the API's Boolean return value
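The behavior of the formula above can be demonstrated with a small self-contained sketch (pure Python, plain nested lists standing in for the tensors; the name adjust_softmax_res mirrors the API, and from is renamed from_val because it is a Python keyword):

```python
def adjust_softmax_res(res, max_vals, from_val, to, res_shape):
    # For each row whose reducemax equals from_val, overwrite the whole row with `to`.
    found = False
    for i in range(res_shape[0]):
        if max_vals[i] == from_val:
            found = True
            for j in range(res_shape[1]):
                res[i][j] = to
    return found  # mirrors the API's Boolean return value

# Tiny demo: row 1 has a reducemax of -inf, so the whole row is reset to 0.0.
res = [[0.5, 0.5], [0.3, 0.7]]
max_vals = [1.0, float("-inf")]
print(adjust_softmax_res(res, max_vals, float("-inf"), 0.0, (2, 2)))  # True
print(res)  # [[0.5, 0.5], [0.0, 0.0]]
```

Row 0 is left untouched because its reducemax (1.0) does not match the checked value.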

Prototype

template <typename T1, typename T2, bool isDataFormatNZ = false, uint8_t stepSizeMode = 0>
__aicore__ inline bool AdjustSoftMaxRes(const LocalTensor<T1>& softMaxRes, const LocalTensor<T2>& maxTensor, const uint32_t from, const T1 to, const SoftMaxShapeInfo& softmaxShapeInfo)

Parameters

Table 1 Parameters in the template

Parameter

Description

T1

Data type of softMaxRes.

T2

Data type of maxTensor.

isDataFormatNZ

Whether the current input and output data is in NZ format. The default data format is ND, that is, the default value of this parameter is false.

stepSizeMode

Step size (in elements) used when reading values from the input tensor for adjustment. The default value is 0: the API reads the first value of each 32-byte block, that is, one value per 8 elements for the float data type (input shape (m, 8)) or one value per 16 elements for the half data type (input shape (m, 16)). When this parameter is set to a non-zero value, that value is used as the element step size for reading values from the input tensor into the output tensor. This parameter supports only the ND format.

Table 2 API parameters

Parameter

Input/Output

Description

softMaxRes

Input/Output

Source operand and destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about the definition of the LocalTensor structure, see LocalTensor.

The length of the last axis must be 32-byte aligned.

Its value is the output result of softmax computation.

maxTensor

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

Result of reducemax during softmax computation.

  • The length of the last axis of maxTensor is fixed at 32 bytes, that is, the length of one data block. All data in this block has the same value. For example, for the half data type, all 16 numbers in the block hold the same reducemax value.
  • The length of each non-last axis is the same as that of softMaxRes.
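The block layout described above can be sketched as follows (a minimal illustration in pure Python; fill_max_block is a hypothetical helper, not part of the API):

```python
def fill_max_block(reduce_max, dtype_size):
    # A 32-byte data block holds 32 // dtype_size copies of one reducemax value:
    # 16 copies for half (2 bytes/element), 8 copies for float (4 bytes/element).
    elems_per_block = 32 // dtype_size
    return [reduce_max] * elems_per_block

print(len(fill_max_block(3.5, 2)))  # half: 16 identical values per block
print(len(fill_max_block(3.5, 4)))  # float: 8 identical values per block
```

This is why, in the example below, a [320, 64] float softMaxRes is paired with a [320, 8] maxTensor: each row of maxTensor is one 32-byte block repeating that row's reducemax.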

from

Input

Source operand of the uint32_t type.

Value of maxTensor to be checked. Note that the values of maxTensor are floating point numbers. Therefore, you need to input a hexadecimal value corresponding to a floating point number. For example, if you want to check whether the value of maxTensor is 1.0, set from to the hexadecimal value 0x3f800000 corresponding to 1.0.
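One way to obtain the hexadecimal bit pattern of a float32 value for this parameter is to reinterpret the value's IEEE-754 bytes (a sketch using the Python standard library; float_to_hex32 is a hypothetical helper name):

```python
import struct

def float_to_hex32(x):
    # Pack the value as a little-endian IEEE-754 binary32, then
    # reinterpret the same 4 bytes as an unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]

print(hex(float_to_hex32(1.0)))                      # 0x3f800000
print(hex(float_to_hex32(-3.4028234663852886e38)))   # 0xff7fffff (-FLT_MAX, as used in the example below)
```

The value 0xFF7FFFFF checked in the example section corresponds to the most negative finite float32, i.e. -FLT_MAX.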

to

Input

Source operand. The data type is the same as that of softMaxRes.

Value to be padded in softMaxRes.

softmaxShapeInfo

Input

Shape information of softMaxRes. The structure is defined as follows:

struct SoftMaxShapeInfo {
    uint32_t srcM;    // Product of the lengths of the non-last axes.
    uint32_t srcK;    // Length of the last axis, which must be 32-byte aligned.
    uint32_t oriSrcM; // Product of the lengths of the original non-last axes.
    uint32_t oriSrcK; // Length of the original last axis.
};

Currently, only ND input is supported.
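The relationship between the original and the 32-byte-aligned shape fields can be sketched as follows (a minimal illustration; make_shape_info is a hypothetical helper, not part of the API):

```python
def make_shape_info(ori_m, ori_k, dtype_size):
    # Pad the last axis up to a whole number of 32-byte blocks,
    # since srcK must be 32-byte aligned.
    elems_per_block = 32 // dtype_size
    src_k = (ori_k + elems_per_block - 1) // elems_per_block * elems_per_block
    return {"srcM": ori_m, "srcK": src_k, "oriSrcM": ori_m, "oriSrcK": ori_k}

print(make_shape_info(320, 64, 4))  # float, last axis 64: already 32-byte aligned
print(make_shape_info(320, 60, 4))  # float, last axis 60: srcK padded to 64
```

For the [320, 64] float case used in the example below, the original and aligned shapes coincide, so all four fields can simply be {320, 64, 320, 64}.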

Returns

A Boolean value is returned. true indicates that the value to be checked exists in maxTensor; false indicates that it does not.

Availability

Precautions

  • For details about the alignment requirements of the operand address offset, see General Restrictions.

Example

In this example, postprocessing is performed on the SoftMax computation result to check whether 0xFF7FFFFF exists in maxTensor; if it does, the corresponding rows of the result are set to 0. The shape of the input softMaxRes is [320, 64], the shape of the intermediate result maxTensor is [320, 8], and the data type is float.

#include "kernel_operator.h"

template <typename T>
class KernelSoftmax {
public:
    __aicore__ inline KernelSoftmax()
    {}
    __aicore__ inline void Init(__gm__ uint8_t *srcGm, __gm__ uint8_t *dstGm, const SoftMaxTiling &tilingData)
    {
        elementNumPerBlk = 32 / sizeof(T);
        src1Global.SetGlobalBuffer((__gm__ T *)srcGm);
        dstGlobal.SetGlobalBuffer((__gm__ T *)dstGm);
        pipe.InitBuffer(inQueueSrc, 1, height * width * sizeof(T));
        pipe.InitBuffer(maxQueue, 1, height * elementNumPerBlk * sizeof(T));
        pipe.InitBuffer(sumQueue, 1, height * elementNumPerBlk * sizeof(T));
        pipe.InitBuffer(outQueueDst, 1, height * width * sizeof(T));
        tiling = tilingData;
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<T> srcLocal = inQueueSrc.AllocTensor<T>();
        AscendC::DataCopy(srcLocal, src1Global, height * width);
        inQueueSrc.EnQue(srcLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<T> srcLocal = inQueueSrc.DeQue<T>();
        AscendC::LocalTensor<T> sumTempLocal = sumQueue.AllocTensor<T>();
        AscendC::LocalTensor<T> maxTempLocal = maxQueue.AllocTensor<T>();
        AscendC::LocalTensor<T> dstLocal = outQueueDst.AllocTensor<T>();
        AscendC::SoftMaxShapeInfo srcShape = {height, width, height, width};
        AscendC::SoftMax<T>(dstLocal, sumTempLocal, maxTempLocal, srcLocal, tiling, srcShape);
        AscendC::AdjustSoftMaxRes<T, T>(dstLocal, maxTempLocal, 0xFF7FFFFF, 0.0, srcShape);
        outQueueDst.EnQue<T>(dstLocal);
        maxQueue.FreeTensor(maxTempLocal);
        sumQueue.FreeTensor(sumTempLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<T> dstLocal = outQueueDst.DeQue<T>();
        AscendC::DataCopy(dstGlobal, dstLocal, height * width);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrc;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> maxQueue;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> sumQueue;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueDst;
    AscendC::GlobalTensor<T> src1Global, dstGlobal;
    uint32_t elementNumPerBlk = 0;
    uint32_t width = 64;
    uint32_t height = 320;
    SoftMaxTiling tiling;
};

extern "C" __global__ __aicore__ void softmax_kernel_float(
    __gm__ uint8_t *srcGm, __gm__ uint8_t *dstGm, __gm__ uint8_t *tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    KernelSoftmax<float> op;
    op.Init(srcGm, dstGm, tilingData.softmaxTilingData);
    op.Process();
}