DropOut

Function Usage

Filters SrcTensor (source operand, input tensor) based on MaskTensor to obtain DstTensor (destination operand, output tensor). The input shape must be in ND format.

The filtering function has two modes: byte mode and bit mode.

  • Byte mode

    The values stored in MaskTensor are of the Boolean type, with each Boolean value indicating whether or not the corresponding value in SrcTensor is used. If yes, the value in SrcTensor is selected and stored in DstTensor. Otherwise, the corresponding position in DstTensor is assigned a value of 0. The shapes of DstTensor, SrcTensor, and MaskTensor are the same. The following is an example:

    SrcTensor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    MaskTensor = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0] (The data type of every element is uint8_t.)

    DstTensor = [1, 0, 3, 0, 5, 0, 0, 8, 9, 0]

  • Bit Mode

    Each bit value of MaskTensor indicates whether or not the corresponding value in SrcTensor is used. If yes, the value in SrcTensor is selected and stored in DstTensor. Otherwise, the corresponding position in DstTensor is assigned a value of 0. The shapes of SrcTensor and DstTensor are the same. Assume that their shape is [height, width], and then the shape of MaskTensor is [height, (width/8)]. The following is an example:

    SrcTensor = [1, 2, 3, 4, 5, 6, 7, 8]

    MaskTensor = [169] (1010 1001 in binary)

    DstTensor = [1, 0, 3, 0, 5, 0, 0, 8]

    • Special scenario 1: If the valid data of MaskTensor is not consecutively stored, the width axis of MaskTensor needs to be padded with invalid values to achieve 32-byte alignment, and the width axis of SrcTensor also needs to be 32-byte aligned. The following is an example:

      SrcTensor = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18]

      MaskTensor = [1, 0, 1, 0, 1, 0, 0, 1, X, X, 1, 0, 1, 0, 1, 0, 0, 1, X, X] (X is an invalid value. Assume that the data meets the alignment requirements. The example values are in binary format.)

      DstTensor = [1, 0, 3, 0, 5, 0, 0, 8, 11, 0, 13, 0, 15, 0, 0, 18]

    • Special scenario 2: If the valid data of MaskTensor is consecutively stored and maskTensor_size is not 32-byte aligned, then pad the tail of MaskTensor to achieve 32-byte alignment. Similarly, the tail of SrcTensor should also be padded with invalid data to achieve 32-byte alignment of srcTensor_size. The following is an example:

      SrcTensor = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18]

      MaskTensor = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, X, X, X, X] (X is an invalid value. Assume that the data meets the alignment requirements. The example values are in binary format.)

      DstTensor = [1, 0, 3, 0, 5, 0, 0, 8, 11, 0, 13, 0, 15, 0, 0, 18]

Principles

The figure below illustrates the internal algorithm block diagram of Dropout high-level APIs, taking the float type, ND format, SrcTensor with a shape of [srcM, srcN], and MaskTensor with a shape of [maskM, maskN] in bit mode as examples.

Figure 1 Dropout algorithm block diagram

The computation process is divided into the following steps, all of which are performed on vectors:

  1. GatherMask step: Clear the dirty data from the input MaskTensor so that only valid data is retained.
  2. Select step: Select data for SrcTensor based on the input MaskTensor. Retain the original data at the positions of the selected data and set the positions of the deprecated data to 0.
  3. Muls step: Divide each element of the output data by keepProb.

Prototype

1
2
template <typename T, bool isInitBitMode = false, uint32_t dropOutMode = 0>
__aicore__ inline void DropOut(const LocalTensor<T>& dstLocal, const LocalTensor<T>& srcLocal, const LocalTensor<uint8_t>& maskLocal, const float keepProb, const DropOutShapeInfo& info)
1
2
template <typename T, bool isInitBitMode = false, uint32_t dropOutMode = 0>
__aicore__ inline void DropOut(const LocalTensor<T>& dstLocal, const LocalTensor<T>& srcLocal, const LocalTensor<uint8_t>& maskLocal, const LocalTensor<uint8_t>& sharedTmpBuffer, const float keepProb, const DropOutShapeInfo& info)

Parameters

Table 1 Parameters in the template

Parameter

Description

T

Data type of the operand.

isInitBitMode

Whether to perform initialization inside the API in bit mode. The default value is false.

dropOutMode

Sets the input mode to be executed.

0 (default): The API infers the execution mode based on the input shape. Note that if the inference result does not meet the expectation, set a proper mode.

1: executes the byte mode, and maskLocal contains dirty data.

2: executes the byte mode, and maskLocal does not contain dirty data.

3: executes the bit mode, and maskLocal does not contain dirty data.

4: executes the bit mode, and maskLocal contains dirty data.

Table 2 API parameters

Parameter

Input/Output

Meaning

dstLocal

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

srcLocal

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The data type of srcLocal must be the same as that of the destination operand.

maskLocal

Input

Tensor for storing the mask.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

sharedTmpBuffer

Input

Shared buffer, which is used to store temporary data generated during internal API computation. It enables developers to manage the sharedTmpBuffer space and reuse the buffer after calling the API. The tensor size must meet the requirements of tiling. This parameter is used in conjunction with tiling.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

keepProb

Input

Weight coefficient, that is, the probability that data in srcLocal is retained. The filtered result will be divided by the weight coefficient and stored in dstLocal.

keepProb ∈ (0, 1)

info

Input

DropOutShapeInfo type. The DropOutShapeInfo structure is defined as follows:

1
2
3
4
5
6
struct DropOutShapeInfo {
__aicore__ DropOutShapeInfo(){};
uint32_t firstAxis = 0;   // number of elements on the height axis of srcLocal/maskTensor
uint32_t srcLastAxis = 0; // number of elements on the width axis of srcLocal
uint32_t maskLastAxis = 0;// number of elements on the width axis of maskTensor. (If there is data padding, this parameter indicates the length with dirty data. Note that the number of elements in all modes corresponds to the number of elements in the corresponding tensor type, for example, the number of uint8 elements corresponds to the number of elements in a uint8 tensor.)
};

Returns

None

Availability

Constraints

  • The tensor space of srcTensor and dstTensor can be reused.
  • For details about the address alignment requirements of srcLocal and dstLocal, see General Restrictions.
  • The input shape must be in ND format.
  • If maskLocal contains dirty data, the number of valid values in info.maskLastAxis must be an integer multiple of 2.
  • If maskLocal contains dirty data, data in maskLocal may be modified and dirty data may be deprecated.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
#include "kernel_operator.h"

template <typename srcType>
class KernelDropout {
public:
    __aicore__ inline KernelDropout()
    {}
    __aicore__ inline void Init(GM_ADDR srcGm, GM_ADDR maskGm, GM_ADDR dstGm, uint32_t firstAxis, uint32_t srcLastAxis,
        uint32_t maskLastAxis, uint32_t tmpBufferSize)
    {
        srcSize = firstAxis * srcLastAxis;
        maskSize = firstAxis * maskLastAxis;
        info.firstAxis = firstAxis;
        info.srcLastAxis = srcLastAxis;
        info.maskLastAxis = maskLastAxis;

        srcGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ srcType *>(srcGm), srcSize);
        maskGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ uint8_t *>(maskGm), maskSize);
        dstGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ srcType *>(dstGm), srcSize);

        pipe.InitBuffer(inQueueX, 1, srcSize * sizeof(srcType));
        pipe.InitBuffer(inQueueY, 1, maskSize * sizeof(uint8_t));
        pipe.InitBuffer(outQueue, 1, srcSize * sizeof(srcType));
        pipe.InitBuffer(tmpQueue, 1, tmpBufferSize);
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<srcType> srcLocal = inQueueX.AllocTensor<srcType>();
        AscendC::LocalTensor<uint8_t> maskLocal = inQueueY.AllocTensor<uint8_t>();
        AscendC::DataCopy(srcLocal, srcGlobal, srcSize);
        AscendC::DataCopy(maskLocal, maskGlobal, maskSize);
        inQueueX.EnQue(srcLocal);
        inQueueY.EnQue(maskLocal);
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<srcType> dstLocal = outQueue.AllocTensor<srcType>();
        AscendC::LocalTensor<uint8_t> sharedTmpBuffer = tmpQueue.AllocTensor<uint8_t>();

        AscendC::LocalTensor<srcType> srcLocal = inQueueX.DeQue<srcType>();
        AscendC::LocalTensor<uint8_t> maskLocal = inQueueY.DeQue<uint8_t>();

        AscendC::DropOut(dstLocal, srcLocal, maskLocal, sharedTmpBuffer, probValue, info);

        outQueue.EnQue<srcType>(dstLocal);

        inQueueX.FreeTensor(srcLocal);
        inQueueY.FreeTensor(maskLocal);
        tmpQueue.FreeTensor(sharedTmpBuffer);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<srcType> dstLocal = outQueue.DeQue<srcType>();
        AscendC::DataCopy(dstGlobal, dstLocal, srcSize);
        outQueue.FreeTensor(dstLocal);
    }

private:
    AscendC::GlobalTensor<srcType> srcGlobal;
    AscendC::GlobalTensor<uint8_t> maskGlobal;
    AscendC::GlobalTensor<srcType> dstGlobal;

    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueX;
    AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueY;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueue;
    AscendC::TQue<AscendC::QuePosition::VECCALC, 1> tmpQueue;

    uint32_t srcSize = 0;
    uint32_t maskSize = 0;
    float probValue = 0.8;
    AscendC::DropOutShapeInfo info;
};

extern "C" __global__ __aicore__ void kernel_dropout_operator(
    GM_ADDR srcGm, GM_ADDR maskGm, GM_ADDR dstGm, GM_ADDR tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    KernelDropout<half> op;
    op.Init(srcGm,
        maskGm,
        dstGm,
        tilingData.firstAxis,
        tilingData.srcLastAxis,
        tilingData.maskLastAxis,
        tilingData.tmpBufferSize);
    op.Process();
}