GroupNorm

Function Usage

A general formula for normalizing a feature is as follows:

In this formula, indicates the index of the feature, and indicate the values of the feature before and after normalization respectively. and indicate the mean value and standard deviation of the feature, and their formulas are as follows:

is a small constant, indicates the set of data involved in the calculation, and indicates the set size. The main difference between different types of feature standardization methods (such as BatchNorm, LayerNorm, InstanceNorm, and GroupNorm) lies in the selection of data sets involved in calculation. The following describes how to select data sets for different Norm operators for computation.

For an input with the shape of [N, C, H, W], GroupNorm divides each [C, H, W] into groupNum groups in the C dimension and then normalizes each group. Finally, the standardized feature is scaled and translated. The scaling parameter and the translation parameter are trainable.

Prototype

  • Allocate the temporary space through the API framework.
    1
    2
    template <typename T, bool isReuseSource = false>
    __aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const T epsilon, GroupNormTiling& tiling)
    
  • Pass the temporary space through the sharedTmpBuffer input parameter.
    1
    2
    template <typename T, bool isReuseSource = false>
    __aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const LocalTensor<uint8_t>& sharedTmpBuffer, const T epsilon, GroupNormTiling& tiling)
    

Parameters

Table 1 Parameters in the template

Parameter

Description

T

Data type of the operand.

isReuseSource

Whether the source operand can be modified. The default value is false. If you are allowed to modify the source operand, enable this parameter, to save some memory space.

If this parameter is set to true, the inputX memory space is reused during internal computation of this API to save the memory space. If this parameter is set to false, the inputX memory space is not reused during internal computation of this API.

This parameter can be enabled for float input data but cannot be enabled for half input data.

For details about how to use isReuseSource, see More Examples.

Table 2 API parameters

Parameter

Input/Output

Description

output

Output

Destination operand, which is the result of scaling and translation calculation on the standardized input. The shape is [N, C, H, W].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

outputMean

Output

Destination operand, which indicates the mean value. The shape is [N, groupNum].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

outputVariance

Output

Destination operand, which indicates the variance. The shape is [N, groupNum].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

inputX

Input

Source operand. The shape is [N, C, H, W].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

gamm

Input

Source operand, which indicates the scaling parameter. The shape is [C].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

beta

Input

Source operand, which indicates the translation parameter. The shape is [C].

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

sharedTmpBuffer

Input

This parameter is used to store intermediate variables during complex internal API computation and is provided by developers.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

For details about how to obtain the temporary space size (BufferSize), see GroupNorm Tiling.

epsilon

Input

Weight coefficient for preventing division by zero. The data type must be the same as that of inputX or output.

tilling

Input

Tiling information of input data. For details about how to obtain the tiling information, see GroupNorm Tiling.

Returns

None

Availability

Precautions

  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • Currently, only the ND format is supported.

Example

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
template <typename dataType, bool isReuseSource = false>
__aicore__ inline void MainGroupnormTest(GM_ADDR inputXGm, GM_ADDR gammGm, GM_ADDR betaGm, GM_ADDR outputGm,
    uint32_t n, uint32_t c, uint32_t h, uint32_t w, uint32_t g)
{
    dataType epsilon = 0.001;
    DataFormat dataFormat = DataFormat::ND;

    GlobalTensor<dataType> inputXGlobal;
    GlobalTensor<dataType> gammGlobal;
    GlobalTensor<dataType> betaGlobal;
    GlobalTensor<dataType> outputGlobal;
    uint32_t bshLength = n*c*h*w;
    uint32_t bsLength = g*n;

    inputXGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(inputXGm), bshLength);
    gammGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(gammGm), c);
    betaGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(betaGm), c);
    outputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(outputGm), bshLength);

    TPipe pipe;
    TQue<QuePosition::VECIN, 1>  inQueueX;
    TQue<QuePosition::VECIN, 1>  inQueueGamma;
    TQue<QuePosition::VECIN, 1>  inQueueBeta;
    TQue<QuePosition::VECOUT, 1> outQueue;
    TBuf<QuePosition::VECCALC> meanBuffer, varBuffer;

    uint32_t hwAlignSize = (sizeof(dataType) * h * w + ONE_BLK_SIZE - 1) / ONE_BLK_SIZE * ONE_BLK_SIZE / sizeof(dataType);
    pipe.InitBuffer(inQueueX, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(inQueueGamma, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(inQueueBeta, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(outQueue, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(meanBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);
    pipe.InitBuffer(varBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);

    LocalTensor<dataType> inputXLocal = inQueueX.AllocTensor<dataType>();
    LocalTensor<dataType> gammaLocal = inQueueGamma.AllocTensor<dataType>();
    LocalTensor<dataType> betaLocal = inQueueBeta.AllocTensor<dataType>();
    LocalTensor<dataType> outputLocal = outQueue.AllocTensor<dataType>();
    LocalTensor<dataType> meanLocal = meanBuffer.Get<dataType>();
    LocalTensor<dataType> varianceLocal = varBuffer.Get<dataType>();

    DataCopyParams copyParams{static_cast<uint16_t>(n*c), static_cast<uint16_t>(h*w*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParams{true, 0, static_cast<uint8_t>(hwAlignSize - h * w), 0};
    DataCopyPad(inputXLocal, inputXGlobal, copyParams, padParams);
    DataCopyParams copyParamsGamma{1, static_cast<uint16_t>(c*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParamsGamma{false, 0, 0, 0};
    DataCopyPad(gammaLocal, gammGlobal, copyParamsGamma, padParamsGamma);
    DataCopyPad(betaLocal, betaGlobal, copyParamsGamma, padParamsGamma);

    PipeBarrier<PIPE_ALL>();

    uint32_t stackBufferSize = 0;
    {
        LocalTensor<float> stackBuffer;
        bool ans = PopStackBuffer<float, TPosition::LCM>(stackBuffer);
        stackBufferSize = stackBuffer.GetSize();
    }

    GroupNormTiling groupNormTiling;
    uint32_t inputShape[4] = {n, c, h, w};
    ShapeInfo shapeInfo{ (uint8_t)4, inputShape, (uint8_t)4, inputShape, dataFormat };

    GetGroupNormNDTillingInfo(shapeInfo, stackBufferSize, sizeof(dataType), isReuseSource, g, groupNormTiling);

    GroupNorm<dataType, isReuseSource>(outputLocal, meanLocal, varianceLocal, inputXLocal, gammaLocal, betaLocal, (dataType)epsilon, groupNormTiling);
    PipeBarrier<PIPE_ALL>();

    DataCopyPad(outputGlobal, outputLocal, copyParams);
    inQueueX.FreeTensor(inputXLocal);
    inQueueGamma.FreeTensor(gammaLocal);
    inQueueBeta.FreeTensor(betaLocal);
    outQueue.FreeTensor(outputLocal);
    PipeBarrier<PIPE_ALL>();
}