GroupNorm

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
AI Core of Atlas inference products	x
Vector Core of Atlas inference products	x
Atlas training products	x

Function

A general formula for normalizing a feature is as follows:

$\hat{x}_i = \frac{x_i - \mu}{\sigma}$

In this formula, i indicates the index of the feature. $x_i$ and $\hat{x}_i$ indicate the values of the feature before and after normalization, respectively. μ and σ indicate the mean and standard deviation of the feature, computed as follows:

$\mu = \frac{1}{m}\sum_{k \in S} x_k, \quad \sigma = \sqrt{\frac{1}{m}\sum_{k \in S}(x_k - \mu)^2 + \varepsilon}$

ε is a small constant. S indicates the set of data over which the mean and standard deviation are computed, and m indicates the size of that set. The main difference between normalization methods (such as BatchNorm, LayerNorm, InstanceNorm, and GroupNorm) lies in how the set S is chosen. The rest of this section describes the selection used by GroupNorm.
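To make the difference between the methods concrete, the sketch below computes the set size m for each of them on an input of shape [N, C, H, W]. It is host-side standard C++, not part of the Ascend C API. The names `NormKind` and `NormSetSize` are hypothetical, and the sketch assumes that C is evenly divisible by groupNum.

```cpp
#include <cassert>
#include <cstdint>

// Normalization variants named in the text (illustrative enum, not an Ascend C type).
enum class NormKind { Batch, Layer, Instance, Group };

// Returns m, the number of elements that share one mean/standard deviation,
// for an input of shape [N, C, H, W].
// groupNum is only used for NormKind::Group and is assumed to divide C evenly.
inline uint64_t NormSetSize(NormKind kind, uint64_t n, uint64_t c, uint64_t h, uint64_t w,
                            uint64_t groupNum = 1)
{
    switch (kind) {
        case NormKind::Batch:    return n * h * w;   // one set per channel
        case NormKind::Layer:    return c * h * w;   // one set per sample
        case NormKind::Instance: return h * w;       // one set per sample and channel
        case NormKind::Group:
            assert(groupNum > 0 && c % groupNum == 0);
            return (c / groupNum) * h * w;           // one set per sample and channel group
    }
    return 0;
}
```

With groupNum = 1 the GroupNorm set equals the LayerNorm set, and with groupNum = C it equals the InstanceNorm set.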

For an input with the shape [N, C, H, W], GroupNorm splits the C dimension of each [C, H, W] sample into groupNum groups and normalizes each group separately. The normalized feature is then scaled and translated. The scaling parameter γ and the translation parameter β are trainable.

$y_i = \gamma \hat{x}_i + \beta$
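The computation described above can be sketched as a host-side reference in standard C++, for example to check kernel results on the CPU. This is not the Ascend C implementation. The names `GroupNormRef` and `GroupNormReference` are hypothetical, and the sketch assumes a contiguous NCHW layout, C evenly divisible by groupNum, and that the reported variance is the biased variance without ε added.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Results of the reference computation (illustrative type, not an Ascend C type).
struct GroupNormRef {
    std::vector<float> output;    // shape [N, C, H, W]
    std::vector<float> mean;      // shape [N, groupNum]
    std::vector<float> variance;  // shape [N, groupNum], biased variance (assumption)
};

inline GroupNormRef GroupNormReference(const std::vector<float>& x,
                                       const std::vector<float>& gamma,
                                       const std::vector<float>& beta,
                                       size_t n, size_t c, size_t h, size_t w,
                                       size_t groupNum, float epsilon)
{
    assert(groupNum > 0 && c % groupNum == 0);
    assert(h * w > 0);
    assert(x.size() == n * c * h * w && gamma.size() == c && beta.size() == c);

    const size_t hw = h * w;
    const size_t chPerGroup = c / groupNum;
    const size_t m = chPerGroup * hw;  // size of the set S for one group

    GroupNormRef r{std::vector<float>(x.size()),
                   std::vector<float>(n * groupNum),
                   std::vector<float>(n * groupNum)};

    for (size_t ni = 0; ni < n; ++ni) {
        for (size_t gi = 0; gi < groupNum; ++gi) {
            // In NCHW layout, the channels of one group are contiguous.
            const size_t base = (ni * c + gi * chPerGroup) * hw;

            double sum = 0.0;
            for (size_t k = 0; k < m; ++k) sum += x[base + k];
            const double mu = sum / static_cast<double>(m);

            double sq = 0.0;
            for (size_t k = 0; k < m; ++k) {
                const double d = x[base + k] - mu;
                sq += d * d;
            }
            const double var = sq / static_cast<double>(m);
            const double invStd = 1.0 / std::sqrt(var + epsilon);

            r.mean[ni * groupNum + gi] = static_cast<float>(mu);
            r.variance[ni * groupNum + gi] = static_cast<float>(var);

            // Scale and translate with the per-channel gamma and beta.
            for (size_t k = 0; k < m; ++k) {
                const size_t ch = gi * chPerGroup + k / hw;
                r.output[base + k] =
                    static_cast<float>(gamma[ch] * (x[base + k] - mu) * invStd + beta[ch]);
            }
        }
    }
    return r;
}
```

Accumulating in double keeps the reference numerically stable for half-precision comparisons; the tolerance to use when comparing against device output depends on the data type.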

Prototype

The temporary space is allocated by the API framework.

template <typename T, bool isReuseSource = false>
__aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const T epsilon, GroupNormTiling& tiling)

The temporary space is passed in through the sharedTmpBuffer input parameter.

template <typename T, bool isReuseSource = false>
__aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const LocalTensor<uint8_t>& sharedTmpBuffer, const T epsilon, GroupNormTiling& tiling)

Parameters

**Table 1** Template parameters
Parameter	Description
T	Data type of the operand. For the Atlas A3 training products/Atlas A3 inference products, the supported data types are half and float. For the Atlas A2 training products/Atlas A2 inference products, the supported data types are half and float.
isReuseSource	Whether the source operand can be modified. The default value is false. If this parameter is set to true, the inputX memory space is reused during internal computation of this API, which reduces memory usage but may modify inputX. If this parameter is set to false, the inputX memory space is not reused. This parameter can be set to true only for float inputs, not for half inputs. For details about how to use isReuseSource, see Example 4.

**Table 2** API parameters
Parameter	Input/Output	Description
output	Output	Destination operand, which is the result of scaling and translation calculation on the standardized input. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
outputMean	Output	Destination operand, which indicates the mean. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
outputVariance	Output	Destination operand, which indicates the variance. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
inputX	Input	Source operand. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
gamma	Input	Source operand, which indicates the scaling parameter. The value range of this parameter is [–100, 100]. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
beta	Input	Source operand, which indicates the translation parameter. The value range of this parameter is [–100, 100]. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
sharedTmpBuffer	Input	This parameter is used to store intermediate variables during complex internal API computation and is provided by developers. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GroupNorm Tiling.
epsilon	Input	Small constant used to prevent division by zero (ε in the formulas in the Function section). The data type must be the same as that of inputX and output.
tiling	Input	Tiling information of input data. For details about how to obtain the tiling information, see GroupNorm Tiling.
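The shapes in Table 2 translate directly into element counts for the buffers. The following host-side sketch in standard C++ collects them in one place. The names `GroupNormElemCounts` and `CountGroupNormElems` are hypothetical, and the counts do not include any padding required for address alignment.

```cpp
#include <cassert>
#include <cstdint>

// Unpadded element counts for the tensors listed in Table 2 (illustrative type).
struct GroupNormElemCounts {
    uint64_t data;    // output and inputX: N * C * H * W
    uint64_t stats;   // outputMean and outputVariance: N * groupNum
    uint64_t affine;  // gamma and beta: C
};

inline GroupNormElemCounts CountGroupNormElems(uint64_t n, uint64_t c, uint64_t h, uint64_t w,
                                               uint64_t groupNum)
{
    assert(groupNum > 0);
    return {n * c * h * w, n * groupNum, c};
}
```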

Returns

None

Restrictions

For details about the operand address alignment requirements, see General Address Alignment Restrictions.
Currently, only the ND format is supported.

Example

template <typename dataType, bool isReuseSource = false>
__aicore__ inline void MainGroupnormTest(GM_ADDR inputXGm, GM_ADDR gammGm, GM_ADDR betaGm, GM_ADDR outputGm,
    uint32_t n, uint32_t c, uint32_t h, uint32_t w, uint32_t g)
{
    dataType epsilon = 0.001;
    DataFormat dataFormat = DataFormat::ND;

    GlobalTensor<dataType> inputXGlobal;
    GlobalTensor<dataType> gammGlobal;
    GlobalTensor<dataType> betaGlobal;
    GlobalTensor<dataType> outputGlobal;
    uint32_t bshLength = n*c*h*w;
    uint32_t bsLength = g*n;

    inputXGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(inputXGm), bshLength);
    gammGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(gammGm), c);
    betaGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(betaGm), c);
    outputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(outputGm), bshLength);

    TPipe pipe;
    TQue<TPosition::VECIN, 1>  inQueueX;
    TQue<TPosition::VECIN, 1>  inQueueGamma;
    TQue<TPosition::VECIN, 1>  inQueueBeta;
    TQue<TPosition::VECOUT, 1> outQueue;
    TBuf<TPosition::VECCALC> meanBuffer, varBuffer;

    // Round each H*W row up to a whole number of ONE_BLK_SIZE-byte blocks; the result is in elements.
    uint32_t hwAlignSize = (sizeof(dataType) * h * w + ONE_BLK_SIZE - 1) / ONE_BLK_SIZE * ONE_BLK_SIZE / sizeof(dataType);
    pipe.InitBuffer(inQueueX, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(inQueueGamma, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(inQueueBeta, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(outQueue, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(meanBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);
    pipe.InitBuffer(varBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);

    LocalTensor<dataType> inputXLocal = inQueueX.AllocTensor<dataType>();
    LocalTensor<dataType> gammaLocal = inQueueGamma.AllocTensor<dataType>();
    LocalTensor<dataType> betaLocal = inQueueBeta.AllocTensor<dataType>();
    LocalTensor<dataType> outputLocal = outQueue.AllocTensor<dataType>();
    LocalTensor<dataType> meanLocal = meanBuffer.Get<dataType>();
    LocalTensor<dataType> varianceLocal = varBuffer.Get<dataType>();

    // Copy n*c rows of h*w elements each and pad every row on the right up to hwAlignSize elements.
    DataCopyParams copyParams{static_cast<uint16_t>(n*c), static_cast<uint16_t>(h*w*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParams{true, 0, static_cast<uint8_t>(hwAlignSize - h * w), 0};
    DataCopyPad(inputXLocal, inputXGlobal, copyParams, padParams);
    DataCopyParams copyParamsGamma{1, static_cast<uint16_t>(c*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParamsGamma{false, 0, 0, 0};
    DataCopyPad(gammaLocal, gammGlobal, copyParamsGamma, padParamsGamma);
    DataCopyPad(betaLocal, betaGlobal, copyParamsGamma, padParamsGamma);

    PipeBarrier<PIPE_ALL>();

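    // Query the size of the available stack (temporary) buffer; it is passed to the tiling function below.
    uint32_t stackBufferSize = 0;
    {
        LocalTensor<float> stackBuffer;
        bool ans = PopStackBuffer<float, TPosition::LCM>(stackBuffer);
        stackBufferSize = stackBuffer.GetSize();
    }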

    GroupNormTiling groupNormTiling;
    uint32_t inputShape[4] = {n, c, h, w};
    ShapeInfo shapeInfo{ (uint8_t)4, inputShape, (uint8_t)4, inputShape, dataFormat };

    GetGroupNormNDTillingInfo(shapeInfo, stackBufferSize, sizeof(dataType), isReuseSource, g, groupNormTiling);

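    // This overload takes no sharedTmpBuffer, so the temporary space is allocated by the API framework.
    GroupNorm<dataType, isReuseSource>(outputLocal, meanLocal, varianceLocal, inputXLocal, gammaLocal, betaLocal, (dataType)epsilon, groupNormTiling);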
    PipeBarrier<PIPE_ALL>();

    DataCopyPad(outputGlobal, outputLocal, copyParams);
    inQueueX.FreeTensor(inputXLocal);
    inQueueGamma.FreeTensor(gammaLocal);
    inQueueBeta.FreeTensor(betaLocal);
    outQueue.FreeTensor(outputLocal);
    PipeBarrier<PIPE_ALL>();
}
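The hwAlignSize expression in the example rounds a row of h*w elements up to a whole number of blocks. The following host-side sketch in standard C++ isolates that arithmetic so it can be checked without a device. The names `kBlockBytes` and `AlignElemsToBlock` are hypothetical, and the block size of 32 bytes is an assumption that matches the literal 32 used for the gamma and beta buffers in the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed block size in bytes (stand-in for ONE_BLK_SIZE in the example).
constexpr uint32_t kBlockBytes = 32;

// Rounds `elems` elements of `elemBytes` bytes each up to whole blocks and
// returns the padded length in elements.
inline uint32_t AlignElemsToBlock(uint32_t elems, uint32_t elemBytes)
{
    assert(elemBytes > 0 && kBlockBytes % elemBytes == 0);
    const uint32_t bytes = elems * elemBytes;
    const uint32_t alignedBytes = (bytes + kBlockBytes - 1) / kBlockBytes * kBlockBytes;
    return alignedBytes / elemBytes;
}
```

For example, a 3 x 3 feature map of half values (2 bytes each) occupies 18 bytes, which rounds up to 32 bytes, so each row is padded from 9 to 16 elements.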
