GroupNorm

Function Usage

A general formula for normalizing a feature is as follows:

$\hat{x}_i = \frac{x_i - \mu_i}{\sigma_i}$

In this formula, $i$ indicates the index of the feature, and $x_i$ and $\hat{x}_i$ indicate the values of the feature before and after normalization, respectively. $\mu_i$ and $\sigma_i$ indicate the mean and standard deviation of the feature, computed as follows:

$\mu_i = \frac{1}{m} \sum_{k \in S_i} x_k$

$\sigma_i = \sqrt{\frac{1}{m} \sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}$

$\epsilon$ is a small constant, $S_i$ indicates the set of data involved in the calculation, and $m$ indicates the set size. The main difference between feature standardization methods (such as BatchNorm, LayerNorm, InstanceNorm, and GroupNorm) lies in the choice of the data set involved in the calculation. The following describes how GroupNorm selects this set.

For an input with the shape of [N, C, H, W], GroupNorm divides each [C, H, W] sample into groupNum groups along the C dimension and normalizes each group separately. The standardized feature $\hat{x}_i$ is then scaled and translated to produce the output $y_i$. The scaling parameter $\gamma$ and the translation parameter $\beta$ are trainable.

$y_i = \gamma \hat{x}_i + \beta$
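To make the group selection concrete, the following host-side sketch computes the same quantities in plain C++. It is not the Ascend C API: the names GroupNormReference and GroupNormResult are hypothetical, the input is assumed to be a contiguous [N, C, H, W] array with C divisible by groupNum, and the reported variance is assumed to be the biased variance without the small constant added.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; not part of the Ascend C API.
struct GroupNormResult {
    std::vector<float> output;    // shape [N, C, H, W]
    std::vector<float> mean;      // shape [N, groupNum]
    std::vector<float> variance;  // shape [N, groupNum], biased, without epsilon (assumption)
};

inline GroupNormResult GroupNormReference(const std::vector<float>& x,
    const std::vector<float>& gamma, const std::vector<float>& beta,
    std::size_t n, std::size_t c, std::size_t h, std::size_t w,
    std::size_t groupNum, float epsilon)
{
    assert(groupNum > 0 && c % groupNum == 0);
    assert(x.size() == n * c * h * w);
    assert(gamma.size() == c && beta.size() == c);

    const std::size_t hw = h * w;
    const std::size_t channelsPerGroup = c / groupNum;
    const std::size_t m = channelsPerGroup * hw;  // size of the set for one group

    GroupNormResult r;
    r.output.resize(x.size());
    r.mean.resize(n * groupNum);
    r.variance.resize(n * groupNum);

    for (std::size_t b = 0; b < n; ++b) {
        for (std::size_t g = 0; g < groupNum; ++g) {
            // Channels of one group are contiguous in NCHW, so the set is a single run of m values.
            const std::size_t base = (b * c + g * channelsPerGroup) * hw;

            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k) {
                sum += x[base + k];
            }
            const double mu = sum / static_cast<double>(m);

            double squared = 0.0;
            for (std::size_t k = 0; k < m; ++k) {
                const double d = x[base + k] - mu;
                squared += d * d;
            }
            const double var = squared / static_cast<double>(m);
            const double invSigma = 1.0 / std::sqrt(var + epsilon);

            r.mean[b * groupNum + g] = static_cast<float>(mu);
            r.variance[b * groupNum + g] = static_cast<float>(var);

            for (std::size_t k = 0; k < m; ++k) {
                // gamma and beta are indexed per channel (shape [C]).
                const std::size_t channel = g * channelsPerGroup + k / hw;
                const double normalized = (x[base + k] - mu) * invSigma;
                r.output[base + k] = static_cast<float>(gamma[channel] * normalized + beta[channel]);
            }
        }
    }
    return r;
}
```

With groupNum equal to 1 the set covers the whole [C, H, W] sample, and with groupNum equal to C each channel is normalized on its own, which is how the choice of set distinguishes the different normalization methods.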

Prototype

Allocate the temporary space through the API framework.

        
template <typename T, bool isReuseSource = false>
__aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean,
    const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma,
    const LocalTensor<T>& beta, const T epsilon, GroupNormTiling& tiling)

Pass the temporary space through the sharedTmpBuffer input parameter.

        
template <typename T, bool isReuseSource = false>
__aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean,
    const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma,
    const LocalTensor<T>& beta, const LocalTensor<uint8_t>& sharedTmpBuffer, const T epsilon,
    GroupNormTiling& tiling)

Parameters

**Table 1** Parameters in the template
Parameter	Description
T	Data type of the operand.
isReuseSource	Whether the source operand can be modified. The default value is false. If this parameter is set to true, the inputX memory space is reused during internal computation of this API, which saves memory but overwrites inputX. If this parameter is set to false, the inputX memory space is not reused. This parameter can be enabled for float input data but cannot be enabled for half input data. For details about how to use isReuseSource, see More Examples.

**Table 2** API parameters
Parameter	Input/Output	Description
output	Output	Destination operand, which is the result of scaling and translation calculation on the standardized input. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
outputMean	Output	Destination operand, which indicates the mean value. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
outputVariance	Output	Destination operand, which indicates the variance. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
inputX	Input	Source operand. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
gamma	Input	Source operand, which indicates the scaling parameter. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
beta	Input	Source operand, which indicates the translation parameter. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
sharedTmpBuffer	Input	Temporary buffer, provided by the developer, that stores intermediate variables during the internal computation of this API. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GroupNorm Tiling.
epsilon	Input	Small constant added to prevent division by zero. The data type must be the same as that of inputX and output.
tiling	Input	Tiling information of input data. For details about how to obtain the tiling information, see GroupNorm Tiling.

Returns

None

Availability

Precautions

For details about the alignment requirements of the operand address offset, see General Restrictions.
Currently, only the ND format is supported.

Example

template <typename dataType, bool isReuseSource = false>
__aicore__ inline void MainGroupnormTest(GM_ADDR inputXGm, GM_ADDR gammGm, GM_ADDR betaGm, GM_ADDR outputGm,
    uint32_t n, uint32_t c, uint32_t h, uint32_t w, uint32_t g)
{
    dataType epsilon = 0.001;
    DataFormat dataFormat = DataFormat::ND;

    GlobalTensor<dataType> inputXGlobal;
    GlobalTensor<dataType> gammGlobal;
    GlobalTensor<dataType> betaGlobal;
    GlobalTensor<dataType> outputGlobal;
    uint32_t bshLength = n*c*h*w;
    uint32_t bsLength = g*n;

    inputXGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(inputXGm), bshLength);
    gammGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(gammGm), c);
    betaGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(betaGm), c);
    outputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(outputGm), bshLength);

    TPipe pipe;
    TQue<QuePosition::VECIN, 1>  inQueueX;
    TQue<QuePosition::VECIN, 1>  inQueueGamma;
    TQue<QuePosition::VECIN, 1>  inQueueBeta;
    TQue<QuePosition::VECOUT, 1> outQueue;
    TBuf<QuePosition::VECCALC> meanBuffer, varBuffer;

    // Round each h*w row up to a whole number of ONE_BLK_SIZE-byte blocks, expressed in elements.
    uint32_t hwAlignSize = (sizeof(dataType) * h * w + ONE_BLK_SIZE - 1) / ONE_BLK_SIZE * ONE_BLK_SIZE / sizeof(dataType);
    pipe.InitBuffer(inQueueX, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(inQueueGamma, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(inQueueBeta, 1, (sizeof(dataType) * c + 31) / 32 * 32);
    pipe.InitBuffer(outQueue, 1, sizeof(dataType) * n * c * hwAlignSize);
    pipe.InitBuffer(meanBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);
    pipe.InitBuffer(varBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);

    LocalTensor<dataType> inputXLocal = inQueueX.AllocTensor<dataType>();
    LocalTensor<dataType> gammaLocal = inQueueGamma.AllocTensor<dataType>();
    LocalTensor<dataType> betaLocal = inQueueBeta.AllocTensor<dataType>();
    LocalTensor<dataType> outputLocal = outQueue.AllocTensor<dataType>();
    LocalTensor<dataType> meanLocal = meanBuffer.Get<dataType>();
    LocalTensor<dataType> varianceLocal = varBuffer.Get<dataType>();

    // Copy n*c rows of h*w elements each and right-pad every row up to hwAlignSize elements.
    DataCopyParams copyParams{static_cast<uint16_t>(n*c), static_cast<uint16_t>(h*w*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParams{true, 0, static_cast<uint8_t>(hwAlignSize - h * w), 0};
    DataCopyPad(inputXLocal, inputXGlobal, copyParams, padParams);
    DataCopyParams copyParamsGamma{1, static_cast<uint16_t>(c*sizeof(dataType)), 0, 0};
    DataCopyPadParams padParamsGamma{false, 0, 0, 0};
    DataCopyPad(gammaLocal, gammGlobal, copyParamsGamma, padParamsGamma);
    DataCopyPad(betaLocal, betaGlobal, copyParamsGamma, padParamsGamma);

    PipeBarrier<PIPE_ALL>();

    // Query the size of the available stack buffer; the tiling computation below takes it as input.
    uint32_t stackBufferSize = 0;
    {
        LocalTensor<float> stackBuffer;
        bool ans = PopStackBuffer<float, TPosition::LCM>(stackBuffer);
        stackBufferSize = stackBuffer.GetSize();
    }

    GroupNormTiling groupNormTiling;
    uint32_t inputShape[4] = {n, c, h, w};
    ShapeInfo shapeInfo{ (uint8_t)4, inputShape, (uint8_t)4, inputShape, dataFormat };

    GetGroupNormNDTillingInfo(shapeInfo, stackBufferSize, sizeof(dataType), isReuseSource, g, groupNormTiling);

    GroupNorm<dataType, isReuseSource>(outputLocal, meanLocal, varianceLocal, inputXLocal, gammaLocal, betaLocal, (dataType)epsilon, groupNormTiling);
    PipeBarrier<PIPE_ALL>();

    DataCopyPad(outputGlobal, outputLocal, copyParams);
    inQueueX.FreeTensor(inputXLocal);
    inQueueGamma.FreeTensor(gammaLocal);
    inQueueBeta.FreeTensor(betaLocal);
    outQueue.FreeTensor(outputLocal);
    PipeBarrier<PIPE_ALL>();
}
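
The hwAlignSize expression in the example rounds each h*w row up to a whole number of blocks. The arithmetic can be checked on the host with the small sketch below. The helper names are hypothetical, and the block size of 32 bytes is an assumption chosen to match the 32-byte rounding used in the example's InitBuffer calls.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one block is 32 bytes, matching the 32-byte rounding in the example above.
constexpr uint32_t kBlockBytes = 32;

// Hypothetical helper: smallest element count >= elemCount whose byte size is a multiple of kBlockBytes.
inline uint32_t AlignUpElements(uint32_t elemCount, uint32_t elemBytes)
{
    assert(elemBytes > 0 && kBlockBytes % elemBytes == 0);
    return (elemCount * elemBytes + kBlockBytes - 1) / kBlockBytes * kBlockBytes / elemBytes;
}

// Hypothetical helper: number of padding elements appended to the right of each row.
inline uint32_t RightPadElements(uint32_t elemCount, uint32_t elemBytes)
{
    return AlignUpElements(elemCount, elemBytes) - elemCount;
}
```

For example, a row of 15 float elements (60 bytes) is rounded up to 16 elements (64 bytes), so one padding element is appended, while a row of 16 float elements is already aligned and needs no padding.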
