GroupNorm
Function Usage
A general formula for normalizing a feature is as follows:

    x̂_i = (x_i − μ) / σ

In this formula, i indicates the index of the feature, and x_i and x̂_i indicate the values of the feature before and after normalization, respectively. μ and σ indicate the mean value and standard deviation of the feature, and their formulas are as follows:

    μ = (1 / m) · Σ_{k ∈ S_i} x_k

    σ = sqrt((1 / m) · Σ_{k ∈ S_i} (x_k − μ)² + ε)

Here, ε is a small constant, S_i indicates the set of data involved in the calculation, and m indicates the set size. The main difference between the feature normalization methods (such as BatchNorm, LayerNorm, InstanceNorm, and GroupNorm) lies in how the set S_i is selected. The following describes how each Norm operator selects its set for computation.
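To make the difference concrete, the size m of the set that each method averages over can be written down directly for an [N, C, H, W] input. The sketch below is illustrative only (the struct and function names are hypothetical, not part of the API) and assumes a channels-first layout with a group count g that divides C:

```cpp
#include <cstddef>

// Size m of the set S_i that each normalization method averages over,
// for an input of shape [N, C, H, W]. Illustrative helper, not part of the API.
struct NormSetSizes {
    std::size_t batchNorm;     // per channel: all samples and spatial positions
    std::size_t layerNorm;     // per sample: all channels and spatial positions
    std::size_t instanceNorm;  // per (sample, channel): spatial positions only
    std::size_t groupNorm;     // per (sample, group): C/g channels times H*W
};

NormSetSizes SetSizes(std::size_t n, std::size_t c, std::size_t h,
                      std::size_t w, std::size_t g) {
    return { n * h * w, c * h * w, h * w, (c / g) * h * w };
}
```

GroupNorm thus sits between LayerNorm (g = 1) and InstanceNorm (g = C), and its statistics are independent of the batch size N.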

For an input with the shape [N, C, H, W], GroupNorm divides the C dimension of each [C, H, W] sample into groupNum groups and normalizes each group. Finally, the normalized feature is scaled and translated per channel:

    y_i = γ · x̂_i + β

The scaling parameter γ and the translation parameter β are trainable.
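The computation above can be sketched as a host-side reference in plain C++. This is a minimal illustration of the math only, not the AscendC kernel; the parameter names (groupNum, epsilon, gamma, beta) mirror the API described below, and groupNum is assumed to divide C:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference for GroupNorm on a row-major [N, C, H, W] tensor.
// For each (sample, group) pair: compute mean and variance over the group's
// [C / groupNum, H, W] slice, normalize, then scale/shift per channel.
void GroupNormRef(const std::vector<float>& x, std::vector<float>& y,
                  const std::vector<float>& gamma, const std::vector<float>& beta,
                  std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                  std::size_t groupNum, float epsilon) {
    const std::size_t cPerGroup = c / groupNum;  // channels per group
    const std::size_t hw = h * w;
    const std::size_t m = cPerGroup * hw;        // set size per group
    for (std::size_t ni = 0; ni < n; ++ni) {
        for (std::size_t gi = 0; gi < groupNum; ++gi) {
            const std::size_t cBegin = gi * cPerGroup;
            const std::size_t cEnd = cBegin + cPerGroup;
            // Mean over the group's slice.
            double sum = 0.0;
            for (std::size_t ci = cBegin; ci < cEnd; ++ci)
                for (std::size_t k = 0; k < hw; ++k)
                    sum += x[(ni * c + ci) * hw + k];
            const double mean = sum / m;
            // Biased variance over the same slice.
            double var = 0.0;
            for (std::size_t ci = cBegin; ci < cEnd; ++ci)
                for (std::size_t k = 0; k < hw; ++k) {
                    const double d = x[(ni * c + ci) * hw + k] - mean;
                    var += d * d;
                }
            var /= m;
            const double invStd = 1.0 / std::sqrt(var + epsilon);
            // Normalize, then apply the per-channel scale gamma and shift beta.
            for (std::size_t ci = cBegin; ci < cEnd; ++ci)
                for (std::size_t k = 0; k < hw; ++k) {
                    const std::size_t idx = (ni * c + ci) * hw + k;
                    y[idx] = static_cast<float>((x[idx] - mean) * invStd)
                             * gamma[ci] + beta[ci];
                }
        }
    }
}
```

With gamma fixed to 1 and beta to 0, each group's output has zero mean and (up to epsilon) unit variance, which is what the outputMean and outputVariance operands of the API report per [N, groupNum] entry.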

Prototype
- Allocate the temporary space through the API framework.

      template <typename T, bool isReuseSource = false>
      __aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const T epsilon, GroupNormTiling& tiling)
- Pass the temporary space through the sharedTmpBuffer input parameter.

      template <typename T, bool isReuseSource = false>
      __aicore__ inline void GroupNorm(const LocalTensor<T>& output, const LocalTensor<T>& outputMean, const LocalTensor<T>& outputVariance, const LocalTensor<T>& inputX, const LocalTensor<T>& gamma, const LocalTensor<T>& beta, const LocalTensor<uint8_t>& sharedTmpBuffer, const T epsilon, GroupNormTiling& tiling)
Parameters
| Parameter | Description |
|---|---|
| T | Data type of the operand. |
| isReuseSource | Whether the source operand can be modified. The default value is false. If this parameter is set to true, the inputX memory is reused during internal computation of this API to save memory; if it is set to false, the inputX memory is not reused. This option can be enabled for float input data but cannot be enabled for half input data. For details about how to use isReuseSource, see More Examples. |
| Parameter | Input/Output | Description |
|---|---|---|
| output | Output | Destination operand, the result of scaling and translating the normalized input. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| outputMean | Output | Destination operand, the mean value of each group. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| outputVariance | Output | Destination operand, the variance of each group. The shape is [N, groupNum]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| inputX | Input | Source operand. The shape is [N, C, H, W]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| gamma | Input | Source operand, the scaling parameter. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| beta | Input | Source operand, the translation parameter. The shape is [C]. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| sharedTmpBuffer | Input | Temporary buffer, provided by the developer, that stores intermediate variables during internal computation of this API. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the required size (BufferSize), see GroupNorm Tiling. |
| epsilon | Input | Small constant added to the variance to prevent division by zero. The data type must be the same as that of inputX and output. |
| tiling | Input | Tiling information of the input data. For details about how to obtain the tiling information, see GroupNorm Tiling. |
Returns
None
Availability
Precautions
- For details about the alignment requirements of the operand address offset, see General Restrictions.
- Currently, only the ND format is supported.
Example
    template <typename dataType, bool isReuseSource = false>
    __aicore__ inline void MainGroupnormTest(GM_ADDR inputXGm, GM_ADDR gammGm, GM_ADDR betaGm, GM_ADDR outputGm,
        uint32_t n, uint32_t c, uint32_t h, uint32_t w, uint32_t g)
    {
        dataType epsilon = 0.001;
        DataFormat dataFormat = DataFormat::ND;
        GlobalTensor<dataType> inputXGlobal;
        GlobalTensor<dataType> gammGlobal;
        GlobalTensor<dataType> betaGlobal;
        GlobalTensor<dataType> outputGlobal;
        uint32_t bshLength = n * c * h * w;
        uint32_t bsLength = g * n;
        inputXGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(inputXGm), bshLength);
        gammGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(gammGm), c);
        betaGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(betaGm), c);
        outputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ dataType*>(outputGm), bshLength);

        TPipe pipe;
        TQue<QuePosition::VECIN, 1> inQueueX;
        TQue<QuePosition::VECIN, 1> inQueueGamma;
        TQue<QuePosition::VECIN, 1> inQueueBeta;
        TQue<QuePosition::VECOUT, 1> outQueue;
        TBuf<QuePosition::VECCALC> meanBuffer, varBuffer;

        // Round H * W up to a 32-byte block boundary (in elements).
        uint32_t hwAlignSize = (sizeof(dataType) * h * w + ONE_BLK_SIZE - 1) / ONE_BLK_SIZE * ONE_BLK_SIZE / sizeof(dataType);
        pipe.InitBuffer(inQueueX, 1, sizeof(dataType) * n * c * hwAlignSize);
        pipe.InitBuffer(inQueueGamma, 1, (sizeof(dataType) * c + 31) / 32 * 32);
        pipe.InitBuffer(inQueueBeta, 1, (sizeof(dataType) * c + 31) / 32 * 32);
        pipe.InitBuffer(outQueue, 1, sizeof(dataType) * n * c * hwAlignSize);
        pipe.InitBuffer(meanBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);
        pipe.InitBuffer(varBuffer, (sizeof(dataType) * g * n + 31) / 32 * 32);

        LocalTensor<dataType> inputXLocal = inQueueX.AllocTensor<dataType>();
        LocalTensor<dataType> gammaLocal = inQueueGamma.AllocTensor<dataType>();
        LocalTensor<dataType> betaLocal = inQueueBeta.AllocTensor<dataType>();
        LocalTensor<dataType> outputLocal = outQueue.AllocTensor<dataType>();
        LocalTensor<dataType> meanLocal = meanBuffer.Get<dataType>();
        LocalTensor<dataType> varianceLocal = varBuffer.Get<dataType>();

        // Copy in the input, padding each [H, W] slice up to hwAlignSize elements.
        DataCopyParams copyParams{static_cast<uint16_t>(n * c), static_cast<uint16_t>(h * w * sizeof(dataType)), 0, 0};
        DataCopyPadParams padParams{true, 0, static_cast<uint8_t>(hwAlignSize - h * w), 0};
        DataCopyPad(inputXLocal, inputXGlobal, copyParams, padParams);
        DataCopyParams copyParamsGamma{1, static_cast<uint16_t>(c * sizeof(dataType)), 0, 0};
        DataCopyPadParams padParamsGamma{false, 0, 0, 0};
        DataCopyPad(gammaLocal, gammGlobal, copyParamsGamma, padParamsGamma);
        DataCopyPad(betaLocal, betaGlobal, copyParamsGamma, padParamsGamma);
        PipeBarrier<PIPE_ALL>();

        // Query the available stack buffer size for the tiling calculation.
        uint32_t stackBufferSize = 0;
        {
            LocalTensor<float> stackBuffer;
            bool ans = PopStackBuffer<float, TPosition::LCM>(stackBuffer);
            stackBufferSize = stackBuffer.GetSize();
        }
        GroupNormTiling groupNormTiling;
        uint32_t inputShape[4] = {n, c, h, w};
        ShapeInfo shapeInfo{ (uint8_t)4, inputShape, (uint8_t)4, inputShape, dataFormat };
        GetGroupNormNDTillingInfo(shapeInfo, stackBufferSize, sizeof(dataType), isReuseSource, g, groupNormTiling);

        GroupNorm<dataType, isReuseSource>(outputLocal, meanLocal, varianceLocal, inputXLocal, gammaLocal, betaLocal, (dataType)epsilon, groupNormTiling);
        PipeBarrier<PIPE_ALL>();

        // Copy the result back to global memory and release local tensors.
        DataCopyPad(outputGlobal, outputLocal, copyParams);
        inQueueX.FreeTensor(inputXLocal);
        inQueueGamma.FreeTensor(gammaLocal);
        inQueueBeta.FreeTensor(betaLocal);
        outQueue.FreeTensor(outputLocal);
        PipeBarrier<PIPE_ALL>();
    }