Normalize

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	x

Function

Given the known mean and variance from LayerNorm, computes rstd (the reciprocal of the standard deviation) and the normalized output y for input data with a shape of [A, R]. The formulas are as follows:

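$\mathrm{rstd} = \dfrac{1}{\sqrt{\mathrm{Var}(x) + \varepsilon}}$

$y = (x - \mathrm{E}(x)) \cdot \mathrm{rstd} \cdot \gamma + \beta$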

E and Var represent the mean and variance of the input along the R axis, respectively. γ is the scaling coefficient, β is the shift coefficient, and ε is a small constant added to the variance to prevent division by zero.
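The computation described above can be sketched as a host-side reference in plain C++. This is not the AscendC kernel API: NormalizeReference and NormalizeResult are hypothetical names, and the sketch assumes that mean, variance, and rstd each hold one value per row (shape [A]) and that gamma and beta hold one value per column (shape [R]).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference only (hypothetical helper, not part of AscendC).
struct NormalizeResult {
    std::vector<float> y;     // normalized output, shape [A, R], row-major
    std::vector<float> rstd;  // reciprocal standard deviation, assumed shape [A]
};

inline NormalizeResult NormalizeReference(const std::vector<float>& x,      // [A, R], row-major
                                          const std::vector<float>& mean,   // assumed [A]
                                          const std::vector<float>& var,    // assumed [A]
                                          const std::vector<float>& gamma,  // assumed [R]
                                          const std::vector<float>& beta,   // assumed [R]
                                          std::size_t aLength, std::size_t rLength,
                                          float epsilon)
{
    // Check that every operand matches the shapes assumed above.
    assert(x.size() == aLength * rLength);
    assert(mean.size() == aLength && var.size() == aLength);
    assert(gamma.size() == rLength && beta.size() == rLength);

    NormalizeResult out;
    out.y.resize(aLength * rLength);
    out.rstd.resize(aLength);
    for (std::size_t a = 0; a < aLength; ++a) {
        // rstd = 1 / sqrt(Var(x) + epsilon), one value per row.
        const float rstd = 1.0f / std::sqrt(var[a] + epsilon);
        out.rstd[a] = rstd;
        for (std::size_t r = 0; r < rLength; ++r) {
            const std::size_t i = a * rLength + r;
            // y = (x - E(x)) * rstd * gamma + beta
            out.y[i] = (x[i] - mean[a]) * rstd * gamma[r] + beta[r];
        }
    }
    return out;
}
```

A reference like this may be useful for checking kernel results on small inputs; it makes no attempt to model half precision or on-chip memory.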

Prototype

Pass the temporary space through the sharedTmpBuffer input parameter.

template <typename U, typename T, bool isReuseSource = false, const NormalizeConfig& config = NLCFG_NORM>
__aicore__ inline void Normalize(const LocalTensor<T>& output, const LocalTensor<float>& outputRstd, const LocalTensor<float>& inputMean, const LocalTensor<float>& inputVariance, const LocalTensor<T>& inputX, const LocalTensor<U>& gamma, const LocalTensor<U>& beta, const LocalTensor<uint8_t>& sharedTmpBuffer, const float epsilon, const NormalizePara& para)

Allocate the temporary space through the API framework.

template <typename U, typename T, bool isReuseSource = false, const NormalizeConfig& config = NLCFG_NORM>
__aicore__ inline void Normalize(const LocalTensor<T>& output, const LocalTensor<float>& outputRstd, const LocalTensor<float>& inputMean, const LocalTensor<float>& inputVariance, const LocalTensor<T>& inputX, const LocalTensor<U>& gamma, const LocalTensor<U>& beta, const float epsilon, const NormalizePara& para)

The internal implementation of this API involves complex computation and requires extra temporary space to store intermediate variables. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

When the API framework is used for temporary space allocation, you do not need to allocate the space, but must reserve the required size for the temporary space.

When the sharedTmpBuffer input parameter is used for passing the temporary space, the tensor serves as the temporary space. In this case, the API framework is not required for temporary space allocation. This enables you to manage the sharedTmpBuffer space and reuse the buffer after calling the API, so that the buffer is not repeatedly allocated and deallocated, improving the flexibility and buffer utilization.

In both cases the size of the temporary space (BufferSize) must be known: if the API framework is used, developers must reserve the temporary space, and if sharedTmpBuffer is used, developers must allocate space for the tensor. To obtain the size, call the GetNormalizeMaxMinTmpSize API provided in Normalize Tiling, which returns the required maximum and minimum temporary space sizes. The minimum size ensures correct functionality, while the maximum size improves performance.
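As a rough illustration of how the two sizes might be used, the sketch below picks a buffer size from the minimum and maximum values. ChooseNormalizeTmpSize and availableBytes are hypothetical and not part of the AscendC API; minBytes and maxBytes stand for the values reported by GetNormalizeMaxMinTmpSize. The sketch also assumes that any size between the minimum and the maximum is acceptable, which this section does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of AscendC): choose a temporary buffer size.
// minBytes / maxBytes: values assumed to come from GetNormalizeMaxMinTmpSize.
// availableBytes: memory the operator can spare for the temporary space.
// Returns 0 when even the minimum size does not fit.
inline std::uint32_t ChooseNormalizeTmpSize(std::uint32_t minBytes,
                                            std::uint32_t maxBytes,
                                            std::uint32_t availableBytes)
{
    assert(minBytes <= maxBytes);
    if (availableBytes < minBytes) {
        return 0;  // not enough room for correct functionality
    }
    // Use as much as is available, but never more than the maximum.
    return std::min(availableBytes, maxBytes);
}
```

Capping at the maximum reflects the statement above that the maximum size is what improves performance; the sketch assumes nothing is gained by going beyond it.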

Parameters

Table 1 Template parameters

Parameter

Description

U

Data type of the beta and gamma operands.

For the Atlas A3 training products/Atlas A3 inference products, the supported data types are half and float.

For the Atlas A2 training products/Atlas A2 inference products, the supported data types are half and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

T

Data type of the output and inputX operands.

For the Atlas A3 training products/Atlas A3 inference products, the supported data types are half and float.

For the Atlas A2 training products/Atlas A2 inference products, the supported data types are half and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

isReuseSource

This parameter is reserved. Pass the default value false.

config

A parameter used to configure the input and output information of the Normalize API. The NormalizeConfig type is defined as follows:

struct NormalizeConfig {
    ReducePattern reducePattern = ReducePattern::AR;
    int32_t aLength = -1;
    bool isNoBeta = false;
    bool isNoGamma = false;
    bool isOnlyOutput = false;
};

reducePattern: Currently, only the ReducePattern::AR mode is supported, indicating that the inner R axis of the input is the Reduce axis.
aLength: Size of the input A axis. The following values are supported:
- -1: Default value. The aLength value of the API parameter para is used as the A axis size.
- Other values: The value must be the same as the aLength value of the API parameter para.
isNoBeta: Whether to omit beta from the computation.
- false: Default value. The input beta is used in the Normalize computation.
- true: The input beta is not used in the Normalize computation. In this case, computation related to beta in the formula is omitted.
isNoGamma: Whether to omit gamma from the computation.
- false: Default value. The input gamma is used in the Normalize computation.
- true: The input gamma is not used in the Normalize computation. In this case, computation related to gamma in the formula is omitted.
isOnlyOutput: Whether to output only y and not rstd, the reciprocal of the standard deviation. Currently, this parameter can only be set to false, indicating that both y and rstd are output.
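The constraints listed above can be sketched as a compile-time check. So that it builds on a host, the sketch redeclares NormalizeConfig locally with a one-value ReducePattern enum; in a real kernel both come from the AscendC headers. IsValidNormalizeConfig and CONFIG_NO_BETA are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the declarations in this section, for host-side illustration only.
enum class ReducePattern { AR };  // only the mode named in this document

struct NormalizeConfig {
    ReducePattern reducePattern = ReducePattern::AR;
    std::int32_t aLength = -1;
    bool isNoBeta = false;
    bool isNoGamma = false;
    bool isOnlyOutput = false;
};

// Hypothetical checker (not part of AscendC) for the documented constraints:
// - reducePattern must be ReducePattern::AR
// - aLength is -1 or equals the aLength value of the API parameter para
// - isOnlyOutput must be false
constexpr bool IsValidNormalizeConfig(const NormalizeConfig& cfg, std::int32_t paraALength)
{
    return cfg.reducePattern == ReducePattern::AR
        && (cfg.aLength == -1 || cfg.aLength == paraALength)
        && !cfg.isOnlyOutput;
}

// Example configuration: gamma is applied, beta is skipped.
constexpr NormalizeConfig CONFIG_NO_BETA = {ReducePattern::AR, -1, true, false, false};

static_assert(IsValidNormalizeConfig(CONFIG_NO_BETA, 8), "config violates documented constraints");
```

Because config is a template parameter of Normalize, a constexpr object such as CONFIG_NO_BETA is the natural form to pass; the static_assert shows one way the documented limits could be caught at build time.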

Table 2 API parameters

Parameter

Input/Output

Description

output

Output

Destination operand, with a shape of [A, R]. For details about the definition of the LocalTensor data structure, see LocalTensor.