AscendQuant

Function Usage

Performs quantization by element, for example, quantizing the half/float data type to the int8_t data type. The calculation formula is as follows, where round indicates banker's rounding (round half to even) and PAR indicates the number of elements that the Vector Unit can process in one iteration:

dst_i = round(src_i × scale + offset), with the result converted to int8_t

  • Per_tensor quantization: the entire src_local tensor shares a single quantization parameter whose shape is (1,).

  • Per_channel quantization: src_local has shape (m, n), and each channel along the n dimension has its own quantization parameter, so the parameter shape is (n,).
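
The two modes can be illustrated with a host-side reference sketch (plain C++, not Ascend C; the function names and the saturation to the int8_t range are illustrative assumptions of this sketch, not part of the API):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// One element: round(src * scale + offset) with banker's rounding,
// then convert to int8_t (saturation is an assumption of this sketch).
static int8_t QuantOne(float v, float scale, float offset) {
    float r = std::nearbyint(v * scale + offset);  // default mode rounds half to even
    if (r > 127.0f) r = 127.0f;
    if (r < -128.0f) r = -128.0f;
    return static_cast<int8_t>(r);
}

// Per_tensor: one (scale, offset) pair, shape (1,), shared by all elements.
static std::vector<int8_t> QuantPerTensor(const std::vector<float>& src,
                                          float scale, float offset) {
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i) dst[i] = QuantOne(src[i], scale, offset);
    return dst;
}

// Per_channel: src has shape (m, n); channel j uses scale[j] and offset[j], shape (n,).
static std::vector<int8_t> QuantPerChannel(const std::vector<float>& src, size_t n,
                                           const std::vector<float>& scale,
                                           const std::vector<float>& offset) {
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        dst[i] = QuantOne(src[i], scale[i % n], offset[i % n]);
    return dst;
}
```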

Principles

Figure 1 AscendQuant algorithm block diagram, where both scale and offset are scalars
Figure 2 AscendQuant algorithm block diagram, where both scale and offset are tensors
Figure 3 AscendQuant algorithm block diagram, where scale is a tensor and offset is a scalar

The preceding figures show the block diagrams of the AscendQuant internal algorithm. The computation is divided into the following steps, all of which are performed on vectors:

  1. Precision conversion: If the input src, scale, or offset is of the float type, convert it to the half type.
  2. Broadcast: If the input scale or offset is a vector, broadcast it to the same shape as src.
  3. Scale calculation: If src and scale are both vectors, perform a Mul operation; if scale is a scalar, perform a Muls operation. The result is Tmp1.
  4. Offset calculation: If Tmp1 and offset are both vectors, perform an Add operation; if offset is a scalar, perform an Adds operation. The result is Tmp2.
  5. Precision conversion: Convert Tmp2 from half to int8_t to obtain the output.
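
Ignoring half-precision effects, the steps above can be traced on the host for the Figure 3 case (tensor scale, scalar offset); this sketch is illustrative only, and the comments map each loop to the vector operation named in the corresponding step:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of the pipeline for an (m, n) src with per-channel scale and scalar offset.
static std::vector<int8_t> QuantSteps(const std::vector<float>& src, size_t n,
                                      const std::vector<float>& scale, float offset) {
    // Step 2 (broadcast): expand the (n,) scale to the same shape as src.
    std::vector<float> bScale(src.size());
    for (size_t i = 0; i < src.size(); ++i) bScale[i] = scale[i % n];

    // Step 3 (Mul): elementwise src * scale -> Tmp1.
    std::vector<float> tmp1(src.size());
    for (size_t i = 0; i < src.size(); ++i) tmp1[i] = src[i] * bScale[i];

    // Step 4 (Adds): add the scalar offset to every element -> Tmp2.
    std::vector<float> tmp2(src.size());
    for (size_t i = 0; i < src.size(); ++i) tmp2[i] = tmp1[i] + offset;

    // Step 5 (precision conversion): banker's rounding, then convert to int8_t.
    std::vector<int8_t> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<int8_t>(std::nearbyint(tmp2[i]));
    return dst;
}
```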

Prototype

  • dstTensor of the int8_t type
    • Per_tensor quantization:
      • Pass the temporary space through the sharedTmpBuffer input parameter.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const float scale, const float offset)
          
      • Allocate the temporary space through the API framework.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const float scale, const float offset)
          
    • Per_channel quantization:
      • Pass the temporary space through the sharedTmpBuffer input parameter.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const T offset)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<uint8_t>& sharedTmpBuffer, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
          
      • Allocate the temporary space through the API framework.
        • All or part of the source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset, const uint32_t scaleCount, const uint32_t calCount)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor, const uint32_t scaleCount, const uint32_t offsetCount, const uint32_t calCount)
          
        • All source operand tensors are involved in computation.
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const T offset)
          
          template <typename T, bool isReuseSource = false>
          __aicore__ inline void AscendQuant(const LocalTensor<int8_t>& dstTensor, const LocalTensor<T>& srcTensor, const LocalTensor<T>& scaleTensor, const LocalTensor<T>& offsetTensor)
          

Due to the complex mathematical computation involved in the internal implementation of this API, additional temporary space is required to store intermediate variables generated during computation. The temporary space can be allocated through the API framework or passed by developers through the sharedTmpBuffer input parameter.

  • When the API framework is used for temporary space allocation, developers do not need to allocate the space, but must reserve the required size for the space.
  • When the temporary space is passed through the sharedTmpBuffer input parameter, that tensor serves as the temporary space, and the API framework does not allocate it. This lets developers manage and reuse sharedTmpBuffer across API calls, avoiding repeated allocation and deallocation and improving flexibility and buffer utilization.

If the API framework is used, developers must reserve the temporary space. If sharedTmpBuffer is used, developers must allocate space for sharedTmpBuffer. To obtain the size of the temporary space (BufferSize) to be reserved or allocated, use the GetAscendQuantMaxMinTmpSize API.

Parameters

Table 1 Parameters in the template

  • T: Data type of the operand.
  • isReuseSource: Whether the source operand can be modified. This parameter is reserved; pass the default value false.

Table 2 Per_tensor API parameters

  • dstTensor (output): Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the Atlas Training Series Product, the supported data type is int8_t.
  • srcTensor (input): Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For the Atlas Training Series Product, the supported data types are half and float.
  • sharedTmpBuffer (input): Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize. For the Atlas Training Series Product, the supported data type is uint8_t.
  • scale (input): Quantization parameter. The type is Scalar, and the supported data type is float.
  • offset (input): Quantization parameter. The type is Scalar, and the supported data type is float.
  • calCount (input): Number of data elements actually computed. The value range is [0, srcTensor.GetSize()].

Table 3 Per_channel API parameters

  • dstTensor (output): Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • srcTensor (input): Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • sharedTmpBuffer (input): Temporary buffer. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. For details about how to obtain the temporary space size (BufferSize), see GetAscendQuantMaxMinTmpSize.
  • scaleTensor (input): Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • offsetTensor (input): Quantization parameter. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.
  • scaleCount (input): Number of quantization parameter elements actually used. The value range is [0, scaleTensor.GetSize()], and the value must be an integer multiple of 32.
  • offsetCount (input): Number of quantization parameter elements actually used. The value range is [0, offsetTensor.GetSize()], the value must be an integer multiple of 32, and it must equal scaleCount.
  • calCount (input): Number of data elements actually computed. The value range is [0, srcTensor.GetSize()], and the value must be an integer multiple of scaleCount.
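
The relationships among scaleCount, offsetCount, and calCount for the per_channel interfaces can be expressed as a small check (ValidPerChannelCounts is a hypothetical helper, not part of the API):

```cpp
#include <cstdint>

// Hypothetical validation mirroring the per_channel parameter constraints:
// scaleCount is a multiple of 32, offsetCount equals scaleCount,
// and calCount is a multiple of scaleCount.
static bool ValidPerChannelCounts(uint32_t scaleCount, uint32_t offsetCount,
                                  uint32_t calCount) {
    if (scaleCount % 32 != 0) return false;
    if (offsetCount != scaleCount) return false;
    return scaleCount == 0 || calCount % scaleCount == 0;
}
```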

Returns

None

Availability

The Atlas Training Series Product supports only per_tensor quantization and does not support per_channel quantization.

Constraints

  • The source operand and destination operand can share the same address (address overlapping is supported).
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
  • The length of the data involved in computation of the input and output operands must be 32-byte aligned.
  • When scale is of the float type, its value must still be within the value range of the half type.
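
The 32-byte alignment constraint depends on the element size; the helper below is an illustrative sketch (half is modeled as a 2-byte type, and kBlockBytes and LengthAligned are assumed names):

```cpp
#include <cstdint>

// The byte length of the data involved in computation must be a
// multiple of 32 bytes; half is 2 bytes, int8_t is 1 byte.
constexpr uint32_t kBlockBytes = 32;

static bool LengthAligned(uint32_t elementCount, uint32_t elementSize) {
    return (elementCount * elementSize) % kBlockBytes == 0;
}
```

For example, a half-type input needs an element count that is a multiple of 16, while an int8_t output needs a multiple of 32.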

Example

For a complete operator example, see quant operator sample.

//The input shape is 1024 elements.
uint32_t dataSize = 1024;
//The input type is float or half; scale is 2.0 and offset is 0.9. The temporary space is reserved through the API framework.
AscendC::AscendQuant<srcType>(dstLocal, srcLocal, 2.0f, 0.9f, dataSize);

Result example:

Input data (srcLocal):
[-3.22      2.09     -2.025    -2.895    -1.349    -3.336     1.376
  2.453     3.861     1.085    -2.273     0.3923    0.3645   -2.127
 -3.09     -0.002726 -2.783     0.2615   -0.904     1.507    -1.017
  3.568     2.219     0.8643    0.922     1.144    -1.853     2.002
 -1.705     1.675    -3.482     1.519     0.4172    0.4307   -1.228
 -2.62      0.3354   -3.586     2.604     1.688    -3.646    -3.389
 -3.918     3.955     0.7954   -2.562    -1.085     2.91     -0.398
  3.771    -2.914     1.726     3.367     3.482     3.49      1.382
  3.512     0.1938   -0.4087   -3.75      2.873    -2.54      1.826
  3.738     3.188     2.676     0.724    -1.108    -2.682    -0.4783
  2.082    -0.462    -2.955    -2.543     3.98     -1.85      3.018
 -2.688     3.596    -0.799     1.222     1.686    -0.7925    3.295
 -3.568    -0.03836  -2.002    -1.212     1.927    -1.11      1.046
  3.793    -0.6226   -3.494    -3.371    -2.354    -1.7      -0.948
  2.682    -3.344     2.566     2.533    -1.335     1.405     3.867
  3.674     1.359     3.145    -1.22      1.054    -2.492    -1.214
  3.879     2.014     2.664    -2.863    -3.88      2.857     1.695
  2.852     2.893     2.367    -0.1832   -3.254    -1.49      1.13
  0.672    -1.863    -3.547     3.281    -1.573    -1.349    -3.547
 -3.766    -2.99     -3.203    -2.703    -2.793    -1.501     0.4785
 -1.216    -1.205     0.9097   -3.438     0.781    -1.505    -1.982
  0.2037    0.4595    0.759     0.844    -3.396     0.4778   -0.899
 -2.342    -0.961    -2.531    -0.10913  -3.516    -3.66      1.337
 -3.44      0.7495    1.958     2.775     0.0968   -3.       -2.13
 -1.818     2.664     2.066    -1.923     2.97     -2.047    -3.598
  0.1661   -0.179     3.186    -1.247     2.777    -3.344    -3.148
  2.275     2.916    -1.081    -3.213     2.87     -3.12     -3.066
 -0.6      -3.78     -3.012    -3.86     -0.707    -0.2203   -3.338
 -2.273     2.062    -2.422    -0.443    -1.333    -2.2      -1.478
 -2.816     1.134     0.2115   -2.459     3.842    -2.768     2.822
  1.3125   -2.143     1.971    -3.543    -0.07794  -0.1265    0.763
 -3.26      3.514     3.629     0.1902    1.277    -0.1652   -0.006435
 -1.25      2.258    -2.887     3.66      2.729    -3.27     -0.5615
 -3.176    -1.2295    1.556    -0.6626   -2.777     1.946    -0.338
 -2.977    -0.8135   -2.37      0.7764    3.525    -0.6196    2.436
  2.38     -1.708     0.814     0.4688   -1.255     1.04     -1.077
  3.176     1.859     0.9194    2.703     1.436     1.762     2.2
  1.794    -1.234    -2.148    -2.393     2.846     1.854     0.3428
 -2.379     0.2429   -1.561     2.582     0.6836    1.811    -2.53
 -3.951    -2.096    -2.639     2.02      2.799    -0.8936   -1.295
 -3.914    -1.82      2.541    -2.773     1.733     3.955    -3.092
  0.04095   0.82     -1.071     3.93     -3.158    -2.5      -0.5415
 -1.98     -0.1626    3.092    -1.3125    3.387    -2.496     2.355
 -3.033    -3.814    -3.191     2.686     1.377     1.381    -3.047
  2.127    -0.4927   -1.718     2.371    -0.1648    1.885    -0.6826
 -3.121    -2.379    -3.959    -2.164     2.262    -2.973     3.092
  2.111    -0.03732   2.836    -2.725     3.436     1.017     2.877
 -2.926     2.547     0.8574    2.643     2.646    -0.889     3.363
 -0.3147   -0.09546   0.0551   -3.947    -1.434    -0.6104   -3.41
 -2.176    -1.866     3.975    -3.031    -1.25      3.918     3.697
  3.21     -2.436    -3.281    -3.225     0.7856    2.043     1.415
 -2.252    -1.648     0.03824  -3.432     0.3271    1.458    -0.02289
 -0.643     1.441    -0.1847    1.062     3.545     0.367     1.796
 -1.687     2.06      0.2373    3.748    -2.752     2.73     -2.693
 -3.54     -2.275    -3.033    -1.622    -3.936     1.295     2.586
 -2.926    -2.314     2.527    -1.619    -0.04037  -3.225     1.771
  3.064    -1.173    -2.324     3.332    -0.8257    1.075    -3.287
  1.075    -2.262     1.419    -0.344    -0.4988    1.113     3.068
 -1.104     2.531     2.645     0.6333    0.3677   -3.186    -0.3726
  2.549    -0.3347    2.227    -3.963    -2.564     3.656     1.069
 -3.684    -1.388    -0.2568   -0.726     0.4883    1.946    -1.579
 -0.8438   -2.014     2.332     0.306    -3.305    -3.588    -1.038
  3.299     0.832     0.8594   -1.163     1.2705    2.018    -3.352
  2.537     2.111    -3.61      0.645    -2.459    -2.469     1.002
 -3.914     1.079    -0.9214   -2.111    -3.88     -0.5254   -1.908
 -1.19      3.559    -3.285    -2.266     3.672     0.001524 -1.964
 -1.742     1.895     3.887     1.737     0.909     0.5044    2.55
  0.8936    2.139    -3.658     1.828    -3.688    -3.26      1.436
 -1.321    -3.19      2.764    -3.305    -2.52     -2.441    -0.32
 -2.402     2.252    -1.527     0.719     0.2328    0.1766   -2.088
  3.729     0.844    -1.174    -0.7427    0.8296   -0.1885   -0.0379
  2.92      2.502     3.846     1.657    -3.58     -3.352    -3.904
 -2.43      1.159    -1.707     2.21      2.367    -0.5864   -1.647
  1.952   ]
Output data (dstLocal):
[-6  5 -3 -5 -2 -6  4  6  9  3 -4  2  2 -3 -5  1 -5  1 -1  4 -1  8  5  3
  3  3 -3  5 -3  4 -6  4  2  2 -2 -4  2 -6  6  4 -6 -6 -7  9  2 -4 -1  7
  0  8 -5  4  8  8  8  4  8  1  0 -7  7 -4  5  8  7  6  2 -1 -4  0  5  0
 -5 -4  9 -3  7 -4  8 -1  3  4 -1  7 -6  1 -3 -2  5 -1  3  8  0 -6 -6 -4
 -2 -1  6 -6  6  6 -2  4  9  8  4  7 -2  3 -4 -2  9  5  6 -5 -7  7  4  7
  7  6  1 -6 -2  3  2 -3 -6  7 -2 -2 -6 -7 -5 -6 -5 -5 -2  2 -2 -2  3 -6
  2 -2 -3  1  2  2  3 -6  2 -1 -4 -1 -4  1 -6 -6  4 -6  2  5  6  1 -5 -3
 -3  6  5 -3  7 -3 -6  1  1  7 -2  6 -6 -5  5  7 -1 -6  7 -5 -5  0 -7 -5
 -7 -1  0 -6 -4  5 -4  0 -2 -3 -2 -5  3  1 -4  9 -5  7  4 -3  5 -6  1  1
  2 -6  8  8  1  3  1  1 -2  5 -5  8  6 -6  0 -5 -2  4  0 -5  5  0 -5 -1
 -4  2  8  0  6  6 -3  3  2 -2  3 -1  7  5  3  6  4  4  5  4 -2 -3 -4  7
  5  2 -4  1 -2  6  2  5 -4 -7 -3 -4  5  6 -1 -2 -7 -3  6 -5  4  9 -5  1
  3 -1  9 -5 -4  0 -3  1  7 -2  8 -4  6 -5 -7 -5  6  4  4 -5  5  0 -3  6
  1  5  0 -5 -4 -7 -3  5 -5  7  5  1  7 -5  8  3  7 -5  6  3  6  6 -1  8
  0  1  1 -7 -2  0 -6 -3 -3  9 -5 -2  9  8  7 -4 -6 -6  2  5  4 -4 -2  1
 -6  2  4  1  0  4  1  3  8  2  4 -2  5  1  8 -5  6 -4 -6 -4 -5 -2 -7  3
  6 -5 -4  6 -2  1 -6  4  7 -1 -4  8 -1  3 -6  3 -4  4  0  0  3  7 -1  6
  6  2  2 -5  0  6  0  5 -7 -4  8  3 -6 -2  0 -1  2  5 -2 -1 -3  6  2 -6
 -6 -1  7  3  3 -1  3  5 -6  6  5 -6  2 -4 -4  3 -7  3 -1 -3 -7  0 -3 -1
  8 -6 -4  8  1 -3 -3  5  9  4  3  2  6  3  5 -6  5 -6 -6  4 -2 -5  6 -6
 -4 -4  0 -4  5 -2  2  1  1 -3  8  3 -1 -1  3  1  1  7  6  9  4 -6 -6 -7
 -4  3 -3  5  6  0 -2  5]
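
Ignoring half-precision effects, the first few output values above can be reproduced on the host with dst = round(src × 2.0 + 0.9); QuantExample is an illustrative name:

```cpp
#include <cmath>
#include <cstdint>

// round half to even (the default rounding mode), with the example's
// scale = 2.0 and offset = 0.9.
static int8_t QuantExample(float v) {
    return static_cast<int8_t>(std::nearbyint(v * 2.0f + 0.9f));
}
```

For instance, the first input -3.22 maps to round(-6.44 + 0.9) = round(-5.54) = -6, matching the first output element.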