BilinearInterpolation (ISASI)

Function Usage

The computation is organized into horizontal iterations and vertical iterations.

In each horizontal iteration, eight offset values are read in sequence from src0Offset. Each offset value points to the start address of a data block in src0. If repeatMode is set to false, one value is read from src1 and multiplied by every value in the eight data blocks of src0. If repeatMode is set to true, eight values are read from src1, and each one is multiplied by the values in the corresponding data block of src0. The result of the current iteration is accumulated, data block by data block, onto the previous dst result and stored at the destination address. The dst address remains unchanged across the horizontal iterations of one vertical iteration.

Each vertical iteration performs hRepeat horizontal iterations. The dst start address of a vertical iteration is the dst start address of the previous vertical iteration plus vROffset. The dst space occupied by a vertical iteration is the eight data blocks starting at its dst start address.
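The iteration scheme described above can be sketched as a host-side scalar model. This is not the Ascend C implementation and it does not run on the AI Core. It assumes 16-bit elements (modelled as float), 32-byte data blocks of 16 elements, offsets expressed in bytes (inferred from the result example on this page), src1 values consumed in order, a full mask, and a dst that starts at zero. The names BilinearInterpolationRef and RunResultExample are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only. Assumed: 32-byte blocks of 16 half elements,
// byte offsets, full mask, dst initialised to zero.
constexpr std::size_t kBlockElems = 16;    // 32 bytes / 2 bytes per half
constexpr std::size_t kBlocksPerIter = 8;  // blocks read per horizontal iteration

std::vector<float> BilinearInterpolationRef(
    const std::vector<float>& src0, const std::vector<std::uint32_t>& src0Offset,
    const std::vector<float>& src1, std::size_t hRepeat, bool repeatMode,
    std::size_t dstBlkStride, std::size_t vROffset, std::size_t vRepeat) {
  assert(hRepeat >= 1 && vRepeat >= 1);
  // Size dst so that the eight (strided) blocks of the last vertical iteration fit.
  const std::size_t span = ((kBlocksPerIter - 1) * dstBlkStride + 1) * kBlockElems;
  std::vector<float> dst((vRepeat - 1) * vROffset + span, 0.0f);
  std::size_t offIdx = 0;  // next entry of src0Offset
  std::size_t s1Idx = 0;   // next entry of src1
  for (std::size_t v = 0; v < vRepeat; ++v) {
    // The dst start address is fixed for all horizontal iterations of v.
    const std::size_t dstStart = v * vROffset;
    for (std::size_t h = 0; h < hRepeat; ++h) {
      for (std::size_t b = 0; b < kBlocksPerIter; ++b) {
        assert(offIdx < src0Offset.size());
        const std::size_t srcStart = src0Offset[offIdx++] / sizeof(std::uint16_t);
        assert(srcStart + kBlockElems <= src0.size());
        // false: one src1 value per horizontal iteration.
        // true: one src1 value per data block.
        const std::size_t idx = repeatMode ? s1Idx + b : s1Idx;
        assert(idx < src1.size());
        const float scale = src1[idx];
        const std::size_t dstBlk = dstStart + b * dstBlkStride * kBlockElems;
        for (std::size_t e = 0; e < kBlockElems; ++e) {
          dst[dstBlk + e] += src0[srcStart + e] * scale;  // accumulate onto dst
        }
      }
      s1Idx += repeatMode ? kBlocksPerIter : 1;
    }
  }
  return dst;
}

// Rebuilds the inputs listed under "Result example" and runs the model.
std::vector<float> RunResultExample() {
  std::vector<float> src0(512), src1(16);
  std::vector<std::uint32_t> offsets(32);
  for (std::size_t i = 0; i < src0.size(); ++i) src0[i] = static_cast<float>(i + 1);
  for (std::size_t i = 0; i < src1.size(); ++i) src1[i] = static_cast<float>(i + 2);
  for (std::size_t i = 0; i < offsets.size(); ++i) offsets[i] = static_cast<std::uint32_t>(32 * i);
  return BilinearInterpolationRef(src0, offsets, src1, /*hRepeat=*/2, /*repeatMode=*/false,
                                  /*dstBlkStride=*/1, /*vROffset=*/128, /*vRepeat=*/2);
}
```

Under these assumptions, RunResultExample() returns 256 values that start with 389, 394, 399, 404 and end with 4096, matching the result example. The model computes in float, so half-precision rounding on the device is not reproduced.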

Prototype

  • Bitwise mask mode:
    template <typename T>
    __aicore__ inline void BilinearInterpolation(const LocalTensor<T> &dstLocal, const LocalTensor<T> &src0Local, const LocalTensor<uint32_t> &src0OffsetLocal, const LocalTensor<T> &src1Local, uint64_t mask[], uint8_t hRepeat, bool repeatMode, uint16_t dstBlkStride, uint16_t vROffset, uint8_t vRepeat, const LocalTensor<uint8_t> &sharedTmpBuffer)

  • Contiguous mask mode:
    template <typename T>
    __aicore__ inline void BilinearInterpolation(const LocalTensor<T> &dstLocal, const LocalTensor<T> &src0Local, const LocalTensor<uint32_t> &src0OffsetLocal, const LocalTensor<T> &src1Local, uint64_t mask, uint8_t hRepeat, bool repeatMode, uint16_t dstBlkStride, uint16_t vROffset, uint8_t vRepeat, const LocalTensor<uint8_t> &sharedTmpBuffer)
    

Parameters

Table 1 Parameters

Parameter

Input/Output

Description

dstLocal

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

For the Atlas A2 training products / Atlas A2 inference products, the supported data type is half.

For the Atlas A3 training products / Atlas A3 inference products, the supported data type is half.

For the AI Core of the Atlas inference product, the supported data type is half.

src0Local and src1Local

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

The source operand must have the same data type as the destination operand.

For the Atlas A2 training products / Atlas A2 inference products, the supported data type is half.

For the Atlas A3 training products / Atlas A3 inference products, the supported data type is half.

For the AI Core of the Atlas inference product, the supported data type is half.

src0OffsetLocal

Input

Source operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

For the Atlas A2 training products / Atlas A2 inference products, the supported data type is uint32_t.

For the Atlas A3 training products / Atlas A3 inference products, the supported data type is uint32_t.

For the AI Core of the Atlas inference product, the supported data type is uint32_t.

mask

Input

The mask parameter is used to control the elements involved in computation in each iteration.

  • Bitwise mode: controls the elements that participate in computation by bit. If a bit is set to 1, the corresponding element participates in the computation. If a bit is set to 0, the corresponding element is masked in the computation.

    The mask is in array form. The array length and the value range of the array elements depend on the data type of the operand. When the operand is 16-bit, the array length is 2. In this case, mask[0] and mask[1] must be in the range of [0, 2^64 - 1] and cannot both be 0. When the operand is 32-bit, the array length is 1. In this case, mask[0] must be in the range of (0, 2^64 - 1]. When the operand is 64-bit, the array length is 1. In this case, mask[0] must be in the range of (0, 2^32 - 1].

    For example, if mask = [8, 0] and 8 = 0b1000, only the fourth element participates in computation.

  • Contiguous mode: indicates the number of contiguous elements that participate in computation. The value range is related to the operand data type. The maximum number of elements that can be processed in each repeat varies according to the data type. When the operand is 16-bit, mask ∈ [1, 128]. When the operand is 32-bit, mask ∈ [1, 64]. When the operand is 64-bit, mask ∈ [1, 32].
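The two mask modes can be compared with a small host-side sketch that lists which element indices of a 128-element (16-bit) repeat are active. It assumes that bit i of mask[0] maps to element i and bit i of mask[1] maps to element 64 + i. That ordering is an assumption made for illustration, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bitwise mode for a 16-bit operand: two 64-bit words cover 128 elements.
// Assumed layout: mask[0] covers elements 0..63, mask[1] covers elements 64..127.
std::vector<int> ActiveElementsBitwise(const std::uint64_t mask[2]) {
  std::vector<int> active;
  for (int i = 0; i < 128; ++i) {
    if ((mask[i / 64] >> (i % 64)) & 1ULL) {
      active.push_back(i);
    }
  }
  return active;
}

// Contiguous mode for a 16-bit operand: the first `mask` elements are active.
std::vector<int> ActiveElementsContiguous(std::uint64_t mask) {
  assert(mask >= 1 && mask <= 128);  // documented range for 16-bit operands
  std::vector<int> active;
  for (int i = 0; i < static_cast<int>(mask); ++i) {
    active.push_back(i);
  }
  return active;
}
```

With this layout, a low word of 0b1000 and a high word of 0 selects only index 3, that is, the fourth element. Contiguous mode with mask = 128 selects the same elements as a bitwise mask whose two words are both UINT64_MAX.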

hRepeat

Input

Number of horizontal iterations. The value range is [1, 255].

repeatMode

Input

Repeat mode, of type bool.

  • false: In each iteration, a single value is read from src1 and multiplied by every value in the eight data blocks read from src0.
  • true: In each iteration, eight values are read from src1, and each one is multiplied by the values in the corresponding data block of src0. A total of eight data blocks and eight src1 elements are consumed per iteration.

dstBlkStride

Input

Address stride of the destination operand between different data blocks in a single repeat, in the unit of 32 bytes.

vROffset

Input

Address offset of the destination operand between vertical repeats, in the unit of elements. The value range is [128, 65535].

vRepeat

Input

Number of vertical iterations. The value range is [1, 255].

sharedTmpBuffer

Input

Temporary space.

For the Atlas A2 training products / Atlas A2 inference products, allocate at least src0Local.GetSize() * 32 + src1Local.GetSize() * 32 bytes.

For the Atlas A3 training products / Atlas A3 inference products, allocate at least src0Local.GetSize() * 32 + src1Local.GetSize() * 32 bytes.

For the AI Core of the Atlas inference product, allocate at least src0OffsetLocal.GetSize() * sizeof(uint32_t) bytes.
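The minimum sizes above can be computed ahead of allocation with helpers such as the following. This is a sketch that assumes GetSize() reports an element count. The function names are hypothetical and only restate the formulas given for sharedTmpBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimum sharedTmpBuffer size in bytes for Atlas A2 / Atlas A3 products.
// Arguments are element counts (assumed to be what GetSize() returns).
inline std::size_t TmpBytesAtlasA2A3(std::size_t src0Elems, std::size_t src1Elems) {
  assert(src0Elems > 0 && src1Elems > 0);
  return src0Elems * 32 + src1Elems * 32;
}

// Minimum sharedTmpBuffer size in bytes for the AI Core of the Atlas inference product.
inline std::size_t TmpBytesAtlasInferenceAiCore(std::size_t src0OffsetElems) {
  assert(src0OffsetElems > 0);
  return src0OffsetElems * sizeof(std::uint32_t);
}
```

For the inputs of the result example on this page (512 elements in src0Local, 16 in src1Local, 32 in src0OffsetLocal), these formulas give 16896 bytes and 128 bytes respectively.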

Returns

None

Availability

Atlas A2 training products / Atlas A2 inference products

Atlas A3 training products / Atlas A3 inference products

AI Core of the Atlas inference product

Constraints

  • The addresses of src0Local, src1Local, and src0OffsetLocal cannot overlap. In addition, the destination addresses of two vertical repeats cannot overlap.
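The second constraint can be checked on the host before launching a kernel. The sketch below marks every destination element that each vertical repeat would write and reports whether two different repeats touch the same element. It assumes 16-bit elements, 16 elements per 32-byte data block, and eight destination blocks per vertical repeat spaced dstBlkStride blocks apart. VerticalRepeatsOverlap is a hypothetical helper, not part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if two different vertical repeats write the same dst element.
// Assumed: half elements (16 per 32-byte block), eight blocks per vertical repeat.
bool VerticalRepeatsOverlap(std::size_t dstBlkStride, std::size_t vROffset,
                            std::size_t vRepeat) {
  assert(vRepeat >= 1);
  const std::size_t blockElems = 16;
  const std::size_t blocks = 8;
  const std::size_t span = ((blocks - 1) * dstBlkStride + 1) * blockElems;
  // owner[i] records which vertical repeat wrote element i (-1 = untouched).
  std::vector<std::int32_t> owner((vRepeat - 1) * vROffset + span, -1);
  for (std::size_t v = 0; v < vRepeat; ++v) {
    for (std::size_t b = 0; b < blocks; ++b) {
      for (std::size_t e = 0; e < blockElems; ++e) {
        const std::size_t idx = v * vROffset + b * dstBlkStride * blockElems + e;
        if (owner[idx] >= 0 && owner[idx] != static_cast<std::int32_t>(v)) {
          return true;  // already written by another vertical repeat
        }
        owner[idx] = static_cast<std::int32_t>(v);
      }
    }
  }
  return false;
}
```

With dstBlkStride = 1 and vROffset = 128, consecutive vertical repeats write adjacent, non-overlapping regions. With dstBlkStride = 2, a vROffset of 128 makes the second repeat land on blocks already written by the first.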

Examples

  • API example - contiguous mask mode
    AscendC::LocalTensor<half> dstLocal, src0Local, src1Local;
    AscendC::LocalTensor<uint32_t> src0OffsetLocal;
    AscendC::LocalTensor<uint8_t> tmpLocal;
    uint64_t mask = 128;        // Contiguous mask mode: the first 128 elements participate
    uint8_t hRepeat = 2;        // Two horizontal iterations
    bool repeatMode = false;    // One src1 value is used per horizontal iteration
    uint16_t dstBlkStride = 1;  // Data blocks are written contiguously within a single iteration.
    uint16_t vROffset = 128;    // Adjacent vertical iterations write to adjacent dst regions.
    uint8_t vRepeat = 2;        // Two vertical iterations
    
    AscendC::BilinearInterpolation(dstLocal, src0Local, src0OffsetLocal, src1Local, mask, hRepeat, repeatMode,
                dstBlkStride, vROffset, vRepeat, tmpLocal);
    
  • API example - bitwise mask mode
    AscendC::LocalTensor<half> dstLocal, src0Local, src1Local;
    AscendC::LocalTensor<uint32_t> src0OffsetLocal;
    AscendC::LocalTensor<uint8_t> tmpLocal;
    uint64_t mask[2] = {UINT64_MAX, UINT64_MAX};  // Bitwise mask mode: all 128 elements participate
    uint8_t hRepeat = 2;        // Two horizontal iterations
    bool repeatMode = false;    // One src1 value is used per horizontal iteration
    uint16_t dstBlkStride = 1;  // Data blocks are written contiguously within a single iteration.
    uint16_t vROffset = 128;    // Adjacent vertical iterations write to adjacent dst regions.
    uint8_t vRepeat = 2;        // Two vertical iterations
    
    AscendC::BilinearInterpolation(dstLocal, src0Local, src0OffsetLocal, src1Local, mask, hRepeat, repeatMode,
                dstBlkStride, vROffset, vRepeat, tmpLocal);
    
Result example:
Input (src0Local, half): [1, 2, 3, ..., 512]
Input (src1Local, half): [2, 3, 4, ..., 17]
Input (src0OffsetLocal, uint32_t): [0, 32, 64, ..., 992]
Output (dstLocal, half): [389, 394, 399, 404, ..., 4096]
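The output values can be cross-checked by hand. With repeatMode = false, each of the four horizontal iterations uses one src1 value (2, 3, 4, 5 in order) and 128 consecutive src0 values, and each vertical iteration sums two horizontal iterations. The sketch below encodes that arithmetic for these example inputs only. ExpectedDst is a hypothetical helper, and it assumes the offsets are byte offsets to consecutive 32-byte blocks.

```cpp
// Closed-form check for the example inputs only (hRepeat = 2, vRepeat = 2,
// repeatMode = false, dstBlkStride = 1, vROffset = 128).
// Integer arithmetic is used, so half-precision rounding is not modelled.
constexpr int ExpectedDst(int i) {
  const int v = i / 128;  // vertical iteration that owns dst[i]
  const int p = i % 128;  // position inside the eight destination blocks
  int sum = 0;
  for (int h = 0; h < 2; ++h) {
    const int iter = v * 2 + h;                // global horizontal iteration index
    const int src0Value = iter * 128 + p + 1;  // src0Local holds 1, 2, ..., 512
    const int src1Value = iter + 2;            // src1Local holds 2, 3, 4, ...
    sum += src0Value * src1Value;
  }
  return sum;
}

static_assert(ExpectedDst(0) == 389, "1 * 2 + 129 * 3");
static_assert(ExpectedDst(255) == 4096, "384 * 4 + 512 * 5");
```

The first output is 1 * 2 + 129 * 3 = 389 and the last is 384 * 4 + 512 * 5 = 4096, in agreement with the listed result.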