BilinearInterpolation (ISASI)

Function Usage

Computation is organized into horizontal iterations and vertical iterations. In each horizontal iteration, eight offset values are read in sequence from src0Offset. Each offset value points to the start address of one data block in src0. If repeatMode is set to false, one value is read from src1 and multiplied by every element of the eight src0 data blocks. If repeatMode is set to true, eight values are read from src1, and each value in turn is multiplied by the elements of the corresponding src0 data block. The result of the current horizontal iteration is accumulated, data block by data block, onto the previous dst result and stored at the destination address. The dst address stays the same for all horizontal iterations within one vertical iteration.

Each vertical iteration performs hRepeat horizontal iterations. The dst start address of a vertical iteration is the dst start address of the previous vertical iteration plus vROffset. The dst space occupied by a vertical iteration is the eight data blocks starting at its dst start address.
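The iteration scheme described above can be modeled on the host with plain C++. The following is a minimal sketch, not Ascend C kernel code and not the library's implementation. BilinearInterpolationRef is a hypothetical name, float stands in for half, the mask is ignored (all elements participate), and dst is assumed to start at zero. The src0Offset values are assumed to be byte offsets, and src1 is assumed to be consumed sequentially: one value per horizontal iteration when repeatMode is false, eight when it is true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockBytes = 32;    // size of one data block
constexpr std::size_t kBlocksPerIter = 8;  // data blocks per horizontal iteration

// Hypothetical host-side reference model; float stands in for half.
std::vector<float> BilinearInterpolationRef(const std::vector<float>& src0,
                                            const std::vector<uint32_t>& src0Offset,
                                            const std::vector<float>& src1,
                                            std::size_t elemBytes,  // 2 for half
                                            uint8_t hRepeat, bool repeatMode,
                                            uint16_t dstBlkStride, uint16_t vROffset,
                                            uint8_t vRepeat)
{
    assert(elemBytes > 0 && kBlockBytes % elemBytes == 0);
    assert(hRepeat >= 1 && vRepeat >= 1);
    const std::size_t elemsPerBlock = kBlockBytes / elemBytes;
    // Elements spanned by the eight (possibly strided) dst blocks of one vertical iteration.
    const std::size_t rowSpan = ((kBlocksPerIter - 1) * dstBlkStride + 1) * elemsPerBlock;
    // ASSUMPTION: dst starts at zero before the first accumulation.
    std::vector<float> dst(static_cast<std::size_t>(vRepeat - 1) * vROffset + rowSpan, 0.0f);

    std::size_t offIdx = 0;   // next entry of src0Offset to read
    std::size_t src1Idx = 0;  // next entry of src1 to read
    for (std::size_t v = 0; v < vRepeat; ++v) {
        const std::size_t dstBase = v * vROffset;  // vROffset is in elements
        for (std::size_t h = 0; h < hRepeat; ++h) {
            for (std::size_t b = 0; b < kBlocksPerIter; ++b) {
                // ASSUMPTION: each offset is a byte offset into src0.
                const std::size_t srcBase = src0Offset.at(offIdx++) / elemBytes;
                // false: one src1 value for all eight blocks; true: one value per block.
                const float w = src1.at(repeatMode ? src1Idx + b : src1Idx);
                const std::size_t dstBlock = dstBase + b * dstBlkStride * elemsPerBlock;
                for (std::size_t e = 0; e < elemsPerBlock; ++e) {
                    dst[dstBlock + e] += w * src0.at(srcBase + e);
                }
            }
            // ASSUMPTION: src1 is consumed sequentially across horizontal iterations.
            src1Idx += repeatMode ? kBlocksPerIter : 1;
        }
    }
    return dst;
}
```

Under these assumptions, the model reproduces the values listed in the result example of the Examples section (389, 394, 399, 404, ..., 4096) for hRepeat = 2, vRepeat = 2, repeatMode = false, dstBlkStride = 1, and vROffset = 128.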

Prototype

Contiguous mask mode:

template <typename T>
__aicore__ inline void BilinearInterpolation(const LocalTensor<T> &dstLocal, const LocalTensor<T> &src0Local, const LocalTensor<uint32_t> &src0OffsetLocal, const LocalTensor<T> &src1Local, uint64_t mask, uint8_t hRepeat, bool repeatMode, uint16_t dstBlkStride, uint16_t vROffset, uint8_t vRepeat, const LocalTensor<uint8_t> &sharedTmpBuffer)

Bitwise mask mode:

template <typename T>
__aicore__ inline void BilinearInterpolation(const LocalTensor<T> &dstLocal, const LocalTensor<T> &src0Local, const LocalTensor<uint32_t> &src0OffsetLocal, const LocalTensor<T> &src1Local, uint64_t mask[], uint8_t hRepeat, bool repeatMode, uint16_t dstBlkStride, uint16_t vROffset, uint8_t vRepeat, const LocalTensor<uint8_t> &sharedTmpBuffer)

Parameters

**Table 1** Parameters
Parameter	Input/Output	Description
dstLocal	Output	Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. For the Atlas A2 training products / Atlas A2 inference products, the supported data type is half. For the Atlas A3 training products / Atlas A3 inference products, the supported data type is half. For the Atlas inference product's AI Core, the supported data type is half.
src0Local and src1Local	Input	Source operands. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The source operands must have the same data type as the destination operand. For the Atlas A2 training products / Atlas A2 inference products, the supported data type is half. For the Atlas A3 training products / Atlas A3 inference products, the supported data type is half. For the Atlas inference product's AI Core, the supported data type is half.
src0OffsetLocal	Input	Source operand that holds the src0 offset values. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. For the Atlas A2 training products / Atlas A2 inference products, the supported data type is uint32_t. For the Atlas A3 training products / Atlas A3 inference products, the supported data type is uint32_t. For the Atlas inference product's AI Core, the supported data type is uint32_t.
mask	Input	Controls which elements participate in computation in each iteration. Bitwise mode: each bit controls one element. A bit value of 1 means the corresponding element participates in computation, and 0 means it does not. The mask value is an array. The array length and the value range of the array elements depend on the operand data type. When the operand is 16-bit, the array length is 2, mask[0] and mask[1] ∈ [0, 2⁶⁴ - 1], and they cannot both be 0. When the operand is 32-bit, the array length is 1 and mask[0] ∈ (0, 2⁶⁴ - 1]. When the operand is 64-bit, the array length is 1 and mask[0] ∈ (0, 2³² - 1]. For example, if mask = [8, 0] and 8 = 0b1000, only the fourth element participates in computation. Contiguous mode: the mask value is the number of contiguous elements, counted from the first element, that participate in computation. The value range depends on the operand data type, because the maximum number of elements processed in each repeat varies with the data type. When the operand is 16-bit, mask ∈ [1, 128]. When the operand is 32-bit, mask ∈ [1, 64]. When the operand is 64-bit, mask ∈ [1, 32].
hRepeat	Input	Number of horizontal iterations. The value range is [1, 255].
repeatMode	Input	Repeat mode, of type bool. false: in each horizontal iteration, every element of the eight data blocks read from src0 is multiplied by the same single value from src1. true: in each horizontal iteration, each of the eight src0 data blocks is multiplied by its own src1 element, so eight data blocks and eight src1 elements are consumed per iteration.
dstBlkStride	Input	Address stride of the destination operand between different data blocks in a single repeat, in the unit of 32 bytes.
vROffset	Input	Address offset of the destination operand between vertical repeats, in the unit of elements. The value range is [128, 65535].
vRepeat	Input	Number of vertical iterations. The value range is [1, 255].
sharedTmpBuffer	Input	Temporary space. For the Atlas A2 training products / Atlas A2 inference products, allocate at least src0Local.GetSize() * 32 + src1Local.GetSize() * 32 bytes. For the Atlas A3 training products / Atlas A3 inference products, allocate at least src0Local.GetSize() * 32 + src1Local.GetSize() * 32 bytes. For the Atlas inference product's AI Core, allocate at least src0OffsetLocal.GetSize() * sizeof(uint32_t) bytes.
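The mask and sharedTmpBuffer rows of Table 1 translate into a few lines of host-side arithmetic. The sketch below is illustrative only. All function names are hypothetical, GetSize() is assumed to return an element count, and mask[0] is assumed to cover elements 0 to 63 of a 16-bit operand while mask[1] covers elements 64 to 127.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bitwise mask that selects the same elements as a
// contiguous mask of n elements for a 16-bit operand (n in [1, 128]).
std::array<uint64_t, 2> ContiguousToBitwiseMask(unsigned n)
{
    assert(n >= 1 && n <= 128);
    // Returns a word with the k lowest bits set (k in [0, 64]).
    auto lowBits = [](unsigned k) -> uint64_t {
        return k >= 64 ? UINT64_MAX : ((uint64_t{1} << k) - 1);
    };
    // ASSUMPTION: mask[0] covers elements 0..63, mask[1] covers elements 64..127.
    const uint64_t low = lowBits(n);
    const uint64_t high = n > 64 ? lowBits(n - 64) : uint64_t{0};
    return {low, high};
}

// Minimum sharedTmpBuffer size in bytes for Atlas A2 / A3 products, as stated in Table 1.
// ASSUMPTION: the arguments are the element counts returned by GetSize().
std::size_t MinSharedTmpBytesA2A3(std::size_t src0Elems, std::size_t src1Elems)
{
    return src0Elems * 32 + src1Elems * 32;
}

// Minimum sharedTmpBuffer size in bytes for the Atlas inference product's AI Core.
std::size_t MinSharedTmpBytesInferenceAiCore(std::size_t src0OffsetElems)
{
    return src0OffsetElems * sizeof(uint32_t);
}
```

For instance, a contiguous mask of 128 corresponds to { UINT64_MAX, UINT64_MAX }, which is the bitwise mask used in the Examples section.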

Returns

None

Availability

Atlas A2 training products / Atlas A2 inference products

Atlas A3 training products / Atlas A3 inference products

Atlas inference product's AI Core

Constraints

The addresses of src0Local, src1Local, and src0OffsetLocal cannot overlap. In addition, the destination address ranges of any two vertical iterations cannot overlap.

For details about the operand address alignment requirements, see General Address Alignment Restrictions.
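The rule that the destination ranges of vertical iterations must not overlap can be checked on the host before the call is issued. The following is a minimal sketch under stated assumptions, not part of the Ascend C API. VerticalRepeatsOverlap is a hypothetical helper, each vertical iteration is assumed to write eight data blocks spaced dstBlkStride blocks apart, and elemsPerBlock defaults to 16 (a 32-byte block of half elements).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side check: returns true if the dst blocks written by
// two different vertical iterations share at least one element.
bool VerticalRepeatsOverlap(uint16_t dstBlkStride, uint16_t vROffset, uint8_t vRepeat,
                            int64_t elemsPerBlock = 16)
{
    assert(elemsPerBlock > 0);
    // Block b of vertical iteration v starts at element
    //   v * vROffset + b * dstBlkStride * elemsPerBlock
    // and covers elemsPerBlock elements. Two blocks overlap when their start
    // positions differ by less than elemsPerBlock.
    for (int64_t dv = 1; dv < vRepeat; ++dv) {       // distance between two vertical iterations
        for (int64_t db = -7; db <= 7; ++db) {       // difference of block indices (0..7 each)
            const int64_t delta = dv * vROffset + db * dstBlkStride * elemsPerBlock;
            const int64_t dist = delta < 0 ? -delta : delta;
            if (dist < elemsPerBlock) {
                return true;
            }
        }
    }
    return false;
}
```

With dstBlkStride = 1 and vROffset = 128, the values used in the Examples section, consecutive vertical iterations are exactly adjacent and the check reports no overlap. With dstBlkStride = 2 and vROffset = 128, the fifth block of one iteration coincides with the first block of the next, so the check reports an overlap.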

Examples

API example - contiguous mask mode

        
// The tensors below are assumed to have been allocated and filled already.
AscendC::LocalTensor<half> dstLocal, src0Local, src1Local;
AscendC::LocalTensor<uint32_t> src0OffsetLocal;
AscendC::LocalTensor<uint8_t> tmpLocal;
uint64_t mask = 128;        // Contiguous mask mode: the first 128 elements participate
uint8_t hRepeat = 2;        // Two horizontal iterations per vertical iteration
bool repeatMode = false;    // One src1 value is applied to all eight src0 data blocks
uint16_t dstBlkStride = 1;  // Data blocks are written contiguously within a single iteration
uint16_t vROffset = 128;    // Vertical iterations write to adjacent 128-element ranges
uint8_t vRepeat = 2;        // Two vertical iterations

AscendC::BilinearInterpolation(dstLocal, src0Local, src0OffsetLocal, src1Local, mask, hRepeat, repeatMode,
                               dstBlkStride, vROffset, vRepeat, tmpLocal);

API example - bitwise mask mode

        
// The tensors below are assumed to have been allocated and filled already.
AscendC::LocalTensor<half> dstLocal, src0Local, src1Local;
AscendC::LocalTensor<uint32_t> src0OffsetLocal;
AscendC::LocalTensor<uint8_t> tmpLocal;
uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };  // Bitwise mask mode: all 128 elements participate
uint8_t hRepeat = 2;        // Two horizontal iterations per vertical iteration
bool repeatMode = false;    // One src1 value is applied to all eight src0 data blocks
uint16_t dstBlkStride = 1;  // Data blocks are written contiguously within a single iteration
uint16_t vROffset = 128;    // Vertical iterations write to adjacent 128-element ranges
uint8_t vRepeat = 2;        // Two vertical iterations

AscendC::BilinearInterpolation(dstLocal, src0Local, src0OffsetLocal, src1Local, mask, hRepeat, repeatMode,
                               dstBlkStride, vROffset, vRepeat, tmpLocal);

Result example:

Input (src0Local, half): [1, 2, 3, ..., 512]
Input (src1Local, half): [2, 3, 4, ..., 17]
Input (src0OffsetLocal, uint32_t): [0, 32, 64, ..., 992]
Output (dstLocal, half): [389, 394, 399, 404, ..., 4096]
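The listed output values can be derived by hand. The sketch below writes that arithmetic as compile-time C++ for the parameters of the examples above (hRepeat = 2, vRepeat = 2, repeatMode = false, dstBlkStride = 1, vROffset = 128). It relies on two assumptions inferred from the numbers rather than stated in the text: the src0OffsetLocal values are byte offsets, so each horizontal iteration reads 128 consecutive half elements, and horizontal iteration h uses the src1 value 2 + h. Src0 and ExpectedDst are hypothetical names.

```cpp
// Value stored at element index i of src0Local = [1, 2, ..., 512].
constexpr int Src0(int i) { return i + 1; }

// Expected dst element e (0..255) for the example parameters.
constexpr int ExpectedDst(int e)
{
    const int v = e / 128;        // vertical iteration that writes element e
    const int pos = e % 128;      // position inside the eight dst data blocks
    const int h0 = 2 * v;         // first horizontal iteration of this vertical iteration
    const int h1 = 2 * v + 1;     // second horizontal iteration
    // ASSUMPTION: iteration h reads src0 elements [128 * h, 128 * h + 127]
    // and multiplies them by the src1 value 2 + h.
    return (2 + h0) * Src0(128 * h0 + pos) + (2 + h1) * Src0(128 * h1 + pos);
}

// dst[0]   = 2 * 1   + 3 * 129 = 389
// dst[1]   = 2 * 2   + 3 * 130 = 394
// dst[255] = 4 * 384 + 5 * 512 = 4096
static_assert(ExpectedDst(0) == 389, "first output element");
static_assert(ExpectedDst(1) == 394, "second output element");
static_assert(ExpectedDst(255) == 4096, "last output element");
```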
