ArithProgression

Function Usage

Generates an arithmetic progression from a start value, a common difference, and a length. Element i of the result (counting from 0) equals firstValue + i * diffValue.

Principles

The figure below shows a block diagram of the internal algorithm of the ArithProgression high-level API, using the float data type, the ND format, and the scalar inputs firstValue and diffValue as an example.

Figure 1 ArithProgression algorithm block diagram

The computation process is divided into the following steps, all of which are performed on vectors:

  1. Length less than 8: Use SetValue to generate the first elements of the arithmetic progression from firstValue and diffValue, up to a maximum of 8 elements. If the length of the arithmetic progression is less than 8, the algorithm ends here.
  2. Length of 8 to 64: Use Adds to extend the result of step 1, in at most seven rounds, until the length reaches 64. If the length of the arithmetic progression is less than 64, the algorithm ends here.
  3. Length greater than 64: Use Adds to extend the result of step 2, repeating until the target length of the arithmetic progression is reached.
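
The three steps above can be modeled on the host in plain C++. The sketch below is only an illustration of the staged expansion, not the device implementation: the name ArithProgressionModel is hypothetical, scalar loops stand in for SetValue and Adds, and each new group is assumed to be derived from the first 8 (or first 64) elements, whereas the actual kernel may chain from the previous group instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model of the staged expansion (hypothetical helper, not an Ascend C API).
template <typename T>
std::vector<T> ArithProgressionModel(T firstValue, T diffValue, int32_t count)
{
    assert(count > 0);
    constexpr int32_t kSeed = 8;    // elements written one by one in step 1
    constexpr int32_t kBlock = 64;  // length reached at the end of step 2
    std::vector<T> dst(static_cast<size_t>(count));

    // Step 1: write up to 8 elements individually (SetValue on the device).
    const int32_t seedLen = std::min(count, kSeed);
    for (int32_t i = 0; i < seedLen; ++i) {
        dst[i] = firstValue + static_cast<T>(i) * diffValue;
    }

    // Step 2: build each further group of 8 by adding a scalar offset to the
    // first 8 elements (Adds on the device), at most seven times.
    const int32_t blockLen = std::min(count, kBlock);
    for (int32_t base = kSeed; base < blockLen; base += kSeed) {
        const T offset = static_cast<T>(base) * diffValue;
        const int32_t n = std::min(kSeed, blockLen - base);
        for (int32_t i = 0; i < n; ++i) {
            dst[base + i] = dst[i] + offset;
        }
    }

    // Step 3: build each further group of 64 by adding a scalar offset to the
    // first 64 elements, until the target length is reached.
    for (int32_t base = kBlock; base < count; base += kBlock) {
        const T offset = static_cast<T>(base) * diffValue;
        const int32_t n = std::min(kBlock, count - base);
        for (int32_t i = 0; i < n; ++i) {
            dst[base + i] = dst[i] + offset;
        }
    }
    return dst;
}
```

Calling ArithProgressionModel<int32_t>(1, 2, 15) exercises steps 1 and 2 only; a count above 64 also exercises step 3.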

Prototype

template <typename T>
__aicore__ inline void ArithProgression(const LocalTensor<T> &dstLocal, const T firstValue, const T diffValue, const int32_t count)

Parameters

Table 1 Parameters in the template

| Parameter | Function |
|-----------|----------|
| T | Data type of the operand. |

Table 2 API parameters

| Parameter | Input/Output | Description |
|-----------|--------------|-------------|
| dstLocal | Output | Destination operand. The size of dstLocal in bytes must be greater than or equal to count * sizeof(T). The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| firstValue | Input | Value of the first element in the arithmetic progression. |
| diffValue | Input | Difference between adjacent elements in the arithmetic progression. The value must be greater than or equal to 0. |
| count | Input | Length of the arithmetic progression. The value must be greater than 0. |
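
The parameter constraints can be checked on the host before launching a kernel. The sketch below is a minimal illustration under stated assumptions: AlignedDstBytes and ArgsAreValid are hypothetical helper names, not Ascend C APIs, and the 32-byte rounding mirrors the buffer-size expression used in the Example section rather than a documented requirement of ArithProgression itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes to reserve for dstLocal, rounded up to a
// multiple of 32 bytes, as done for InitBuffer in the Example section.
inline uint32_t AlignedDstBytes(uint32_t count, uint32_t elemSize)
{
    assert(elemSize > 0);
    const uint32_t kBlockBytes = 32;
    return (count * elemSize + kBlockBytes - 1) / kBlockBytes * kBlockBytes;
}

// Hypothetical helper: checks the constraints listed in Table 2.
template <typename T>
bool ArgsAreValid(T diffValue, int32_t count, uint32_t dstBytes)
{
    return count > 0                      // count must be greater than 0
        && diffValue >= static_cast<T>(0) // diffValue must be non-negative
        && dstBytes >= static_cast<uint32_t>(count) * sizeof(T);
}
```

For example, 15 elements of a 2-byte type need 30 bytes, which AlignedDstBytes rounds up to 32.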

Returns

None

Availability

Precautions

Currently, only the ND format is supported.

Example

#include "kernel_operator.h"

template <typename T>
class KernelArithProgression {
public:
    __aicore__ inline KernelArithProgression()
    {}
    __aicore__ inline void Init(GM_ADDR dstGm, int64_t firstValue, int64_t diffValue, uint32_t count)
    {
        firstValue_ = firstValue;
        diffValue_ = diffValue;
        count_ = count;
        dst_global.SetGlobalBuffer(reinterpret_cast<__gm__ T *>(dstGm), count_);
        pipe.InitBuffer(outDst, 1, (sizeof(T) * count_ + 32 - 1) / 32 * 32);
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        // Nothing to copy in: the progression is generated from scalar inputs.
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<T> dstLocal = outDst.AllocTensor<T>();
        AscendC::ArithProgression<T>(dstLocal, static_cast<T>(firstValue_), static_cast<T>(diffValue_), count_);
        outDst.EnQue<T>(dstLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<T> dstLocal = outDst.DeQue<T>();
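        // BLOCK_NUM is the number of elements in one 32-byte block;
        // the copy length is rounded up to a whole number of blocks.
        const int32_t BLOCK_NUM = 32 / sizeof(T);
        AscendC::DataCopy(dst_global, dstLocal, (count_ + BLOCK_NUM - 1) / BLOCK_NUM * BLOCK_NUM);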
        outDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outDst;
    AscendC::GlobalTensor<T> dst_global;
    int64_t firstValue_;
    int64_t diffValue_;
    uint32_t count_;
};
extern "C" __global__ __aicore__ void kernel_arith_progression_operator(GM_ADDR dstLocal)
{
    KernelArithProgression<half> op;
    // firstValue = 1, diffValue = 2, count = 15: the result is 1, 3, 5, ..., 29.
    op.Init(dstLocal, 1, 2, 15);
    op.Process();
}