ArithProgression
Function Usage
Generates an arithmetic progression from a start value, a common difference, and a length.
Principles
The figure below shows the internal algorithm block diagram of the ArithProgression high-level API, using the float type, the ND format, and the scalar inputs firstValue and diffValue as an example.

The computation process is divided into the following steps, all of which are performed on vectors:
- Step 1 (length less than 8): Use SetValue to write the progression element by element from firstValue and diffValue, producing at most 8 elements. If the target length is less than 8, the algorithm ends here.
- Step 2 (length of 8 to 64): Use Adds to extend the result of step 1, taking at most seven expansion cycles to reach a length of 64. If the target length is less than 64, the algorithm ends here.
- Step 3 (length greater than 64): Use Adds to extend the result of step 2, repeating until the target length is reached.
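The three stages above can be modeled on the host in plain C++. This is only a sketch of the idea, not the library's implementation: `AddsModel` and `ArithProgressionModel` are hypothetical names, `AddsModel` stands in for a vector add-scalar operation, and the choice to derive every chunk from the start of the buffer is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector "add scalar" operation:
// dst[i] = src[i] + scalar for i in [0, n).
template <typename T>
void AddsModel(T* dst, const T* src, T scalar, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + scalar;
    }
}

// Hypothetical host-side model of the staged expansion (not the real kernel code).
template <typename T>
std::vector<T> ArithProgressionModel(T firstValue, T diffValue, std::size_t count) {
    constexpr std::size_t kSeed = 8;    // elements written one by one in stage 1
    constexpr std::size_t kBlock = 64;  // length reached at the end of stage 2
    std::vector<T> dst(count);

    // Stage 1: scalar writes for the first (up to) 8 elements.
    const std::size_t seedLen = std::min(count, kSeed);
    for (std::size_t i = 0; i < seedLen; ++i) {
        dst[i] = firstValue + static_cast<T>(i) * diffValue;
    }
    if (count <= kSeed) {
        return dst;
    }

    // Stage 2: grow from 8 toward 64 in chunks of 8 (at most seven passes).
    // Each pass shifts the 8-element seed by off * diffValue.
    const std::size_t blockLen = std::min(count, kBlock);
    for (std::size_t off = kSeed; off < blockLen; off += kSeed) {
        const std::size_t n = std::min(kSeed, blockLen - off);
        AddsModel(dst.data() + off, dst.data(), static_cast<T>(off) * diffValue, n);
    }
    if (count <= kBlock) {
        return dst;
    }

    // Stage 3: grow beyond 64 in chunks of 64 until the target length is reached.
    for (std::size_t off = kBlock; off < count; off += kBlock) {
        const std::size_t n = std::min(kBlock, count - off);
        AddsModel(dst.data() + off, dst.data(), static_cast<T>(off) * diffValue, n);
    }
    return dst;
}
```

For instance, `ArithProgressionModel<int>(1, 2, 100)` runs all three stages: 8 scalar writes, seven 8-element passes, and one 36-element pass.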
Prototype
template <typename T>
__aicore__ inline void ArithProgression(const LocalTensor<T> &dstLocal, const T firstValue, const T diffValue, const int32_t count)
Parameters
| Parameter | Function |
|---|---|
| T | Data type of the operand. |
| Parameter | Input/Output | Description |
|---|---|---|
| dstLocal | Output | Destination operand. The size of dstLocal must be greater than or equal to count × sizeof(T) bytes. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| firstValue | Input | Value of the first element in the arithmetic progression. |
| diffValue | Input | Difference between adjacent elements in the arithmetic progression. The value must be greater than or equal to 0. |
| count | Input | Length of the arithmetic progression. The value must be greater than 0. |
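The constraints in the table can be checked on the host before launching a kernel. The sketch below is illustrative only: `MinDstBytes`, `AlignedDstBytes`, and `ArgsAreValid` are hypothetical helpers that are not part of the API, and the 32-byte rounding simply mirrors the buffer-size expression used in the Example section of this page.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers; none of these names exist in the Ascend C API.
constexpr std::size_t kBlockBytes = 32;  // rounding unit taken from the sample's InitBuffer call

// Minimum size of dstLocal in bytes: count * sizeof(T).
template <typename T>
constexpr std::size_t MinDstBytes(std::int32_t count) {
    return static_cast<std::size_t>(count) * sizeof(T);
}

// Minimum size rounded up to a multiple of 32 bytes.
template <typename T>
constexpr std::size_t AlignedDstBytes(std::int32_t count) {
    return (MinDstBytes<T>(count) + kBlockBytes - 1) / kBlockBytes * kBlockBytes;
}

// Documented constraints: count > 0 and diffValue >= 0.
template <typename T>
constexpr bool ArgsAreValid(T diffValue, std::int32_t count) {
    return count > 0 && diffValue >= T(0);
}
```

With a 4-byte element type and a count of 15, the minimum is 60 bytes and the rounded size is 64 bytes.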
Returns
None
Availability
Precautions
Currently, only the ND format is supported.
Example
#include "kernel_operator.h"

template <typename T>
class KernelArithProgression {
public:
    __aicore__ inline KernelArithProgression() {}
    __aicore__ inline void Init(GM_ADDR dstGm, int64_t firstValue, int64_t diffValue, uint32_t count)
    {
        firstValue_ = firstValue;
        diffValue_ = diffValue;
        count_ = count;
        dst_global.SetGlobalBuffer(reinterpret_cast<__gm__ T *>(dstGm), count_);
        // Round the output buffer size up to a multiple of 32 bytes.
        pipe.InitBuffer(outDst, 1, (sizeof(T) * count_ + 32 - 1) / 32 * 32);
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    __aicore__ inline void CopyIn()
    {
        // Nothing to copy in: both inputs are scalars.
    }
    __aicore__ inline void Compute()
    {
        AscendC::LocalTensor<T> dstLocal = outDst.AllocTensor<T>();
        // Fill dstLocal with count_ elements: firstValue_, firstValue_ + diffValue_, ...
        AscendC::ArithProgression<T>(dstLocal, static_cast<T>(firstValue_), static_cast<T>(diffValue_), count_);
        outDst.EnQue<T>(dstLocal);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<T> dstLocal = outDst.DeQue<T>();
        // Number of elements in one 32-byte block; the copy length is rounded up to a whole block.
        const int32_t BLOCK_NUM = 32 / sizeof(T);
        AscendC::DataCopy(dst_global, dstLocal, (count_ + BLOCK_NUM - 1) / BLOCK_NUM * BLOCK_NUM);
        outDst.FreeTensor(dstLocal);
    }

private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outDst;
    AscendC::GlobalTensor<T> dst_global;
    int64_t firstValue_;
    int64_t diffValue_;
    uint32_t count_;
};

extern "C" __global__ __aicore__ void kernel_arith_progression_operator(GM_ADDR dstLocal)
{
    KernelArithProgression<half> op;
    // firstValue = 1, diffValue = 2, count = 15
    op.Init(dstLocal, 1, 2, 15);
    op.Process();
}
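To check the kernel output on the host, a plain reference loop is usually enough. The sketch below is a hypothetical helper (`ReferenceProgression` is not part of the API) and assumes `float` as a host-side stand-in for `half`, which is exact for the small integers used in the sample call.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-side reference: builds the progression by repeated addition.
template <typename T>
std::vector<T> ReferenceProgression(T firstValue, T diffValue, std::int32_t count) {
    std::vector<T> out;
    out.reserve(static_cast<std::size_t>(count));
    T value = firstValue;
    for (std::int32_t i = 0; i < count; ++i) {
        out.push_back(value);
        value += diffValue;
    }
    return out;
}
```

For the sample's arguments (firstValue 1, diffValue 2, count 15), `ReferenceProgression<float>(1.0f, 2.0f, 15)` yields the odd numbers from 1 to 29, which is what the first 15 elements of the kernel output should match.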