VectorPadding (ISASI)
Applicability
| Product | Supported (√/x) |
|---|---|
| | x |
| | x |
| | x |
| | √ |
| | x |
| | x |
Functions
Pads the source operand one data block at a time, according to padMode and padSide.
Suppose that a data block of the source operand has 16 numbers: data block[0:15] = a to p.
- padSide==false: pads from the left of the data block, that is, the initial value of the data block (a->p)
- padSide==true: pads from the right of the data block, that is, the end value of the data block (p->a)
- padMode==0: uses the adjacent number as the padding value, for example, aaa|abc (padSide=false) and nop|ppp (padSide=true).
- padMode==1: uses the adjacent data block value for symmetric padding, for example, cba|abc (padSide=false) and nop|pon (padSide=true).
- padMode==2: uses the adjacent data block values, offset by one element, for symmetric padding. The position left over after the offset (shown as x) is filled with 0. For example:
  - padSide=false: xcb|abc. The mirror skips a, so the padding values are b and c, and x is filled with 0.
  - padSide=true: nop|onx. The mirror skips p, so the padding values are o and n, and x is filled with 0.
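The notation above can be checked with a small host-side model. This is only a sketch of one reading of the examples, not the hardware behavior: `PadBlockModel` is a hypothetical helper (not part of Ascend C), a data block is modeled as a string of letters, the padding is assumed to be as long as the block itself, and the character `'0'` stands in for the zero-filled position x.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical host-side model (NOT an Ascend C API).
// Returns the padding generated from one data block, following the
// aaa|abc style notation used in this section.
std::string PadBlockModel(const std::string& block, uint8_t padMode, bool padSide) {
    assert(padMode <= 2);  // documented value range is [0, 2]
    if (block.empty()) {
        return block;
    }
    // Mirror image of the block, e.g. "abc" -> "cba".
    const std::string mirrored(block.rbegin(), block.rend());
    switch (padMode) {
        case 0:
            // Repeat the edge value: first element (left) or last element (right).
            return std::string(block.size(), padSide ? block.back() : block.front());
        case 1:
            // Symmetric padding that includes the edge value.
            return mirrored;
        default:
            // Symmetric padding that skips the edge value; the leftover slot is 0.
            return padSide ? mirrored.substr(1) + '0'
                           : '0' + mirrored.substr(0, mirrored.size() - 1);
    }
}
```

With the three-element excerpts from the list, `PadBlockModel("abc", 1, false)` gives `cba` and `PadBlockModel("nop", 2, true)` gives `on0`, matching cba|abc and nop|onx.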
Prototype
- Computation of the first n data elements of a tensor
template <typename T>
__aicore__ inline void VectorPadding(const LocalTensor<T>& dst, const LocalTensor<T>& src, const uint8_t padMode, const bool padSide, const uint32_t count)
- High-dimensional tensor sharding computation
  - Bitwise mask mode
template <typename T, bool isSetMask = true>
__aicore__ inline void VectorPadding(const LocalTensor<T>& dst, const LocalTensor<T>& src, const uint8_t padMode, const bool padSide, const uint64_t mask[], const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
  - Contiguous mask mode
template <typename T, bool isSetMask = true>
__aicore__ inline void VectorPadding(const LocalTensor<T>& dst, const LocalTensor<T>& src, const uint8_t padMode, const bool padSide, const uint64_t mask, const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
Parameters
| Parameter | Description |
|---|---|
| T | Operand data type. For the |
| isSetMask | Indicates whether to set mask inside the API. |
| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. The source operand must have the same data type as the destination operand. |
| padMode | Input | Padding mode. The type is uint8_t. The value range is [0, 2]. |
| padSide | Input | Padding direction. The type is bool. |
| count | Input | Number of elements involved in the computation. |
| mask[]/mask | Input | Controls which elements are involved in the computation in each iteration. |
| repeatTime | Input | Number of iteration repeats. The Vector Unit reads 256 bytes of contiguous data for computation each time. To process the complete data, the unit must read the input in multiple repeats, and repeatTime is the number of those repeats. For details about this parameter, see High-dimensional Sharding APIs. |
| repeatParams | Input | Parameters that control the operand address strides, of the UnaryRepeatParams type. They include the address stride of the same data block between adjacent iterations and the address stride between data blocks within a single iteration. For details about the address stride between adjacent iterations, see repeatStride. For details about the address stride between data blocks in the same iteration, see dataBlockStride. |
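The arithmetic behind repeatTime and the stride parameters can be sketched on the host. The helpers here are hypothetical (they are not Ascend C APIs) and assume that strides are counted in 32-byte data blocks, which is consistent with a repeat stride of 8 meaning no gap between 256-byte repeats.

```cpp
#include <cstdint>

constexpr uint32_t kRepeatBytes = 256;  // bytes read by the Vector Unit per repeat
constexpr uint32_t kBlockBytes = 32;    // assumed size of one data block

// Hypothetical helper: elements covered by one repeat for a given element size.
constexpr uint32_t ElementsPerRepeat(uint32_t elemSize) {
    return kRepeatBytes / elemSize;
}

// Hypothetical helper: repeats needed to cover `count` elements (rounded up).
constexpr uint32_t RepeatTimesFor(uint32_t count, uint32_t elemSize) {
    return (count + ElementsPerRepeat(elemSize) - 1) / ElementsPerRepeat(elemSize);
}

// Hypothetical helper: element index at which data block `block` of repeat
// `repeat` starts, with both strides expressed in data blocks.
constexpr uint32_t BlockStartElement(uint32_t repeat, uint32_t block,
                                     uint32_t blkStride, uint32_t repStride,
                                     uint32_t elemSize) {
    return (repeat * repStride + block * blkStride) * (kBlockBytes / elemSize);
}
```

For the half type (2 bytes), one repeat covers 128 elements, so 512 elements need `RepeatTimesFor(512, 2) == 4` repeats, and with strides `{1, 1, 8, 8}` the second repeat starts at element 128, directly after the first.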
Returns
None
Constraints
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- mask controls only the write operation on the destination operand. It is irrelevant to the read operation on the source operand.
- count indicates the total number of elements written into the destination operand. The reading of the source operand is irrelevant to count.
Examples
In this example, srcLocal and dstLocal are of the half type.
For more examples, see here.
- Example of high-dimensional tensor sharding computation (contiguous mask mode)
uint64_t mask = 256 / sizeof(half);
uint8_t padMode = 0;
bool padSide = false;
// repeatTime = 4, 128 elements one repeat, 512 elements total
// dstBlkStride, srcBlkStride = 1, no gap between blocks in one repeat
// dstRepStride, srcRepStride = 8, no gap between repeats
AscendC::VectorPadding(dstLocal, srcLocal, padMode, padSide, mask, 4, { 1, 1, 8, 8 });
- Example of high-dimensional tensor sharding computation (bitwise mask mode)
uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };
uint8_t padMode = 0;
bool padSide = false;
// repeatTime = 4, 128 elements one repeat, 512 elements total
// dstBlkStride, srcBlkStride = 1, no gap between blocks in one repeat
// dstRepStride, srcRepStride = 8, no gap between repeats
AscendC::VectorPadding(dstLocal, srcLocal, padMode, padSide, mask, 4, { 1, 1, 8, 8 });
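The two examples select the same 128 half elements: a contiguous mask of 128 and a bitwise mask with every bit set. A minimal host-side sketch of that correspondence follows. It assumes that each bit of the bitwise mask selects one element, that mask[0] holds the first 64 elements of a repeat, and that a repeat holds 128 elements (a 16-bit type). `ContiguousToBitwiseMask` is a hypothetical helper, not an Ascend C API.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (NOT an Ascend C API): builds the bitwise mask that
// selects the first `count` elements of a repeat. Assumes 128 elements per
// repeat, so counts above 128 are clamped.
std::array<uint64_t, 2> ContiguousToBitwiseMask(uint32_t count) {
    std::array<uint64_t, 2> bits{0, 0};
    for (uint32_t i = 0; i < count && i < 128; ++i) {
        bits[i / 64] |= (uint64_t{1} << (i % 64));  // one bit per element
    }
    return bits;
}
```

Under these assumptions, `ContiguousToBitwiseMask(128)` yields `{ UINT64_MAX, UINT64_MAX }`, the value used in the bitwise mask example.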
- Example of computing the first n data elements of a tensor
uint8_t padMode = 0;
bool padSide = false;
AscendC::VectorPadding(dstLocal, srcLocal, padMode, padSide, 512);
// In srcLocal, there are 16 numbers in a data block.
Input (srcLocal): [6.938 -8.86 -0.2263 ... 1.971 1.778]
Output (dstLocal): [6.938 6.938 6.938 ... 6.938 6.938]