Adds
Applicability
| Product | Supported/Unsupported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | x |
| | √ |
Function Usage
Adds a scalar to each element of the vector. The calculation formula is as follows:

$$dst_i = src_i + scalarValue$$
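
The element-wise behavior described above can be sketched as a host-side reference model. This is plain C++, not the Ascend C API: `AddsReference` and its `std::vector` arguments are hypothetical stand-ins for `Adds` and `LocalTensor`, intended only for checking expected values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference model (not the Ascend C API):
// writes dst[i] = src[i] + scalarValue for the first `count` elements.
template <typename T>
void AddsReference(std::vector<T>& dst, const std::vector<T>& src,
                   T scalarValue, std::size_t count)
{
    // The caller must provide at least `count` elements on both sides.
    assert(count <= src.size() && count <= dst.size());
    for (std::size_t i = 0; i < count; ++i) {
        // Cast back to T because small integer types are promoted to int.
        dst[i] = static_cast<T>(src[i] + scalarValue);
    }
}
```

Elements at index `count` and beyond are left untouched by this model.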

Prototype
- Computation of the first n pieces of data of a tensor

      template <typename T, bool isSetMask = true>
      __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, const int32_t& count)

- High-dimensional tensor sharding computation
  - Bitwise mask mode

        template <typename T, bool isSetMask = true>
        __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, uint64_t mask[], const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)

  - Contiguous mask mode

        template <typename T, bool isSetMask = true>
        __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, uint64_t mask, const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
- Computation of the first n pieces of data of a tensor

      template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
      __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, const int32_t& count)

- High-dimensional tensor sharding computation
  - Bitwise mask mode

        template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
        __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, uint64_t mask[], const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)

  - Contiguous mask mode

        template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
        __aicore__ inline void Adds(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, uint64_t mask, const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
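
The `enable_if` constraint in the prototypes with a separate scalar type `U` can be illustrated with standard C++ type traits. The sketch below is not Ascend C code: `TensorSketch`, `TraitSketch`, `PrimOf`, `AddsSketch`, and `IsCallable` are hypothetical names, and the definition of `PrimT` here (unwrap a wrapper type, otherwise return `T` unchanged) is an assumption made for illustration. It shows that an overload constrained this way takes part in overload resolution only when the scalar type exactly matches the primitive element type.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

template <typename T> struct TensorSketch {};  // hypothetical stand-in for LocalTensor<T>
template <typename T> struct TraitSketch {};   // hypothetical wrapper that carries an element type

// Assumed behavior of PrimT: unwrap TraitSketch<T> to T, leave other types unchanged.
template <typename T> struct PrimOf { using type = T; };
template <typename T> struct PrimOf<TraitSketch<T>> { using type = T; };
template <typename T> using PrimT = typename PrimOf<T>::type;

// Same constraint shape as the prototype: enabled only if PrimT<T> is exactly U.
template <typename T, typename U,
          typename std::enable_if<std::is_same<PrimT<T>, U>::value, bool>::type = true>
void AddsSketch(const TensorSketch<T>&, const TensorSketch<T>&, const U&) {}

// Detection helper: true when AddsSketch can be called with element type T and scalar type U.
template <typename T, typename U, typename = void>
struct IsCallable : std::false_type {};

template <typename T, typename U>
struct IsCallable<T, U,
    decltype(AddsSketch(std::declval<const TensorSketch<T>&>(),
                        std::declval<const TensorSketch<T>&>(),
                        std::declval<const U&>()))> : std::true_type {};
```

Under these assumptions, a scalar declared as `int16_t` is accepted for an `int16_t` tensor, while a bare `int` literal is rejected at compile time, which is consistent with the examples on this page declaring `int16_t scalar = 2;` before the call.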
Parameters
| Parameter | Description |
|---|---|
| T | Operand data type. For the |
| U | Data type of scalarValue. For the |
| isSetMask | Whether to set the mask mode and mask value inside the API. For the following models, the isSetMask parameter in the API for calculating the first n pieces of data in a tensor does not take effect. Retain the default value. |

| Parameter | Input/Output | Meaning |
|---|---|---|
| dst | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. Its data type must be the same as that of the destination operand. |
| scalarValue | Input | Scalar source operand. Its data type must be the same as the element type of the destination operand. |
| count | Input | Number of elements involved in the computation. |
| mask/mask[] | Input | Controls which elements take part in the computation in each iteration. |
| repeatTime | Input | Number of repeat iterations. The Vector Unit reads 256 bytes of contiguous data per iteration, so processing the complete input takes multiple iterations; repeatTime specifies how many. For details about this parameter, see High-dimensional Sharding APIs. |
| repeatParams | Input | Control structure information of element operations. For details, see UnaryRepeatParams. |
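
How mask, repeatTime, and repeatParams interact can be sketched with a host-side reference model. This is plain C++, not the Ascend C API, and it rests on several assumptions that this page does not state in full: a data block is 32 bytes, one iteration covers eight data blocks (256 bytes), strides are measured in data blocks, bit i of the bitwise mask enables element i within an iteration, and a contiguous mask of N enables the first N elements. `RepeatParamsSketch`, `ContiguousToBitwise`, and `AddsMaskedReference` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the four stride values passed in the examples, e.g. {1, 1, 8, 8}.
struct RepeatParamsSketch {
    std::size_t dstBlkStride;
    std::size_t srcBlkStride;
    std::size_t dstRepStride;
    std::size_t srcRepStride;
};

constexpr std::size_t kBlockBytes = 32;      // assumed data block size
constexpr std::size_t kBlocksPerRepeat = 8;  // 256 bytes per iteration / 32 bytes per block

// Assumed meaning of contiguous mask mode: enable the first `count` elements.
inline void ContiguousToBitwise(std::uint64_t count, std::uint64_t out[2])
{
    assert(count <= 128);
    out[0] = (count >= 64) ? UINT64_MAX : ((std::uint64_t{1} << count) - 1);
    out[1] = (count <= 64) ? 0
           : (count == 128) ? UINT64_MAX
           : ((std::uint64_t{1} << (count - 64)) - 1);
}

// Reference model: for each iteration, add the scalar to every element whose mask bit is set.
template <typename T>
void AddsMaskedReference(std::vector<T>& dst, const std::vector<T>& src, T scalarValue,
                         const std::uint64_t mask[2], std::size_t repeatTime,
                         const RepeatParamsSketch& p)
{
    static_assert(sizeof(T) >= 2, "this sketch models 16-bit and wider element types only");
    const std::size_t elemsPerBlock = kBlockBytes / sizeof(T);
    const std::size_t elemsPerRepeat = elemsPerBlock * kBlocksPerRepeat;
    for (std::size_t r = 0; r < repeatTime; ++r) {
        for (std::size_t e = 0; e < elemsPerRepeat; ++e) {
            if (((mask[e / 64] >> (e % 64)) & 1u) == 0) {
                continue;  // element masked off: dst is left unchanged
            }
            const std::size_t blk = e / elemsPerBlock;  // data block within the iteration
            const std::size_t off = e % elemsPerBlock;  // element within the data block
            const std::size_t s = (r * p.srcRepStride + blk * p.srcBlkStride) * elemsPerBlock + off;
            const std::size_t d = (r * p.dstRepStride + blk * p.dstBlkStride) * elemsPerBlock + off;
            assert(s < src.size() && d < dst.size());
            dst[d] = static_cast<T>(src[s] + scalarValue);
        }
    }
}
```

With block strides of 1 and repeat strides of 8, the model walks memory contiguously; with a contiguous mask of 64 on 16-bit data, only the first half of each 128-element iteration is written.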
Returns
None
Restrictions
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- For details about the operand address overlapping restrictions, see General Address Overlap Restrictions.
Examples
For more examples, see here.
- Example of high-dimensional tensor sharding computation (contiguous mask mode)

      uint64_t mask = 128;
      int16_t scalar = 2;
      // repeatTime = 4. Each iteration processes 128 elements, so computing 512 elements takes four iterations.
      // dstBlkStride, srcBlkStride = 1. Within one iteration, the data addresses of src and dst advance by one data block, so data is read and written contiguously.
      // dstRepStride, srcRepStride = 8. Adjacent iterations start eight data blocks apart, so data is read and written contiguously across iterations.
      AscendC::Adds(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
- Example of high-dimensional tensor sharding computation (bitwise mask mode)

      uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };
      int16_t scalar = 2;
      // repeatTime = 4. Each iteration processes 128 elements, so computing 512 elements takes four iterations.
      // dstBlkStride, srcBlkStride = 1. Within one iteration, the data addresses of src and dst advance by one data block, so data is read and written contiguously.
      // dstRepStride, srcRepStride = 8. Adjacent iterations start eight data blocks apart, so data is read and written contiguously across iterations.
      AscendC::Adds(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
- Example of computing the first n pieces of data of a tensor

      int16_t scalar = 2;
      AscendC::Adds(dstLocal, srcLocal, scalar, 512);

Result:

      Input (srcLocal): [1 2 3 ... 512]
      Input (scalar): 2
      Output (dstLocal): [3 4 5 ... 514]
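
The repeatTime value of 4 used in the examples follows from the 256 bytes read per iteration. The helpers below are a minimal sketch in plain C++; `ElementsPerRepeat` and `RepeatsFor` are hypothetical names, not part of Ascend C.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRepeatBytes = 256;  // bytes the Vector Unit reads per iteration

// Number of elements of type T covered by one iteration.
template <typename T>
constexpr std::size_t ElementsPerRepeat()
{
    return kRepeatBytes / sizeof(T);
}

// Number of iterations needed to cover `count` elements, rounded up.
template <typename T>
constexpr std::size_t RepeatsFor(std::size_t count)
{
    return (count + ElementsPerRepeat<T>() - 1) / ElementsPerRepeat<T>();
}
```

For `int16_t`, one iteration covers 128 elements, so 512 elements need 4 iterations. Because repeatTime is declared as `uint8_t` in the prototypes, a single call can express at most 255 iterations. When `count` is not a multiple of the per-iteration element count, the rounded-up result includes a partial last iteration; handling that tail, for example with a smaller mask in a separate call, is left as an assumption here.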