Muls
Applicability
| Product | Supported/Unsupported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | x |
| | √ |

Function Usage
Multiplies each element in the vector by a scalar. The calculation formula is as follows:

$$dst_i = src_i \times scalarValue$$
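
The element-wise rule described above can be sketched on the host as a plain loop. This is a reference model for checking expected results, not the Ascend C API: `MulsReference` is a hypothetical name, `std::vector` stands in for `LocalTensor`, and any hardware-specific overflow or rounding behavior is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference model (hypothetical helper, not part of Ascend C):
// dst[i] = src[i] * scalarValue for the first `count` elements.
template <typename T>
void MulsReference(std::vector<T>& dst, const std::vector<T>& src,
                   T scalarValue, std::size_t count)
{
    // The caller must provide buffers that hold at least `count` elements.
    assert(count <= src.size() && count <= dst.size());
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = static_cast<T>(src[i] * scalarValue);
    }
}
```

Elements beyond `count` are left untouched in `dst`, which mirrors the "first n data elements" form of the call.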

Prototype
- Compute of the first n data elements of a tensor
  template <typename T, bool isSetMask = true>
  __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, const int32_t& count)
- Compute of the sharded high-dimensional tensor
  - Bitwise mask mode
    template <typename T, bool isSetMask = true>
    __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, uint64_t mask[], const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
  - Contiguous mask mode
    template <typename T, bool isSetMask = true>
    __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const T& scalarValue, uint64_t mask, const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
- Compute of the first n data elements of a tensor
  template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
  __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, const int32_t& count)
- Compute of the sharded high-dimensional tensor
  - Bitwise mask mode
    template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
    __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, uint64_t mask[], const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
  - Contiguous mask mode
    template <typename T, typename U, bool isSetMask = true, typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
    __aicore__ inline void Muls(const LocalTensor<T>& dst, const LocalTensor<T>& src, const U& scalarValue, uint64_t mask, const uint8_t repeatTime, const UnaryRepeatParams& repeatParams)
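
The `enable_if`/`is_same` constraint in the prototypes that take a separate scalar type `U` can be illustrated with standard C++ type traits. The sketch below is an assumption-laden model, not Ascend C code: `TraitModel`, `PrimModel`, `PrimTModel`, and `ScalarAccepted` are hypothetical names, and it assumes that `PrimT<T>` maps a tensor element type to its underlying primitive type. It uses `std::` where the prototypes use `Std::`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a non-primitive element type that carries a primitive type.
template <typename P>
struct TraitModel { using LiteType = P; };

// Hypothetical stand-in for PrimT: identity for primitive types,
// the carried primitive type for TraitModel.
template <typename T> struct PrimModel { using type = T; };
template <typename P> struct PrimModel<TraitModel<P>> { using type = P; };
template <typename T> using PrimTModel = typename PrimModel<T>::type;

// Participates in overload resolution only when U is exactly PrimTModel<T>,
// mirroring the enable_if<is_same<PrimT<T>, U>::value, bool>::type = true pattern.
template <typename T, typename U,
          typename std::enable_if<std::is_same<PrimTModel<T>, U>::value, bool>::type = true>
inline bool ScalarAccepted(const U&) { return true; }

// Fallback selected when the constrained overload is removed by SFINAE.
template <typename T>
inline bool ScalarAccepted(...) { return false; }
```

Under this model, an `int` literal does not satisfy the constraint for an `int16_t` tensor, while a scalar declared as `int16_t` does.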
Parameters
| Parameter | Description |
|---|---|
| T | Operand data type. The supported data types depend on the product model. |
| U | Data type of scalarValue. The supported data types depend on the product model. |
| isSetMask | Whether to set the mask mode and mask value inside the API. On some product models, the isSetMask parameter of the API that computes the first n data elements of a tensor does not take effect; retain the default value. |


| Parameter | Input/Output | Meaning |
|---|---|---|
| dst | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. Its data type must be the same as that of the destination operand. |
| scalarValue | Input | Scalar source operand. Its data type must be the same as the element type of the destination tensor. |
| count | Input | Number of elements involved in the computation. |
| mask/mask[] | Input | Controls which elements are involved in the computation in each iteration. |
| repeatTime | Input | Number of repeat iterations. The Vector Unit reads 256 bytes of contiguous data per iteration, so processing the complete input takes multiple iterations; repeatTime specifies how many. For details about this parameter, see High-dimensional Sharding APIs. |
| repeatParams | Input | Control structure for element operations. For details, see UnaryRepeatParams. |
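
The repeatTime description implies a simple piece of arithmetic: with 256 bytes read per iteration, the number of elements per iteration is 256 divided by the element size. The following host-side sketch works that out; the helper names are hypothetical, and it assumes contiguous data with no extra limits beyond the `uint8_t` type of repeatTime shown in the prototypes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes the Vector Unit reads per iteration, per the repeatTime description.
constexpr std::size_t kBytesPerRepeat = 256;

// Elements covered by one iteration for element type T.
template <typename T>
constexpr std::size_t ElementsPerRepeat()
{
    return kBytesPerRepeat / sizeof(T);
}

// Hypothetical helper: iterations needed to cover `count` contiguous elements.
template <typename T>
constexpr std::size_t RepeatsFor(std::size_t count)
{
    return (count + ElementsPerRepeat<T>() - 1) / ElementsPerRepeat<T>();
}

// Hypothetical helper: elements in the final partial iteration (0 if it is full).
template <typename T>
constexpr std::size_t TailElements(std::size_t count)
{
    return count % ElementsPerRepeat<T>();
}

// repeatTime is declared as uint8_t in the prototypes, so it cannot exceed 255.
template <typename T>
constexpr bool FitsRepeatTime(std::size_t count)
{
    return RepeatsFor<T>(count) <= 255;
}
```

For `int16_t`, one iteration covers 128 elements, so 512 elements need four iterations, which matches the value passed in the examples.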
Returns
None
Restrictions
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- For details about the operand address overlapping restrictions, see General Address Overlap Restrictions.
Examples
For more examples, see here.
- High-dimensional tensor segmentation and computation example (mask in contiguous mode)
  uint64_t mask = 128;
  int16_t scalar = 2;
  // repeatTime = 4. 128 elements are processed in a single iteration. To compute 512 elements, four iterations are required.
  // dstBlkStride, srcBlkStride = 1. Within a single iteration, the address interval between adjacent data blocks is one data block, so data is read and written contiguously within an iteration.
  // dstRepStride, srcRepStride = 8. The address interval between adjacent iterations is eight data blocks, so data is read and written contiguously across iterations.
  AscendC::Muls(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
- High-dimensional tensor segmentation and computation example (mask in bitwise mode)
  uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };
  int16_t scalar = 2;
  // repeatTime = 4. 128 elements are processed in a single iteration. To compute 512 elements, four iterations are required.
  // dstBlkStride, srcBlkStride = 1. Within a single iteration, the address interval between adjacent data blocks is one data block, so data is read and written contiguously within an iteration.
  // dstRepStride, srcRepStride = 8. The address interval between adjacent iterations is eight data blocks, so data is read and written contiguously across iterations.
  AscendC::Muls(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
- Example of computing the first n pieces of data in a tensor
  int16_t scalar = 2;
  AscendC::Muls(dstLocal, srcLocal, scalar, 512);
Input (srcLocal): [1 2 3 ... 512]
Input (scalar): 2
Output (dstLocal): [2 4 6 ... 1024]
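
The stride and mask comments in the examples above can be checked with a small host-side model. This is a sketch under stated assumptions, not the Ascend C implementation: `RepeatParamsModel`, `MulsModel`, and `ContiguousToBits` are hypothetical names; one data block is assumed to be 32 bytes (eight blocks per 256-byte iteration); in bitwise mode, bit k of `mask[k / 64]` is assumed to enable element k of the iteration; and a contiguous mask of n is assumed to enable the first n elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for UnaryRepeatParams; field order follows the
// { dstBlkStride, srcBlkStride, dstRepStride, srcRepStride } example comments.
struct RepeatParamsModel {
    std::size_t dstBlkStride;
    std::size_t srcBlkStride;
    std::size_t dstRepStride;
    std::size_t srcRepStride;
};

constexpr std::size_t kBlockBytes = 32;      // assumed size of one data block
constexpr std::size_t kBlocksPerRepeat = 8;  // 256 bytes per iteration / 32 bytes

// Hypothetical helper: expands a contiguous mask (first n elements) into bit form.
inline void ContiguousToBits(std::size_t n, uint64_t out[2])
{
    out[0] = n >= 64 ? UINT64_MAX : ((uint64_t{1} << n) - 1);
    out[1] = n <= 64 ? 0 : (n >= 128 ? UINT64_MAX : ((uint64_t{1} << (n - 64)) - 1));
}

// Host-side model of the masked, strided form; strides are in units of data blocks.
template <typename T>
void MulsModel(std::vector<T>& dst, const std::vector<T>& src, T scalar,
               const uint64_t mask[2], std::size_t repeatTime,
               const RepeatParamsModel& p)
{
    static_assert(sizeof(T) == 2 || sizeof(T) == 4, "model covers 16- and 32-bit types");
    const std::size_t elemsPerBlock = kBlockBytes / sizeof(T);
    for (std::size_t r = 0; r < repeatTime; ++r) {
        for (std::size_t b = 0; b < kBlocksPerRepeat; ++b) {
            for (std::size_t e = 0; e < elemsPerBlock; ++e) {
                const std::size_t lane = b * elemsPerBlock + e;  // index within the iteration
                const bool active = ((mask[lane / 64] >> (lane % 64)) & 1u) != 0;
                if (!active) {
                    continue;  // masked-off elements leave dst unchanged
                }
                const std::size_t s = (r * p.srcRepStride + b * p.srcBlkStride) * elemsPerBlock + e;
                const std::size_t d = (r * p.dstRepStride + b * p.dstBlkStride) * elemsPerBlock + e;
                assert(s < src.size() && d < dst.size());
                dst[d] = static_cast<T>(src[s] * scalar);
            }
        }
    }
}
```

With `{ 1, 1, 8, 8 }`, four iterations, and all 128 mask bits set, the model visits elements 0 through 511 exactly once and in order, which is why the examples describe the access as contiguous.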