ShiftLeft
Applicability
| Product | Supported/Unsupported |
|---|---|
| | √ |
| | √ |
| | √ |
| | x |
| | x |
| | x |
Function Usage
Shifts each element of the source operand left by the number of bits specified by the scalar (scalarValue) and writes the results to the destination operand. The type of left shift depends on the data type of the source operand:
- If the data type is unsigned, a logical left shift is performed. The bits are shifted left by the specified number of positions, the bits shifted out past the most significant bit are discarded, and the vacated least significant bits are filled with 0. For example, logically shifting the binary number 1010101010101010 (uint16_t) left by one bit yields 0101010101010100.
- If the data type is signed, an arithmetic left shift is performed. The sign bit (the most significant bit) is preserved, the bits shifted out from immediately below the sign bit are discarded, and the vacated least significant bits are filled with 0. For example, arithmetically shifting the binary number 1010101010101010 (int16_t) left by one bit yields 1101010101010100, and shifting it left by three bits yields 1101010101010000.
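The following host-side C++ sketch models the per-element behavior described above, so you can check the bit patterns in the examples on a CPU. It is not the Ascend C implementation. The function names LogicalShiftLeft16 and ArithmeticShiftLeft16 are hypothetical. The sketch assumes 16-bit elements and a shift distance below 16, because the behavior for larger distances is not described here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: logical left shift of an unsigned 16-bit value.
// Bits shifted past bit 15 are discarded and vacated low bits are filled with 0.
uint16_t LogicalShiftLeft16(uint16_t value, unsigned distance) {
    assert(distance < 16);  // assumption: distances of 16 or more are not modeled
    return static_cast<uint16_t>(static_cast<uint32_t>(value) << distance);
}

// Hypothetical helper: sign-preserving arithmetic left shift of a signed 16-bit value.
// Bit 15 (the sign bit) is kept, bits 0..14 move left, bits pushed past bit 14 are
// discarded, and vacated low bits are filled with 0.
int16_t ArithmeticShiftLeft16(int16_t value, unsigned distance) {
    assert(distance < 16);
    const uint32_t bits = static_cast<uint16_t>(value);  // raw two's-complement pattern
    const uint32_t sign = bits & 0x8000u;
    const uint32_t body = (bits << distance) & 0x7FFFu;
    // Converting the 16-bit pattern back to int16_t wraps modulo 2^16
    // (guaranteed since C++20, and the behavior of common compilers before that).
    return static_cast<int16_t>(sign | body);
}
```

For example, applying ArithmeticShiftLeft16 to the int16_t pattern 1010101010101010 with a distance of 3 produces the pattern 1101010101010000, which matches the three-bit example above.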

Prototype
Overloads in which scalarValue has the tensor element type T:
- Computation of the first n pieces of data of a tensor
```cpp
template <typename T, bool isSetMask = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const T& scalarValue, const int32_t& count);
```
- High-dimensional tensor sharding computation, bitwise mask mode
```cpp
template <typename T, bool isSetMask = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const T& scalarValue, uint64_t mask[], const uint8_t repeatTime,
    const UnaryRepeatParams& repeatParams);
```
- High-dimensional tensor sharding computation, contiguous mask mode
```cpp
template <typename T, bool isSetMask = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const T& scalarValue, uint64_t mask, const uint8_t repeatTime,
    const UnaryRepeatParams& repeatParams);
```

Overloads in which scalarValue has a separate type U (enabled only when U is the same type as PrimT<T>):
- Computation of the first n pieces of data of a tensor
```cpp
template <typename T, typename U, bool isSetMask = true,
    typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const U& scalarValue, const int32_t& count);
```
- High-dimensional tensor sharding computation, bitwise mask mode
```cpp
template <typename T, typename U, bool isSetMask = true,
    typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const U& scalarValue, uint64_t mask[], const uint8_t repeatTime,
    const UnaryRepeatParams& repeatParams);
```
- High-dimensional tensor sharding computation, contiguous mask mode
```cpp
template <typename T, typename U, bool isSetMask = true,
    typename Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>::type = true>
__aicore__ inline void ShiftLeft(const LocalTensor<T>& dst, const LocalTensor<T>& src,
    const U& scalarValue, uint64_t mask, const uint8_t repeatTime,
    const UnaryRepeatParams& repeatParams);
```
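The second overload set gives the scalar its own template parameter U. The trailing `Std::enable_if<Std::is_same<PrimT<T>, U>::value, bool>` parameter removes an overload from consideration unless U is the same type as PrimT<T>. The sketch below reproduces that constraint on a plain `std::vector`, using the standard `std::enable_if` and `std::is_same`. PrimTSketch is a hypothetical identity stand-in for PrimT, and ShiftLeftSketch is not an Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for PrimT: maps a type to itself. The real PrimT may do more.
template <typename T>
struct PrimTraitSketch { using type = T; };

template <typename T>
using PrimTSketch = typename PrimTraitSketch<T>::type;

// Hypothetical function with the same constraint shape as the U overloads: it only
// takes part in overload resolution when U is exactly PrimTSketch<T>.
template <typename T, typename U,
          typename std::enable_if<std::is_same<PrimTSketch<T>, U>::value, bool>::type = true>
void ShiftLeftSketch(std::vector<T>& dst, const std::vector<T>& src, const U& scalarValue, int32_t count) {
    assert(count >= 0 && static_cast<std::size_t>(count) <= src.size() && dst.size() >= src.size());
    for (int32_t i = 0; i < count; ++i) {
        // Plain C++ shift: matches the logical shift above for unsigned T only.
        dst[i] = static_cast<T>(src[i] << scalarValue);
    }
}
```

With `uint16_t` vectors, the call `ShiftLeftSketch(dst, src, 2, n)` does not compile. The literal 2 deduces U = int, which differs from PrimTSketch<uint16_t>, so the scalar has to be passed as a `uint16_t`.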
Parameters
| Parameter | Description |
|---|---|
| T | Operand data type. The supported data types depend on the product model. |
| U | Data type of scalarValue. The supported data types depend on the product model. The overloads that take U are enabled only when U is the same type as PrimT<T>. |
| isSetMask | Whether to set the mask mode and mask value inside the API. For some product models, the isSetMask parameter of the API that computes the first n elements of a tensor does not take effect; retain its default value. |
| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. Its data type must be the same as that of the destination operand. |
| scalarValue | Input | Shift distance, in bits. Its data type must be the same as that of the elements of the destination operand. |
| count | Input | Number of elements involved in the computation. |
| mask/mask[] | Input | Controls which elements participate in the computation in each iteration. In contiguous mask mode (uint64_t mask), mask is the number of consecutive elements, starting from the first element of the iteration, that participate. In bitwise mask mode (uint64_t mask[]), each bit corresponds to one element, and a bit set to 1 means that element participates. |
| repeatTime | Input | Number of iteration repeats. In each iteration, the Vector Unit reads 256 bytes of contiguous data, so processing the complete input takes multiple iterations; repeatTime specifies how many. For details about this parameter, see High-dimensional Sharding APIs. |
| repeatParams | Input | Control structure for element operations, including the block and repeat strides of dst and src. For details, see UnaryRepeatParams. |
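As a worked check of the repeatTime description, the sketch below does two things: it computes how many elements of a given size fit in one 256-byte iteration, and it splits an element count into full iterations plus a partial tail. It is a host-side illustration that assumes each iteration covers exactly 256 bytes. The names RepeatPlan and PlanRepeats are hypothetical and not part of Ascend C.

```cpp
#include <cassert>
#include <cstdint>

// Bytes processed per iteration, per the repeatTime description.
constexpr uint32_t kBytesPerRepeat = 256;

// Hypothetical result type: full iterations plus the size of a trailing partial one.
struct RepeatPlan {
    uint32_t elemsPerRepeat;  // elements covered by one full iteration
    uint32_t fullRepeats;     // iterations that each process elemsPerRepeat elements
    uint32_t tailElems;       // elements left over (0 if count divides evenly)
};

// Splits `count` elements of `elemBytes` bytes each into full 256-byte iterations and a tail.
RepeatPlan PlanRepeats(uint32_t count, uint32_t elemBytes) {
    assert(elemBytes > 0 && kBytesPerRepeat % elemBytes == 0);
    const uint32_t perRepeat = kBytesPerRepeat / elemBytes;
    return RepeatPlan{perRepeat, count / perRepeat, count % perRepeat};
}
```

For 512 int16_t elements, `PlanRepeats(512, sizeof(int16_t))` gives 128 elements per iteration and 4 full iterations with no tail, which matches repeatTime = 4 in the examples. For 500 elements, it gives 3 full iterations and a 116-element tail. One way to cover that tail would be an additional iteration with a contiguous mask of 116, though this is an illustration rather than a documented recipe.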
Returns
None
Restrictions
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- For details about the operand address overlap restrictions, see General Address Overlap Restrictions.
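The dst and src descriptions require 32-byte-aligned start addresses. When a tensor is addressed from an element offset, one way to reason about that requirement is to check that the offset multiplied by the element size is a multiple of 32 bytes. The host-side sketch below performs that check. It assumes the base tensor is itself 32-byte aligned and checks only the offset. The helper name is hypothetical, and the overlap rules are not modeled.

```cpp
#include <cstddef>
#include <cstdint>

// Alignment required for the start address of dst and src, per the parameter table.
constexpr std::size_t kRequiredAlignBytes = 32;

// Hypothetical helper: true if starting at element `offset` of a 32-byte-aligned
// tensor of T keeps the start address 32-byte aligned.
template <typename T>
constexpr bool IsAlignedElementOffset(std::size_t offset) {
    return (offset * sizeof(T)) % kRequiredAlignBytes == 0;
}

// For 16-bit elements, aligned offsets are multiples of 32 / 2 = 16 elements.
static_assert(IsAlignedElementOffset<int16_t>(16), "16 x 2 bytes = 32 bytes");
static_assert(!IsAlignedElementOffset<int16_t>(8), "8 x 2 bytes = 16 bytes");
```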
Examples
- Example of high-dimensional tensor sharding computation (contiguous mask mode)
```cpp
uint64_t mask = 128;
int16_t scalar = 2;
// repeatTime = 4: each iteration processes 128 elements, so four iterations are required for 512 elements.
// dstBlkStride, srcBlkStride = 1: within an iteration, adjacent data blocks are one block apart, so data is read and written contiguously.
// dstRepStride, srcRepStride = 8: adjacent iterations start eight data blocks apart, so data is also read and written contiguously across iterations.
AscendC::ShiftLeft(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
```
- Example of high-dimensional tensor sharding computation (bitwise mask mode)
```cpp
uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };
int16_t scalar = 2;
// repeatTime = 4: each iteration processes 128 elements, so four iterations are required for 512 elements.
// dstBlkStride, srcBlkStride = 1: within an iteration, adjacent data blocks are one block apart, so data is read and written contiguously.
// dstRepStride, srcRepStride = 8: adjacent iterations start eight data blocks apart, so data is also read and written contiguously across iterations.
AscendC::ShiftLeft(dstLocal, srcLocal, scalar, mask, 4, { 1, 1, 8, 8 });
```
- Example of computing the first n pieces of data of a tensor
```cpp
int16_t scalar = 2;
AscendC::ShiftLeft(dstLocal, srcLocal, scalar, 512);
```
Input (srcLocal): [1 2 3 ... 512]
Input (scalar): 2
Output (dstLocal): [4 8 12 ... 2048]
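To make the stride and mask comments in these examples concrete, the following host-side C++ sketch emulates the high-dimensional call on plain int16_t arrays and checks it against the input/output listed above. It rests on assumptions that this page does not state:
- A data block is 32 bytes, and one iteration spans 8 blocks (256 bytes).
- Block and repeat strides are counted in data blocks.
- In bitwise mode, bit i of mask[0] (for i < 64), or bit i - 64 of mask[1], selects element i of an iteration.

RepeatParamsSketch, ContiguousToBitwise, ShiftElem, and EmulateShiftLeft are hypothetical names, not Ascend C APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the UnaryRepeatParams fields used in the examples, in the
// order of the { 1, 1, 8, 8 } initializer. Strides are counted in 32-byte data blocks.
struct RepeatParamsSketch {
    uint32_t dstBlkStride, srcBlkStride, dstRepStride, srcRepStride;
};

constexpr uint32_t kBlockBytes = 32;      // assumption: size of one data block
constexpr uint32_t kBlocksPerRepeat = 8;  // assumption: 8 blocks = 256 bytes per iteration
constexpr uint32_t kElemsPerBlock = kBlockBytes / sizeof(int16_t);  // 16 int16_t elements

// Converts a contiguous mask (the first n elements of an iteration) into the
// two-word bitwise form: bit i of bits[0] or bit (i - 64) of bits[1] selects element i.
void ContiguousToBitwise(uint32_t n, uint64_t bits[2]) {
    assert(n <= 128);
    bits[0] = (n >= 64) ? UINT64_MAX : ((uint64_t{1} << n) - 1);
    bits[1] = (n >= 128) ? UINT64_MAX : (n <= 64 ? 0 : ((uint64_t{1} << (n - 64)) - 1));
}

// Sign-preserving 16-bit left shift, as described under Function Usage.
int16_t ShiftElem(int16_t value, uint32_t distance) {
    const uint32_t bits = static_cast<uint16_t>(value);
    return static_cast<int16_t>((bits & 0x8000u) | ((bits << distance) & 0x7FFFu));
}

// Emulates a bitwise-mask ShiftLeft call over repeatTime iterations for int16_t data.
void EmulateShiftLeft(std::vector<int16_t>& dst, const std::vector<int16_t>& src, int16_t scalar,
                      const uint64_t mask[2], uint32_t repeatTime, const RepeatParamsSketch& p) {
    for (uint32_t r = 0; r < repeatTime; ++r) {
        for (uint32_t i = 0; i < kBlocksPerRepeat * kElemsPerBlock; ++i) {
            if (((mask[i / 64] >> (i % 64)) & 1u) == 0) {
                continue;  // element i of this iteration is masked off
            }
            const uint32_t blk = i / kElemsPerBlock;   // data block within the iteration
            const uint32_t lane = i % kElemsPerBlock;  // element within the data block
            const std::size_t s = (r * p.srcRepStride + blk * p.srcBlkStride) * kElemsPerBlock + lane;
            const std::size_t d = (r * p.dstRepStride + blk * p.dstBlkStride) * kElemsPerBlock + lane;
            dst.at(d) = ShiftElem(src.at(s), static_cast<uint32_t>(scalar));
        }
    }
}
```

Under the assumed mask mapping, a contiguous mask of 128 converts to { UINT64_MAX, UINT64_MAX }. This is why the two high-dimensional examples select the same elements. With strides { 1, 1, 8, 8 } and four iterations, the emulation maps the input [1 2 3 ... 512] to [4 8 12 ... 2048].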