Slice Data Movement
Applicability
Product |
Supported/Unsupported |
|---|---|
√ |
|
√ |
|
x |
|
√ |
|
x |
|
x |
Functions
Supports slice-based data movement, extracting a subset of multi-dimensional tensor data for movement.
Prototype
- Global Memory -> Local Memory
1 2
template <typename T> __aicore__ inline void DataCopy(const LocalTensor<T>& dst, const GlobalTensor<T>& src, const SliceInfo dstSliceInfo[], const SliceInfo srcSliceInfo[], const uint32_t dimValue = 1)
- Local Memory -> Global Memory
1 2
template <typename T> __aicore__ inline void DataCopy(const GlobalTensor<T> &dst, const LocalTensor<T> &src, const SliceInfo dstSliceInfo[], const SliceInfo srcSliceInfo[], const uint32_t dimValue = 1)
For details about the supported data paths and data types of each prototype, see Supported Channels and Data Types.
Parameters
Parameter |
Description |
|---|---|
T |
Data type of the source operand and destination operand. For details about the supported data types, see Supported Channels and Data Types. |
Parameter |
Input/Output |
Meaning |
|---|---|---|
dst |
Output |
Destination operand of the LocalTensor or GlobalTensor type. |
src |
Input |
Source operand of the LocalTensor or GlobalTensor type. |
srcSliceInfo/dstSliceInfo |
Input |
Slice information of the destination or source operand, which is of the SliceInfo type. For details, see ${INSTALL_DIR}/include/ascendc/basic_api/interface/kernel_struct_data_copy.h. Replace ${INSTALL_DIR} with the actual CANN component directory. |
dimValue |
Input |
Operand dimension information. The default value is 1. |
Field |
Meaning |
|---|---|
startIndex |
Location of the start element of a slice. |
endIndex |
Location of the end element of a slice. |
stride |
Number of interval elements between slices. |
burstLen |
Length of each horizontal data slice. It takes effect only when the dimension is one. If the dimension is greater than one, it must be set to 1. The unit is data block (32 bytes). For example, if the list of srcSliceInfo is {{16, 70, 7, 3, 87}, {0, 2, 1, 1, 3}}, then {16, 70, 7, 3, 87} indicates the slice information in the first dimension, burstLen is set to 3 (indicating that the size of a slice data segment is three data blocks); {0, 2, 1, 1, 3} indicates the slice information in the second dimension, and burstLen can only be set to 1. |
shapeValue |
Original length of the current dimension. The unit is element. |
The preceding parameters are parsed by using specific examples. The following figure shows the details.
- dimValue is 2, indicating that the dimension is two for the operand.
- srcSliceInfo is {{16, 70, 7, 3, 87}, {0, 2, 1, 1, 3}}.
- {16, 70, 7, 3, 87} is configured for a single row, that is, from a one-dimensional perspective. Each element represents a number.
startIndex = 16 indicates that the valid data segment starts from the 16th number.
endIndex = 70 indicates that the valid data segment ends at the 70th number.
stride = 7 and the unit is element, indicating the number of elements between two adjacent slice data segments. In the figure, the stride is seven zeros.
burstLen = 3 and the unit is 32 bytes, indicating that the size of a slice data segment is three data blocks in this valid data segment.
shapeValue = 87 indicates the length of a single row (unit: element), that is, 8 × 10 + 7 = 87 elements.
- {0, 2, 1, 1, 3} is configured for multiple rows, that is, from a two-dimensional perspective. Each element represents one row.
startIndex = 0 indicates that the valid data segment starts from row 0.
endIndex = 2 indicates that the valid data segment ends at row 2.
stride = 1 indicates that the interval element between two adjacent slice data segments is one row.
burstLen must be set to 1 when dimValue is greater than 1.
shapeValue = 3 indicates that there are three rows in total.
- {16, 70, 7, 3, 87} is configured for a single row, that is, from a one-dimensional perspective. Each element represents a number.
- dstSliceInfo is {{0, 47, 0, 3, 48}, {0, 1, 0, 1, 2}}.
- {0, 47, 0, 3, 48} is configured for a single row, that is, from a one-dimensional perspective. Each element represents a number.
startIndex = 0 indicates that the valid data segment starts from the 0th number.
endIndex = 47 indicates that the valid data segment ends at the 47th number.
stride = 0 and the unit is element, indicating the number of elements between two adjacent slice data segments. That is, there is no distance between the two slice data segments.
burstLen = 3 and the unit is 32 bytes, indicating that the size of a slice data segment is three data blocks in this valid data segment.
shapeValue = 48 indicates the length of a single row (unit: element), that is, 8 × 6 = 48 elements.
- {0, 1, 0, 1, 2} is configured for multiple rows, that is, from a two-dimensional perspective. Each element represents one row.
startIndex = 0 indicates that the valid data segment starts from row 0.
endIndex = 1 indicates that the valid data segment ends at row 1.
stride = 0 indicates that there is no distance between two adjacent slice data segments.
burstLen must be set to 1 when dimValue is greater than 1.
shapeValue = 2 indicates that there are two rows in total.
- {0, 47, 0, 3, 48} is configured for a single row, that is, from a one-dimensional perspective. Each element represents a number.
Returns
None
Restrictions
- You need to compute the size of the horizontal burstLen during slice data movements using the following formula: Number of horizontal slice elements x sizeof(T)/32 bytes. The number of horizontal slice elements multiplied by sizeof(T) must be a multiple of 32 bytes.
- During slice data movements, the size of the SliceInfo structure array must be the same as that of dimValue and cannot exceed 8.
- During slice data movements, the size of the srcSliceInfo array must be the same as that of the dstSliceInfo array, and the burstLen in the two structures must be the same (srcSliceInfo[i].burstLen = dstSliceInfo[i].burstLen).
- Slice data movements have certain requirements on parameters. You are advised to move slice data on the NPU side after the simulation result on the CPU is correct by referring to the calling examples.
Supported Channels and Data Types
The following data channels are expressed by using the logical position TPosition and the corresponding physical channels are also specified. For details about the mapping between TPosition and physical memory, see Table 1.
Internal Model |
Datapath |
Data Types of the Source and Destination Operands (Same) |
|---|---|---|
GM -> VECIN (GM -> UB) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, float |
|
GM -> VECIN (GM -> UB) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float |
|
GM -> VECIN (GM -> UB) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float |
Internal Model |
Datapath |
Data Types of the Source and Destination Operands (Same) |
|---|---|---|
VECOUT, CO2 -> GM (UB -> GM) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, float |
|
VECOUT -> GM (UB -> GM) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float |
|
VECOUT -> GM (UB -> GM) |
int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float |
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | #include "kernel_operator.h" // In this sample, the tensor data type is float. template <typename T> class KernelDataCopySliceGM2UB { public: __aicore__ inline KernelDataCopySliceGM2UB() {} __aicore__ inline void Init(__gm__ uint8_t* dstGm, __gm__ uint8_t* srcGm) { AscendC::SliceInfo srcSliceInfoIn[] = {{16, 70, 7, 3, 87}, {0, 2, 1, 1, 3}};// As shown by the example input data, startIndex is 16, endIndex is 70, burstLen is 3, stride is 7, and shapeValue is 87. AscendC::SliceInfo dstSliceInfoIn[] = {{0, 47, 0, 3, 48}, {0, 1, 0, 1, 2}};// UB space is insufficient. You are advised to set the stride to 0. uint32_t dimValueIn = 2; uint32_t dstDataSize = 96; uint32_t srcDataSize = 261; dimValue = dimValueIn; for (uint32_t i = 0; i < dimValueIn; i++) { srcSliceInfo[i].startIndex = srcSliceInfoIn[i].startIndex; srcSliceInfo[i].endIndex = srcSliceInfoIn[i].endIndex; srcSliceInfo[i].stride = srcSliceInfoIn[i].stride; srcSliceInfo[i].burstLen = srcSliceInfoIn[i].burstLen; srcSliceInfo[i].shapeValue = srcSliceInfoIn[i].shapeValue; dstSliceInfo[i].startIndex = dstSliceInfoIn[i].startIndex; dstSliceInfo[i].endIndex = dstSliceInfoIn[i].endIndex; dstSliceInfo[i].stride = dstSliceInfoIn[i].stride; dstSliceInfo[i].burstLen = dstSliceInfoIn[i].burstLen; dstSliceInfo[i].shapeValue = dstSliceInfoIn[i].shapeValue; } srcGlobal.SetGlobalBuffer((__gm__ T *)srcGm); dstGlobal.SetGlobalBuffer((__gm__ T *)dstGm); pipe.InitBuffer(inQueueSrcVecIn, 1, dstDataSize * sizeof(T)); } __aicore__ inline void Process() { CopyIn(); CopyOut(); } private: __aicore__ inline void CopyIn() { AscendC::LocalTensor<T> srcLocal = inQueueSrcVecIn.AllocTensor<T>(); AscendC::DataCopy(srcLocal, srcGlobal, dstSliceInfo, srcSliceInfo, dimValue); inQueueSrcVecIn.EnQue(srcLocal); } __aicore__ inline void CopyOut() { AscendC::LocalTensor<T> srcOutLocal = inQueueSrcVecIn.DeQue<T>(); AscendC::DataCopy(dstGlobal, srcOutLocal, dstSliceInfo, dstSliceInfo, dimValue); inQueueSrcVecIn.FreeTensor(srcOutLocal); } private: AscendC::TPipe pipe; AscendC::TQue<AscendC::TPosition::VECIN, 1> inQueueSrcVecIn; AscendC::GlobalTensor<T> srcGlobal; AscendC::GlobalTensor<T> dstGlobal; AscendC::SliceInfo dstSliceInfo[K_MAX_DIM]; AscendC::SliceInfo srcSliceInfo[K_MAX_DIM]; // K_MAX_DIM = 8 uint32_t dimValue; }; extern "C" __global__ __aicore__ void kernel_data_copy_slice_out2ub(__gm__ uint8_t* src_gm, __gm__ uint8_t* dst_gm) { KernelDataCopySliceGM2UB<TYPE> op; op.Init(dst_gm, src_gm); op.Process(); } |
For details about the result example, see Figure 1.
