WriteSpmBuffer

Function Usage

Copies data that needs to be spilled (temporarily stored) into the SPM buffer.

For details about the SPM buffer and its usage example, see SPM Buffer.

Prototype

  • Applicable to continuous and discontinuous temporary data storage:
    template <typename T>
    __aicore__ inline void WriteSpmBuffer(const LocalTensor<T>& writeLocal, const DataCopyParams& copyParams, int32_t writeOffset = 0)
    
  • Applicable to continuous temporary data storage:
    template <typename T>
    __aicore__ inline void WriteSpmBuffer(const LocalTensor<T>& writeLocal, const int32_t writeSize, int32_t writeOffset = 0)
    

Parameters

Table 1 API parameters

| Parameter | Input/Output | Meaning |
|---|---|---|
| writeLocal | Input | Local tensor whose data is to be spilled and temporarily stored. |
| copyParams | Input | Data movement parameters, of the DataCopyParams type. For the structure definition of DataCopyParams, see Table 2. |
| writeSize | Input | Number of elements to copy. |
| writeOffset | Input | Offset in the SPM buffer at which the data is written. The unit is byte. |

Table 2 Parameters in the DataCopyParams structure

| Parameter | Meaning |
|---|---|
| blockCount | Number of contiguous data chunks transferred by the instruction. The value range is [1, 4095]. |
| blockLen | Length of each contiguous data chunk. The unit is data block (32 bytes). The value range is [1, 65535]. As a special case, when dstLocal is located in C2PIPE2GM, the unit is 128 bytes; when dstLocal is located in C2, the unit is 64 bytes. |
| srcStride | Interval between adjacent contiguous data chunks of the source operand (from the tail of one data chunk to the head of the next). The unit is data block (32 bytes). The data type is uint16_t, and the value of srcStride cannot exceed the value range of this data type. |
| dstStride | Interval between adjacent contiguous data chunks of the destination operand (from the tail of one data chunk to the head of the next). The unit is data block (32 bytes). The data type is uint16_t, and the value of dstStride cannot exceed the value range of this data type. As a special case, when dstLocal is located in C2PIPE2GM, the unit is 128 bytes; when dstLocal is located in C2, the unit is 64 bytes. |
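The field units in Table 2 can be checked with a little arithmetic. The following is a minimal host-side sketch in plain C++, not Ascend C kernel code: CopyParamsSketch, PayloadBytes, and SpanBytes are hypothetical names that only mirror the four fields of the table, and the sketch assumes the default 32-byte data block (it ignores the C2PIPE2GM and C2 special cases).

```cpp
#include <cstdint>

// Hypothetical host-side mirror of the four fields in Table 2.
// This is NOT the Ascend C DataCopyParams type.
struct CopyParamsSketch {
    uint16_t blockCount;  // number of contiguous data chunks
    uint16_t blockLen;    // length of each chunk, in 32-byte data blocks
    uint16_t srcStride;   // gap between source chunks, in data blocks
    uint16_t dstStride;   // gap between destination chunks, in data blocks
};

// Assumption: default data block size of 32 bytes.
constexpr uint32_t kBlockBytes = 32;

// Bytes of payload that are actually copied.
constexpr uint32_t PayloadBytes(const CopyParamsSketch& p) {
    return static_cast<uint32_t>(p.blockCount) * p.blockLen * kBlockBytes;
}

// Bytes spanned from the head of the first chunk to the tail of the last
// chunk, where stride is the tail-to-head gap between adjacent chunks.
constexpr uint32_t SpanBytes(uint16_t blockCount, uint16_t blockLen, uint16_t stride) {
    return blockCount == 0
        ? 0u
        : (static_cast<uint32_t>(blockCount) * blockLen +
           static_cast<uint32_t>(blockCount - 1) * stride) * kBlockBytes;
}
```

For example, blockCount = 1 and blockLen = 2 describe 64 bytes of payload, while 4 chunks of 2 data blocks separated by a stride of 1 data block span (4 x 2 + 3 x 1) x 32 = 352 bytes.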

Availability

Atlas Training Series Product

Precautions

  • When the data is spilled to L1, ensure that writeSize and writeOffset are 32-byte aligned.
  • The size of the copied data cannot exceed the size of the initialized SPM buffer. Otherwise, errors such as out-of-bounds access may occur.
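The two precautions above can be expressed as a pre-check performed before the call. The following is a minimal host-side sketch in plain C++, not part of the Ascend C API: IsAligned32 and CanWriteSpm are hypothetical helpers, spmBufferBytes stands for the size used when the SPM buffer was initialized, and the sketch assumes that the alignment requirement on writeSize applies to the byte size of the copied elements (writeSize x sizeof(T)).

```cpp
#include <cstddef>
#include <cstdint>

// Alignment required by the precautions, in bytes.
constexpr int64_t kAlignBytes = 32;

// True when a non-negative byte count or byte offset is a multiple of 32.
constexpr bool IsAligned32(int64_t bytes) {
    return bytes >= 0 && bytes % kAlignBytes == 0;
}

// Hypothetical pre-check for the writeSize overload.
//   writeSize      : number of elements to copy
//   elemSize       : sizeof(T) for the tensor element type
//   writeOffset    : byte offset into the SPM buffer
//   spmBufferBytes : size of the initialized SPM buffer, in bytes
// Assumption: alignment is checked on writeSize * elemSize (bytes).
constexpr bool CanWriteSpm(int32_t writeSize, std::size_t elemSize,
                           int32_t writeOffset, int64_t spmBufferBytes) {
    return writeSize > 0 &&
           IsAligned32(static_cast<int64_t>(writeSize) * static_cast<int64_t>(elemSize)) &&
           IsAligned32(writeOffset) &&
           static_cast<int64_t>(writeOffset) +
               static_cast<int64_t>(writeSize) * static_cast<int64_t>(elemSize) <= spmBufferBytes;
}
```

With 32 half elements (64 bytes) and a 1024-byte SPM buffer, an offset of 32 passes the check, an offset of 16 fails the alignment test, and an offset of 992 fails the bounds test because 992 + 64 exceeds 1024.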

Returns

None

Example

Continuous and discontinuous temporary storage (copyParams prototype):

AscendC::TPipe pipe;
AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrcVecIn;
int dataSize = 32; // Assume that T is of the half type. Allocate a memory block from the UB (32 x sizeof(half) bytes).
int offset = 32; // 32 bytes offset when copied to spmBuffer
pipe.InitBuffer(inQueueSrcVecIn, 1, dataSize * sizeof(half));
AscendC::LocalTensor<half> writeLocal = inQueueSrcVecIn.AllocTensor<half>();
AscendC::DataCopyParams copyParams{1, 2, 0, 0}; // Move one contiguous data chunk from the UB: blockCount = 1, blockLen = 2 data blocks of 32 bytes each (64 bytes in total).
pipe.WriteSpmBuffer(writeLocal, copyParams, offset);

Continuous temporary storage (writeSize prototype):

AscendC::TPipe pipe;
AscendC::TQue<AscendC::QuePosition::VECIN, 1> inQueueSrcVecIn;
int dataSize = 32; // Assume that T is of the half type. Allocate a memory block from the UB (32 x sizeof(half) bytes).
int offset = 32; // 32 bytes offset when copied to spmBuffer
pipe.InitBuffer(inQueueSrcVecIn, 1, dataSize * sizeof(half));
AscendC::LocalTensor<half> writeLocal = inQueueSrcVecIn.AllocTensor<half>();
pipe.WriteSpmBuffer(writeLocal, dataSize, offset);
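
Both examples describe the same 64 bytes: 32 half elements occupy two 32-byte data blocks, which matches blockCount = 1 and blockLen = 2. The following host-side sketch in plain C++ shows that conversion. SpillParamsSketch and ContiguousParams are hypothetical names used for illustration only; the sketch makes no claim about how the library implements the writeSize prototype internally.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the DataCopyParams fields (not the Ascend C type).
struct SpillParamsSketch {
    uint16_t blockCount;
    uint16_t blockLen;   // in 32-byte data blocks
    uint16_t srcStride;
    uint16_t dstStride;
};

// Assumption: default data block size of 32 bytes.
constexpr uint64_t kDataBlockBytes = 32;

// Builds parameters for one contiguous chunk holding writeSize elements.
// Returns false when the byte size is not a multiple of 32 bytes or when
// the block count does not fit the blockLen range [1, 65535].
inline bool ContiguousParams(int32_t writeSize, std::size_t elemSize,
                             SpillParamsSketch& out) {
    if (writeSize <= 0 || elemSize == 0) {
        return false;
    }
    const uint64_t bytes = static_cast<uint64_t>(writeSize) * elemSize;
    if (bytes % kDataBlockBytes != 0) {
        return false;
    }
    const uint64_t blocks = bytes / kDataBlockBytes;
    if (blocks > 65535) {
        return false;
    }
    out = SpillParamsSketch{1, static_cast<uint16_t>(blocks), 0, 0};
    return true;
}
```

Calling ContiguousParams(32, 2, params) for 32 elements of a 2-byte type fills params with {1, 2, 0, 0}, the same values passed as copyParams in the first example.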