WriteSpmBuffer

Applicability

Product                                                   Supported (√) / Unsupported (x)
Atlas A3 training products/Atlas A3 inference products    √
Atlas A2 training products/Atlas A2 inference products    √
Atlas 200I/500 A2 inference products                      x
Atlas inference products, AI Core                         √
Atlas inference products, Vector Core                     x
Atlas training products                                   √

Function Usage

Copies data that has to be spilled (overflowed) out of local memory into the SPM buffer, where it is stored temporarily.

Prototype

  • Applicable to contiguous and non-contiguous temporary data storage:
    template <typename T>
    __aicore__ inline void WriteSpmBuffer(const LocalTensor<T>& writeBuffer, const DataCopyParams& copyParams, int32_t writeOffset = 0)

  • Applicable to contiguous temporary data storage:
    template <typename T>
    __aicore__ inline void WriteSpmBuffer(const LocalTensor<T>& writeBuffer, const int32_t writeSize, int32_t writeOffset = 0)

Parameters

Table 1 Parameters

Parameter     Input/Output   Meaning
writeBuffer   Input          Local memory holding the data to be spilled and temporarily stored.
copyParams    Input          Data movement parameters, of type DataCopyParams. For the structure definition, see Table 2.
writeSize     Input          Number of elements to copy.
writeOffset   Input          Offset into the SPM buffer at which the data is written, in bytes. Defaults to 0.

Table 2 Parameters in the DataCopyParams structure

Field        Meaning
blockCount   Number of contiguous data blocks to transfer. Type: uint16_t. Value range: blockCount ∈ [1, 4095].
blockLen     Length of each contiguous data block to transfer, in units of DataBlock (32 bytes). Type: uint16_t. Value range: blockLen ∈ [1, 65535].
             Special cases: when the destination operand is located in C2PIPE2GM, the unit is 128 bytes; when it is located in C2, the unit is 64 bytes. In both cases the value is the length of each contiguous data block of the source operand.
srcGap       Gap between adjacent contiguous data blocks of the source operand, measured from the end of one data block to the start of the next, in units of DataBlock (32 bytes). Type: uint16_t. The value must be within the range of this type.
             In the L1 Buffer -> Fixpipe Buffer scenario, srcGap is instead measured from the start of one data block to the start of the next, in units of DataBlock (32 bytes).
dstGap       Gap between adjacent contiguous data blocks of the destination operand, measured from the end of one data block to the start of the next, in units of DataBlock (32 bytes). Type: uint16_t. The value must be within the range of this type.
             Special cases: when the destination operand is located in C2PIPE2GM, the unit is 128 bytes; when it is located in C2, the unit is 64 bytes.
             In the L1 Buffer -> Fixpipe Buffer scenario, dstGap is instead measured from the start of one data block to the start of the next in the destination operand, in units of DataBlock (32 bytes).
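
The field arithmetic in Table 2 can be sketched on the host side as follows. This is plain C++, not Ascend C: CopyParamsModel, PayloadBytes, SourceSpanBytes, and DestinationSpanBytes are hypothetical names introduced only for illustration. The sketch assumes the default case, where the unit is a 32-byte DataBlock and gaps are measured from the end of one block to the start of the next; it does not model the C2PIPE2GM, C2, or L1 Buffer -> Fixpipe Buffer special cases.

```cpp
#include <cstdint>

// Hypothetical host-side mirror of the four DataCopyParams fields in Table 2.
// It is NOT the Ascend C type; it only models the size arithmetic.
struct CopyParamsModel {
    uint16_t blockCount;  // number of contiguous data blocks
    uint16_t blockLen;    // length of each block, in 32-byte DataBlocks
    uint16_t srcGap;      // source gap (end of block to start of next), in DataBlocks
    uint16_t dstGap;      // destination gap (end of block to start of next), in DataBlocks
};

// Default unit assumed by this sketch: one DataBlock is 32 bytes.
constexpr uint64_t kDataBlockBytes = 32;

// Bytes of payload actually moved by the transfer.
constexpr uint64_t PayloadBytes(const CopyParamsModel& p) {
    return static_cast<uint64_t>(p.blockCount) * p.blockLen * kDataBlockBytes;
}

// Bytes spanned by blockCount blocks separated by (blockCount - 1) gaps.
constexpr uint64_t SpanBytes(const CopyParamsModel& p, uint16_t gap) {
    return p.blockCount == 0
        ? 0
        : PayloadBytes(p) + static_cast<uint64_t>(p.blockCount - 1) * gap * kDataBlockBytes;
}

// Extent touched in the source operand, from first byte read to last byte read.
constexpr uint64_t SourceSpanBytes(const CopyParamsModel& p) {
    return SpanBytes(p, p.srcGap);
}

// Extent touched in the destination operand, from first byte written to last.
constexpr uint64_t DestinationSpanBytes(const CopyParamsModel& p) {
    return SpanBytes(p, p.dstGap);
}
```

Under these assumptions, parameters {1, 2, 0, 0} move 1 x 2 x 32 = 64 bytes, and parameters {4, 2, 1, 0} move 256 bytes while reading across a 352-byte source span (three 32-byte gaps between four 64-byte blocks).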

Restrictions

  • When the data is temporarily stored in L1, ensure that writeSize and writeOffset are 32-byte aligned.
  • The amount of data copied must not exceed the size of the initialized SPM buffer. Otherwise, errors such as out-of-bounds (overflow) access may occur.
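
The two restrictions above can be checked before calling the writeSize overload, as in the following host-side sketch. It is plain C++ rather than Ascend C, and IsValidSpmWrite is a hypothetical helper, not part of the API. It makes three assumptions: the alignment requirement on writeSize is interpreted as applying to the byte count writeSize * sizeof(T), the alignment check is applied unconditionally rather than only for L1, and the write must fit entirely between writeOffset and the end of the SPM buffer.

```cpp
#include <cstdint>

// Alignment required by the restriction, in bytes.
constexpr uint64_t kAlignBytes = 32;

// Hypothetical pre-check (not an Ascend C function).
//   writeSize      : number of elements to copy
//   elemBytes      : sizeof(T) for the tensor's element type
//   writeOffset    : byte offset into the SPM buffer
//   spmBufferBytes : size of the initialized SPM buffer, in bytes
// Returns true only if the write is aligned and stays inside the buffer.
constexpr bool IsValidSpmWrite(int32_t writeSize, uint64_t elemBytes,
                               int32_t writeOffset, uint64_t spmBufferBytes) {
    return writeSize > 0 && writeOffset >= 0
        // Assumed interpretation: the byte size of the write is 32-byte aligned.
        && (static_cast<uint64_t>(writeSize) * elemBytes) % kAlignBytes == 0
        // The byte offset is 32-byte aligned.
        && static_cast<uint64_t>(writeOffset) % kAlignBytes == 0
        // The write ends at or before the end of the SPM buffer.
        && static_cast<uint64_t>(writeOffset)
               + static_cast<uint64_t>(writeSize) * elemBytes <= spmBufferBytes;
}
```

For example, with a 1024-byte SPM buffer, writing 32 half elements (64 bytes) at offset 32 passes the check, while writing 8 half elements (16 bytes) or writing at offset 16 fails the alignment test, and writing 64 bytes at offset 992 fails the bounds test.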

Returns

None

Example

  • Contiguous or non-contiguous storage, using DataCopyParams:
    // The SPM buffer is assumed to have been initialized before this code runs.
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::TPosition::VECIN, 1> inQueueSrcVecIn;
    int dataSize = 32; // Number of elements. Assuming T is half, the UB buffer below is 32 * sizeof(half) = 64 bytes.
    int offset = 32; // Byte offset into the SPM buffer at which the data is written.
    pipe.InitBuffer(inQueueSrcVecIn, 1, dataSize * sizeof(half));
    AscendC::LocalTensor<half> writeLocal = inQueueSrcVecIn.AllocTensor<half>();
    AscendC::DataCopyParams copyParams{1, 2, 0, 0}; // Copy one contiguous block from the UB: blockLen is 2 DataBlocks of 32 bytes each, with no source or destination gap.
    pipe.WriteSpmBuffer(writeLocal, copyParams, offset);

  • Contiguous storage, using writeSize:
    // The SPM buffer is assumed to have been initialized before this code runs.
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::TPosition::VECIN, 1> inQueueSrcVecIn;
    int dataSize = 32; // Number of elements. Assuming T is half, the UB buffer below is 32 * sizeof(half) = 64 bytes.
    int offset = 32; // Byte offset into the SPM buffer at which the data is written.
    pipe.InitBuffer(inQueueSrcVecIn, 1, dataSize * sizeof(half));
    AscendC::LocalTensor<half> writeLocal = inQueueSrcVecIn.AllocTensor<half>();
    pipe.WriteSpmBuffer(writeLocal, dataSize, offset); // Copy dataSize elements to the SPM buffer at the given byte offset.
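
Going by the parameter definitions on this page, the two calls above appear to describe the same 64-byte transfer: 32 half elements on one side, one block of two 32-byte DataBlocks on the other. The following host-side sketch shows that conversion. It is plain C++, not Ascend C; ContiguousCopyModel, FitsOneContiguousBlock, and ToContiguousParams are hypothetical names, and the sketch assumes the default 32-byte DataBlock unit and the blockLen upper limit of 65535.

```cpp
#include <cstdint>

// Hypothetical mirror of the DataCopyParams fields (not the Ascend C type).
struct ContiguousCopyModel {
    uint16_t blockCount;
    uint16_t blockLen;  // in 32-byte DataBlocks
    uint16_t srcGap;
    uint16_t dstGap;
};

// Default unit assumed by this sketch: one DataBlock is 32 bytes.
constexpr uint64_t kDataBlockBytes = 32;

// True if elemCount elements of elemBytes each form a non-empty, whole number
// of DataBlocks that fits in a single blockLen value (at most 65535).
constexpr bool FitsOneContiguousBlock(uint64_t elemCount, uint64_t elemBytes) {
    return elemCount * elemBytes != 0
        && (elemCount * elemBytes) % kDataBlockBytes == 0
        && (elemCount * elemBytes) / kDataBlockBytes <= 65535;
}

// Express a contiguous element count as {1, blockLen, 0, 0}.
// Returns an all-zero model when the size cannot be expressed this way.
constexpr ContiguousCopyModel ToContiguousParams(uint64_t elemCount, uint64_t elemBytes) {
    return FitsOneContiguousBlock(elemCount, elemBytes)
        ? ContiguousCopyModel{
              1, static_cast<uint16_t>((elemCount * elemBytes) / kDataBlockBytes), 0, 0}
        : ContiguousCopyModel{0, 0, 0, 0};
}
```

With elemCount = 32 and elemBytes = 2 (half), the helper yields {1, 2, 0, 0}, matching the copyParams used in the first example. A size such as 8 half elements (16 bytes) is not a whole number of DataBlocks, so the helper returns the all-zero model instead.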