LocalTensor

Function Usage

LocalTensor holds data that resides in the Local Memory of the AI Core. Its logical position (QuePosition) can be VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, or CO2.

Prototype

template <typename T> class LocalTensor : public BaseLocalTensor<T> {
public:
    // PrimT is used to extract the basic data type LiteType from TensorTrait when T is of the TensorTrait type.
    using PrimType = PrimT<T>;
    __aicore__ inline LocalTensor<T>() {};
#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
    ~LocalTensor();
    explicit LocalTensor<T>(TBuffAddr& address);
    LocalTensor<T>(const LocalTensor<T>& other);
    LocalTensor<T> operator = (const LocalTensor<T>& other);
    PrimType* GetPhyAddr(const uint32_t offset) const;
    PrimType* GetPhyAddr() const;
    __inout_pipe__(S) PrimType GetValue(const uint32_t offset) const;
    __inout_pipe__(S) PrimType& operator()(const uint32_t offset) const;
    template <typename CAST_T> __aicore__ inline LocalTensor<CAST_T> ReinterpretCast() const;
    template <typename T1> __inout_pipe__(S) void SetValue(const uint32_t index, const T1 value) const;
    LocalTensor operator[](const uint32_t offset) const;
    template <typename T1> void SetAddrWithOffset(LocalTensor<T1> &src, uint32_t offset);
    inline void Print();
    inline void Print(uint32_t len);
    int32_t ToFile(const std::string& fileName) const;
#else
    __aicore__ inline uint64_t GetPhyAddr() const;
    __aicore__ inline uint64_t GetPhyAddr(const uint32_t offset) const;
    __aicore__ inline __inout_pipe__(S) PrimType GetValue(const uint32_t index) const;
    __aicore__ inline __inout_pipe__(S) __ubuf__ PrimType& operator()(const uint32_t offset) const;
    template <typename CAST_T> __aicore__ inline LocalTensor<CAST_T> ReinterpretCast() const;
    template <typename T1> __aicore__ inline __inout_pipe__(S)
        void SetValue(const uint32_t index, const T1 value) const;
    __aicore__ inline LocalTensor operator[](const uint32_t offset) const;
    template <typename T1>
    [[deprecated("NOTICE: SetAddrWithOffset has been deprecated and will be removed in the next version. "
        "Please do not use it!")]]
    __aicore__ inline void SetAddrWithOffset(LocalTensor<T1> &src, uint32_t offset);
#endif
    __aicore__ inline int32_t GetPosition() const;
    __aicore__ inline void SetSize(const uint32_t size);
    __aicore__ inline uint32_t GetSize() const;
    [[deprecated("NOTICE: GetLength has been deprecated and will be removed in the next version. Please do not use "
                 "it!")]]
    __aicore__ inline uint32_t GetLength() const;
    [[deprecated("NOTICE: SetBufferLen has been deprecated and will be removed in the next version. Please do not use "
                 "it!")]]
    __aicore__ inline void SetBufferLen(uint32_t dataLen);
    __aicore__ inline void SetUserTag(const TTagType tag);
    __aicore__ inline TTagType GetUserTag() const;
    ...
    __aicore__ inline void SetShapeInfo(const ShapeInfo& shapeInfo);
    __aicore__ inline ShapeInfo GetShapeInfo() const;
    ...
};

Function Description

T can be a basic data type or a TensorTrait type, but it must be a data type supported by the instructions that operate on the LocalTensor.
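
The prototype comment says that PrimT yields the basic data type LiteType when T is a TensorTrait type. The following host-side sketch shows one way such a mapping can be written in standard C++. The names MockTensorTrait, MockPrim, and MockPrimT are hypothetical stand-ins invented for this illustration; they are not the Ascend C definitions, whose internals may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a TensorTrait-like type. It only carries the
// nested LiteType alias mentioned in the prototype comment.
template <typename Lite>
struct MockTensorTrait {
    using LiteType = Lite;
};

// C++11-compatible equivalent of std::void_t.
template <typename...>
struct MakeVoid {
    using type = void;
};
template <typename... Ts>
using VoidT = typename MakeVoid<Ts...>::type;

// Primary template: T has no nested LiteType, so T is already a basic type.
template <typename T, typename = void>
struct MockPrim {
    using type = T;
};

// Partial specialization: selected only when T::LiteType exists.
template <typename T>
struct MockPrim<T, VoidT<typename T::LiteType>> {
    using type = typename T::LiteType;
};

template <typename T>
using MockPrimT = typename MockPrim<T>::type;

// A basic type maps to itself; a trait type maps to its LiteType.
static_assert(std::is_same<MockPrimT<float>, float>::value, "basic type");
static_assert(std::is_same<MockPrimT<MockTensorTrait<int16_t>>, int16_t>::value, "trait type");
```

Under this assumption, element accessors such as GetValue can always be declared in terms of the basic type, whether the tensor was instantiated with a plain type or with a trait type.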

**Table 1** Function description
Function Name	Input Parameter	Description
GetValue	offset: offset value, in elements	Obtains a value in the LocalTensor. A value of type PrimType is returned. This API is supported only when the TPosition of LocalTensor is VECIN, VECCALC, or VECOUT.
SetValue	index: element index. value: value to set; its type T1 is a template parameter.	Sets a value in the LocalTensor. This API is supported only when the TPosition of LocalTensor is VECIN, VECCALC, or VECOUT.
operator[]	offset: offset value, in elements	Obtains a new LocalTensor that starts offset elements after the start address of the original LocalTensor. Note that offset cannot exceed the size of the original LocalTensor.
operator()	offset: subscript index	Obtains a reference to the element at index offset in the LocalTensor. Used as an lvalue, it is equivalent to the SetValue API. Used as an rvalue, it is equivalent to the GetValue API.
GetSize	-	Obtains the current LocalTensor size, in elements.
SetSize	size: number of elements	Sets the current LocalTensor size, in elements. When a LocalTensor is reused and its length changes, call this API to reset the size.
SetUserTag	tag: tag information. The type TTagType corresponds to int32_t.	Adds user-defined information to a tensor. You can set the corresponding tag as required. You can call GetUserTag to obtain the tag information of a specified tensor and perform operations on the tensor based on the tag information.
GetUserTag	-	Obtains the tag information of a specified tensor block. You can perform different operations on tensors based on the tag information.
ReinterpretCast	-	Reinterprets the current tensor to a new type specified by the user. The address and content of the converted tensor are the same as those of the original tensor, and the tensor size (number of bytes) remains unchanged.
GetPhyAddr	offset: offset value, in elements (optional)	Returns the address of the LocalTensor. If offset is passed in, the returned address is advanced by offset elements.
GetPosition	-	Obtains the logical position (QuePosition) of the LocalTensor. The QuePosition can be VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, or CO2.
GetLength	-	Obtains the data length (in bytes) of the LocalTensor. Marked as deprecated in the prototype.
SetShapeInfo	shapeInfo: ShapeInfo structure	Sets shapeInfo of the LocalTensor.
GetShapeInfo	-	Obtains shapeInfo of the LocalTensor. Note: There is no default value for shapeInfo. This API can be called to obtain the correct shapeInfo only after the shape information is set by calling SetShapeInfo.
SetAddrWithOffset	src: base tensor whose address is used to set the address of the offset tensor. offset: offset length, in elements of src	Sets a tensor address with an offset. Use it to quickly define a new tensor at a given offset from the start address of an existing tensor. The offset is counted in elements of the old tensor (src). Marked as deprecated in the prototype.
SetBufferLen	dataLen: buffer length, in bytes	Sets the buffer length, in bytes. Marked as deprecated in the prototype.
ToFile	fileName: file name	Used only for CPU debugging. Dumps LocalTensor data to a file in the execution directory for precision debugging.
Print	len: number of elements to print (optional)	Used only for CPU debugging. Prints LocalTensor data in the debugging window for precision debugging. Each line contains one data block (32 bytes).
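
The element-based semantics of operator[], GetSize, and ReinterpretCast described in the table can be modelled on the host with a small non-owning view. This is a minimal sketch in standard C++, not the Ascend C implementation: HostView, ByteLength, and ViewInt32AsInt16 are hypothetical names, and the choice to shrink the size of the view returned by operator[] is an assumption of the model.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for LocalTensor<T>; NOT the Ascend C class.
// It views a byte buffer it does not own and tracks its size in elements.
template <typename T>
class HostView {
public:
    HostView(unsigned char* base, uint32_t sizeInElements)
        : base_(base), size_(sizeInElements) {}

    uint32_t GetSize() const { return size_; }  // size in elements
    uint32_t ByteLength() const {               // size in bytes
        return size_ * static_cast<uint32_t>(sizeof(T));
    }

    // memcpy keeps the reads and writes well defined when the same bytes
    // are later viewed as a different element type.
    T GetValue(uint32_t index) const {
        T v;
        std::memcpy(&v, base_ + index * sizeof(T), sizeof(T));
        return v;
    }
    void SetValue(uint32_t index, T value) const {
        std::memcpy(base_ + index * sizeof(T), &value, sizeof(T));
    }

    // Models operator[]: a new view that starts `offset` elements further in.
    // Assumption of this model: the remaining size shrinks by `offset`.
    HostView operator[](uint32_t offset) const {
        assert(offset <= size_);
        return HostView(base_ + offset * sizeof(T), size_ - offset);
    }

    // Models ReinterpretCast: same address and byte count, new element type.
    template <typename U>
    HostView<U> ReinterpretCast() const {
        return HostView<U>(base_, ByteLength() / static_cast<uint32_t>(sizeof(U)));
    }

private:
    unsigned char* base_;
    uint32_t size_;
};

// Fills 16 int32_t elements with 0..15 and reads the same bytes as int16_t.
inline std::vector<int16_t> ViewInt32AsInt16() {
    std::vector<unsigned char> storage(16 * sizeof(int32_t));
    HostView<int32_t> input(storage.data(), 16);
    for (uint32_t i = 0; i < 16; ++i) {
        input.SetValue(i, static_cast<int32_t>(i));
    }
    HostView<int16_t> halves = input.ReinterpretCast<int16_t>();
    std::vector<int16_t> out;
    for (uint32_t i = 0; i < halves.GetSize(); ++i) {
        out.push_back(halves.GetValue(i));
    }
    return out;
}
```

On a little-endian host, ViewInt32AsInt16 returns 32 values in which each original integer is followed by a zero (0 0 1 0 2 0 ...), because the byte count stays at 64 while the element count doubles.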

Precautions

Do not call SetValue frequently to assign values to a LocalTensor; doing so degrades performance. To assign a large number of values, use the basic or advanced data padding APIs, depending on the actual situation. To generate an incremental sequence, use ArithProgression.

Examples

// srcLen = 256, num = 100, M = 50
// Example 1
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal.SetValue(i, num); // Assign num to the ith position in inputLocal.
}
// The result of example 1 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 2
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal.GetValue(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 2 is as follows:
// Element is 100.

// Example 3
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal(i) = num; // Assign num to the ith position in inputLocal.
}
// The result of example 3 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 4
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 4 is as follows:
// Element is 100.

// Example 5
auto size = inputLocal.GetSize(); // Obtain the length of inputLocal. The size indicates the number of elements in inputLocal.
// The result of example 5 is as follows:
// The size is srcLen, 256.

// Example 6
// Usage of operator[], in which inputLocal[16] indicates a new tensor with an offset of 16 starting from the start address.
AscendC::Add(outputLocal[16], inputLocal[16], inputLocal2[16], M);
// The result of example 6 is as follows:
// Input data (inputLocal): [100 100 100 ... 100]
// Input data (inputLocal2): [1 2 3 ... 66]
// Output data (outputLocal): [... 117 118 119 ... 166]

// Example 7
AscendC::TTagType tag = 10;
inputLocal.SetUserTag(tag); // Set the tag information for the LocalTensor.

// Example 8
AscendC::LocalTensor<half> tensor1 = que1.DeQue<half>();
AscendC::TTagType tag1 = tensor1.GetUserTag();
AscendC::LocalTensor<half> tensor2 = que2.DeQue<half>();
AscendC::TTagType tag2 = tensor2.GetUserTag();
AscendC::LocalTensor<half> tensor3 = que3.AllocTensor<half>();
/* Use tags to control the execution of conditional statements. */
if ((tag1 <= 10) && (tag2 >= 9)) {
    AscendC::Add(tensor3, tensor1, tensor2, TILE_LENGTH); // The addition operation can be performed only when the value of tag1 is less than or equal to 10 and the value of tag2 is greater than or equal to 9.
}
// Example 9
// inputLocal is of the int32_t type and contains 16 elements (64 bytes).
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}

// Call ReinterpretCast to reinterpret inputLocal as the int16_t type.
AscendC::LocalTensor<int16_t> interpreTensor = inputLocal.ReinterpretCast<int16_t>();
// The result of example 9 is as follows. Both tensors share the same physical address and the same bytes; only the element type used to interpret the data differs.
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// interpreTensor:0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0

// Example 10
// Call GetPhyAddr() to return the address of the LocalTensor. The pointer type (T*) is returned on the CPU, and the physical storage address (uint64_t) is returned on the NPU.
#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
float *inputLocalCpuPtr = inputLocal.GetPhyAddr();
uint64_t realAddr = (uint64_t)inputLocalCpuPtr - (uint64_t)(GetTPipePtr()->GetBaseAddr(static_cast<int8_t>(AscendC::QuePosition::VECCALC)));
#else
uint64_t realAddr = inputLocal.GetPhyAddr();
#endif

// Example 11
AscendC::QuePosition srcPos = (AscendC::QuePosition)inputLocal.GetPosition();
if (srcPos == AscendC::QuePosition::VECCALC) {
    // Processing logic 1
} else if (srcPos == AscendC::QuePosition::A1) {
    // Processing logic 2
} else {
    // Processing logic 3
}

// Example 12
// Obtain the length (in bytes) of inputLocal. The element type is int32_t and there are 16 elements, so the length is 16 × sizeof(int32_t) = 64.
uint32_t len = inputLocal.GetLength();
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// len: 64

// Example 13: Set ShapeInfo of a tensor.
AscendC::LocalTensor<float> maxUb = softmaxMaxBuf.template Get<float>();
uint32_t shapeArray[] = {16, 1024};
maxUb.SetShapeInfo(AscendC::ShapeInfo(2, shapeArray, AscendC::DataFormat::ND));

// Example 14: Obtain ShapeInfo of a tensor.
AscendC::ShapeInfo maxShapeInfo = maxUb.GetShapeInfo();
uint32_t orgShape0 = maxShapeInfo.originalShape[0];
uint32_t orgShape1 = maxShapeInfo.originalShape[1];
uint32_t orgShape2 = maxShapeInfo.originalShape[2];
uint32_t orgShape3 = maxShapeInfo.originalShape[3];
uint32_t shape2 = maxShapeInfo.shape[2];

// Example 15: Use SetAddrWithOffset to obtain and define a tensor and specify the offset of the new tensor relative to the start address of the old tensor.
// Note that the offset length is the number of elements of the old tensor.
AscendC::LocalTensor<float> tmpBuffer1 = tempBmm2Queue.AllocTensor<float>();
AscendC::LocalTensor<half> tmpHalfBuffer;
tmpHalfBuffer.SetAddrWithOffset(tmpBuffer1, calcSize * 2);

// Example 16: Use SetBufferLen to change the length of the allocated tensor to 1024 (unit: byte).
AscendC::LocalTensor<float> tmpBuffer2 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer2.SetBufferLen(1024);

// Example 17: Use SetSize to change the length of the allocated tensor to 256 (unit: element).
AscendC::LocalTensor<float> tmpBuffer3 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer3.SetSize(256);

#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
// Example 18: Used only for CPU debugging. Dump LocalTensor data to a file that is stored in the execution directory for precision debugging.
AscendC::LocalTensor<float> tmpTensor = softmaxMaxBuf.template Get<float>();
tmpTensor.ToFile("tmpTensor.bin");

// Example 19: Used only for CPU debugging. Prints LocalTensor data in the debugging window for precision debugging. Each line contains one data block (32 bytes).
AscendC::LocalTensor<int32_t> inputLocal = softmaxMaxBuf.template Get<int32_t>();
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}
inputLocal.Print();
// 0000: 0 1 2 3 4 5 6 7
// 0008: 8 9 10 11 12 13 14 15
#endif
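
The statement that Print emits one 32-byte data block per line can be illustrated with a host-side formatter. This is only a sketch in standard C++: FormatBlocks and kBlockBytes are hypothetical names, and the 4-digit line prefix is assumed to be the index of the first element on the line, which is what the sample output suggests. The actual Print implementation may format its output differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Size of one data block in bytes, as stated in the Print description.
constexpr std::size_t kBlockBytes = 32;

// Hypothetical host-side formatter; NOT the Ascend C Print implementation.
// It places one 32-byte block of elements on each line and prefixes the line
// with the zero-padded index of its first element (an assumption).
template <typename T>
std::string FormatBlocks(const std::vector<T>& data) {
    static_assert(sizeof(T) <= kBlockBytes, "element must fit in one block");
    const std::size_t perLine = kBlockBytes / sizeof(T);
    std::ostringstream out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % perLine == 0) {
            if (i != 0) {
                out << '\n';
            }
            out << std::setw(4) << std::setfill('0') << i << ':';
        }
        out << ' ' << data[i];
    }
    return out.str();
}
```

For int32_t data, sizeof(int32_t) is 4, so each line holds 32 / 4 = 8 elements and 16 elements span two lines.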
