LocalTensor

Function Usage

LocalTensor stores data in the Local Memory of the AI Core. The supported QuePosition values are VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, and CO2.

Prototype

template <typename T> class LocalTensor : public BaseLocalTensor<T> {
public:
    // PrimT is used to extract the basic data type LiteType from TensorTrait when T is of the TensorTrait type.
    using PrimType = PrimT<T>;
    __aicore__ inline LocalTensor<T>() {};
#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
    ~LocalTensor();
    explicit LocalTensor<T>(TBuffAddr& address);
    LocalTensor<T>(const LocalTensor<T>& other);
    LocalTensor<T> operator = (const LocalTensor<T>& other);
    PrimType* GetPhyAddr(const uint32_t offset) const;
    PrimType* GetPhyAddr() const;
    __inout_pipe__(S) PrimType GetValue(const uint32_t offset) const;
    __inout_pipe__(S) PrimType& operator()(const uint32_t offset) const;
    template <typename CAST_T> __aicore__ inline LocalTensor<CAST_T> ReinterpretCast() const;
    template <typename T1> __inout_pipe__(S) void SetValue(const uint32_t index, const T1 value) const;
    LocalTensor operator[](const uint32_t offset) const;
    template <typename T1> void SetAddrWithOffset(LocalTensor<T1> &src, uint32_t offset);
    inline void Print();
    inline void Print(uint32_t len);
    int32_t ToFile(const std::string& fileName) const;
#else
    __aicore__ inline uint64_t GetPhyAddr() const;
    __aicore__ inline uint64_t GetPhyAddr(const uint32_t offset) const;
    __aicore__ inline __inout_pipe__(S) PrimType GetValue(const uint32_t index) const;
    __aicore__ inline __inout_pipe__(S) __ubuf__ PrimType& operator()(const uint32_t offset) const;
    template <typename CAST_T> __aicore__ inline LocalTensor<CAST_T> ReinterpretCast() const;
    template <typename T1> __aicore__ inline __inout_pipe__(S)
        void SetValue(const uint32_t index, const T1 value) const;
    __aicore__ inline LocalTensor operator[](const uint32_t offset) const;
    template <typename T1>
    [[deprecated("NOTICE: SetAddrWithOffset has been deprecated and will be removed in the next version. "
        "Please do not use it!")]]
    __aicore__ inline void SetAddrWithOffset(LocalTensor<T1> &src, uint32_t offset);
#endif
    __aicore__ inline int32_t GetPosition() const;
    __aicore__ inline void SetSize(const uint32_t size);
    __aicore__ inline uint32_t GetSize() const;
    [[deprecated("NOTICE: GetLength has been deprecated and will be removed in the next version. Please do not use "
                 "it!")]]
    __aicore__ inline uint32_t GetLength() const;
    [[deprecated("NOTICE: SetBufferLen has been deprecated and will be removed in the next version. Please do not use "
                 "it!")]]
    __aicore__ inline void SetBufferLen(uint32_t dataLen);
    __aicore__ inline void SetUserTag(const TTagType tag);
    __aicore__ inline TTagType GetUserTag() const;
    ...
    __aicore__ inline void SetShapeInfo(const ShapeInfo& shapeInfo);
    __aicore__ inline ShapeInfo GetShapeInfo() const;
    ...
};

Function Description

T can be a basic data type or a TensorTrait type, but it must be a data type supported by the instructions that operate on the LocalTensor.

Table 1 Function description

| Function Name | Input Parameter | Description |
|---|---|---|
| GetValue | offset: offset value, in elements | Returns the element at the given offset in the LocalTensor as a value of type PrimType. Supported only when the TPosition of the LocalTensor is VECIN, VECCALC, or VECOUT. |
| SetValue | index: offset value, in elements. value: value to write; its type is the template parameter T1. | Sets a value in the LocalTensor. Supported only when the TPosition of the LocalTensor is VECIN, VECCALC, or VECOUT. |
| operator[] | offset: offset value | Returns a new LocalTensor that starts at the given offset from the start address of the original LocalTensor. Note that offset cannot exceed the size of the original LocalTensor. |
| operator() | offset: subscript index | Returns a reference to the element at index offset in the LocalTensor. Used as an lvalue, it is equivalent to SetValue. Used as an rvalue, it is equivalent to GetValue. |
| GetSize | - | Returns the current size of the LocalTensor, in elements. |
| SetSize | size: number of elements | Sets the current size of the LocalTensor, in elements. When a LocalTensor is reused and its length changes, call this API to reset the size. |
| SetUserTag | tag: tag information. The type TTagType corresponds to int32_t. | Attaches user-defined tag information to a tensor. The tag can later be read with GetUserTag and used to decide how to process the tensor. |
| GetUserTag | - | Returns the tag information of the specified tensor. You can perform different operations on tensors based on the tag information. |
| ReinterpretCast | - | Reinterprets the current tensor as a new type specified by the user. The resulting tensor has the same address and content as the original tensor, and the tensor size in bytes remains unchanged. |
| GetPhyAddr | offset (optional): offset value, in elements | Returns the address of the LocalTensor. If offset is passed in, the returned address is advanced by offset elements. |
| GetPosition | - | Returns the logical position (QuePosition) of the LocalTensor. The QuePosition can be VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, or CO2. |
| GetLength | - | Returns the data length of the LocalTensor, in bytes. Marked as deprecated in the prototype. |
| SetShapeInfo | shapeInfo: ShapeInfo structure | Sets the ShapeInfo of the LocalTensor. |
| GetShapeInfo | - | Returns the ShapeInfo of the LocalTensor. Note: ShapeInfo has no default value, so this API returns correct shape information only after SetShapeInfo has been called. |
| SetAddrWithOffset | src: base-address tensor from which the address of the offset tensor is derived. offset: offset length, in elements of src. | Sets the tensor address to an offset within src. It is used to quickly define a tensor that starts at a given offset from the start address of an existing tensor. The offset is counted in elements of the existing tensor. Marked as deprecated in the prototype. |
| SetBufferLen | dataLen: buffer length, in bytes | Sets the buffer length, in bytes. Marked as deprecated in the prototype. |
| ToFile | fileName: file name | CPU debugging only. Dumps LocalTensor data to a file in the execution directory for precision debugging. |
| Print | len (optional): number of elements to print | CPU debugging only. Prints LocalTensor data in the debugging window for precision debugging. Each line contains one data block (32 bytes). |
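
The ReinterpretCast, GetSize, and GetLength entries in Table 1 distinguish between a size in elements and a size in bytes. The following host-side sketch illustrates that relationship in plain C++. It does not use the Ascend C headers, and ReinterpretBytes and ByteLength are hypothetical helper names introduced only for this illustration. How individual reinterpreted values appear depends on the byte order of the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an Ascend C API): size in bytes of a buffer that
// holds elementCount elements of type T.
template <typename T>
constexpr std::uint32_t ByteLength(std::uint32_t elementCount) {
    return elementCount * static_cast<std::uint32_t>(sizeof(T));
}

// Hypothetical helper (not an Ascend C API): copies the raw bytes of src into
// a vector of To, so the byte count stays the same while the element count
// changes with the element size.
template <typename To, typename From>
std::vector<To> ReinterpretBytes(const std::vector<From>& src) {
    const std::size_t byteCount = src.size() * sizeof(From);
    assert(byteCount % sizeof(To) == 0 && "byte count must be a multiple of the target element size");
    std::vector<To> dst(byteCount / sizeof(To));
    if (byteCount > 0) {
        std::memcpy(dst.data(), src.data(), byteCount);
    }
    return dst;
}
```

For 16 int32_t elements, ByteLength<int32_t>(16) is 64 bytes, and ReinterpretBytes<int16_t> yields 32 elements that cover the same 64 bytes.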

Precautions

Avoid calling SetValue frequently to assign values to a LocalTensor, because doing so degrades performance. To assign a large number of values, use the basic or advanced data padding APIs, whichever suits the scenario. To generate an incremental sequence, use ArithProgression.

Examples

// srcLen = 256, num = 100, M=50
// Example 1
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal.SetValue(i, num); // Assign num to the ith position in inputLocal.
}
// The result of example 1 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 2
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal.GetValue(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 2 is as follows:
// Element is 100.

// Example 3
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal(i) = num; // Assign num to the ith position in inputLocal.
}
// The result of example 3 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 4
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 4 is as follows:
// element is 100.

// Example 5
auto size = inputLocal.GetSize(); // Obtain the length of inputLocal. The size indicates the number of elements in inputLocal.
// The result of example 5 is as follows:
// The size is srcLen, 256.

// Example 6
// Usage of operator[], in which inputLocal[16] indicates a new tensor with an offset of 16 starting from the start address.
AscendC::Add(outputLocal[16], inputLocal[16], inputLocal2[16], M);
// The result of example 6 is as follows:
// Input data (inputLocal): [100 100 100 ... 100]
// Input data (inputLocal2): [1 2 3 ... 66]
// Output data (outputLocal): [... 117 118 119 ... 166]

// Example 7
AscendC::TTagType tag = 10;
inputLocal.SetUserTag(tag); // Set the tag information for the LocalTensor.

// Example 8
AscendC::LocalTensor<half> tensor1 = que1.DeQue<half>();
AscendC::TTagType tag1 = tensor1.GetUserTag();
AscendC::LocalTensor<half> tensor2 = que2.DeQue<half>();
AscendC::TTagType tag2 = tensor2.GetUserTag();
AscendC::LocalTensor<half> tensor3 = que3.AllocTensor<half>();
/* Use tags to control the execution of conditional statements. */
if ((tag1 <= 10) && (tag2 >= 9)) {
    AscendC::Add(tensor3, tensor1, tensor2, TILE_LENGTH); // The addition operation can be performed only when the value of tag1 is less than or equal to 10 and the value of tag2 is greater than or equal to 9.
}

// Example 9
// inputLocal is of the int32_t type and contains 16 elements (64 bytes).
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}

// Call ReinterpretCast to reinterpret inputLocal as the int16_t type.
AscendC::LocalTensor<int16_t> interpreTensor = inputLocal.ReinterpretCast<int16_t>();
// The result of example 9 is as follows. The data of the two is the same and the same address is used in the physical memory. The data is reinterpreted based on different types.
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// interpreTensor:0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0

// Example 10
// Call GetPhyAddr() to return the address of the LocalTensor. The pointer type (T*) is returned on the CPU, and the physical storage address (uint64_t) is returned on the NPU.
#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
float *inputLocalCpuPtr = inputLocal.GetPhyAddr();
uint64_t realAddr = (uint64_t)inputLocalCpuPtr - (uint64_t)(GetTPipePtr()->GetBaseAddr(static_cast<int8_t>(AscendC::QuePosition::VECCALC)));
#else
uint64_t realAddr = inputLocal.GetPhyAddr();
#endif

// Example 11
AscendC::QuePosition srcPos = (AscendC::QuePosition)inputLocal.GetPosition();
if (srcPos == AscendC::QuePosition::VECCALC) {
    // Processing logic 1
} else if (srcPos == AscendC::QuePosition::A1) {
    // Processing logic 2
} else {
    // Processing logic 3
}

// Example 12
// Obtain the length (in bytes) of inputLocal. The data type is int32_t and there are 16 elements, so the length is 16 × sizeof(int32_t).
uint32_t len = inputLocal.GetLength();
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// len: 64

// Example 13: Set ShapeInfo of a tensor.
AscendC::LocalTensor<float> maxUb = softmaxMaxBuf.template Get<float>();
uint32_t shapeArray[] = {16, 1024};
maxUb.SetShapeInfo(AscendC::ShapeInfo(2, shapeArray, AscendC::DataFormat::ND));

// Example 14: Obtain ShapeInfo of a tensor.
AscendC::ShapeInfo maxShapeInfo = maxUb.GetShapeInfo();
uint32_t orgShape0 = maxShapeInfo.originalShape[0];
uint32_t orgShape1 = maxShapeInfo.originalShape[1];
uint32_t orgShape2 = maxShapeInfo.originalShape[2];
uint32_t orgShape3 = maxShapeInfo.originalShape[3];
uint32_t shape2 = maxShapeInfo.shape[2];

// Example 15: Use SetAddrWithOffset to obtain and define a tensor and specify the offset of the new tensor relative to the start address of the old tensor.
// Note that the offset length is the number of elements of the old tensor.
AscendC::LocalTensor<float> tmpBuffer1 = tempBmm2Queue.AllocTensor<float>();
AscendC::LocalTensor<half> tmpHalfBuffer;
tmpHalfBuffer.SetAddrWithOffset(tmpBuffer1, calcSize * 2);

// Example 16: Use SetBufferLen to change the length of the allocated tensor to 1024 (unit: byte).
AscendC::LocalTensor<float> tmpBuffer2 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer2.SetBufferLen(1024);

// Example 17: Use SetSize to change the length of the allocated tensor to 256 (unit: element).
AscendC::LocalTensor<float> tmpBuffer3 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer3.SetSize(256);

#if defined(ASCENDC_CPU_DEBUG) && ASCENDC_CPU_DEBUG == 1
// Example 18: Used only for CPU debugging. Dump LocalTensor data to a file that is stored in the execution directory for precision debugging.
AscendC::LocalTensor<float> tmpTensor = softmaxMaxBuf.template Get<float>();
tmpTensor.ToFile("tmpTensor.bin");

// Example 19: Used only for CPU debugging. Print LocalTensor data in the debugging window for precision debugging. Each line contains one data block (32 bytes).
AscendC::LocalTensor<int32_t> inputLocal = softmaxMaxBuf.template Get<int32_t>();
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}
inputLocal.Print();
// 0000: 0 1 2 3 4 5 6 7
// 0008: 8 9 10 11 12 13 14 15
#endif
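
Example 19 states that each printed line holds one 32-byte data block. For a 4-byte element type such as int32_t, that is 8 elements per line. The sketch below reproduces this grouping on the host in plain C++. It is not the Ascend C implementation of Print: FormatBlocks is a hypothetical helper, and the row label (the zero-padded decimal index of the first element in the row) is an assumption based on the output shown in the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical host-side formatter (not an Ascend C API): groups elements into
// rows of one 32-byte block and labels each row with the index of its first
// element. The label format is an assumption.
template <typename T>
std::vector<std::string> FormatBlocks(const std::vector<T>& data) {
    constexpr std::size_t kBlockBytes = 32;
    static_assert(kBlockBytes % sizeof(T) == 0, "element size must divide the 32-byte block");
    constexpr std::size_t kPerRow = kBlockBytes / sizeof(T);

    std::vector<std::string> rows;
    for (std::size_t start = 0; start < data.size(); start += kPerRow) {
        std::ostringstream row;
        row << std::setw(4) << std::setfill('0') << start << ":";
        const std::size_t end = std::min(start + kPerRow, data.size());
        for (std::size_t i = start; i < end; ++i) {
            row << ' ' << +data[i]; // unary + prints 8-bit types as numbers
        }
        rows.push_back(row.str());
    }
    return rows;
}
```

With 16 int32_t values 0 to 15, this helper produces two rows of 8 elements each. A 2-byte element type would give 16 elements per row.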