LocalTensor Constructor

Supported Products

Product

Supported (Pipe Framework)

Supported (Static Tensor Programming)

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas 200I/500 A2 inference products

x

Atlas inference product's AI Core

Atlas inference product's Vector Core

Atlas training products

x

Functions

LocalTensor constructor.

Prototype

  • This function is applicable to the Pipe programming framework. Generally, you do not need to call it directly. It does not initialize the member variables of the LocalTensor, so their initial values are indeterminate.
    __aicore__ inline LocalTensor<T>() {}
  • This function is applicable to static tensor programming. It constructs a tensor object from the specified logical position, start address, and number of elements, or from the start address alone when T is a TensorTrait type.
    __aicore__ inline LocalTensor<T>(TPosition pos, uint32_t addr, uint32_t tileSize)
    __aicore__ inline LocalTensor<T>(uint32_t addr)
    

Parameters

Table 1 Parameters in the template

Parameter

Description

T

  • For the prototype of the Pipe programming framework, the basic data types and the TensorTrait type are supported.
  • For the prototypes of static tensor programming, the supported data types depend on the overload:
    // Only basic data types are supported.
    __aicore__ inline LocalTensor<T>(TPosition pos, uint32_t addr, uint32_t tileSize)
    // Only the TensorTrait type is supported.
    __aicore__ inline LocalTensor<T>(uint32_t addr)
Table 2 Parameters

Parameter | Input/Output | Description
pos | Input | Logical position of the LocalTensor.
addr | Input | Start address of the LocalTensor. The value range is [0, maximum size of the corresponding physical memory). The start address must be 32-byte aligned.
tileSize | Input | Number of elements in the LocalTensor. The sum of addr and tileSize (converted into the number of bytes) cannot exceed the range of the corresponding physical memory.
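
The constraints on addr and tileSize can be checked on the host before a tensor is constructed. The following is a minimal sketch in plain C++, not part of the Ascend C API: the helper IsValidLocalRegion, the function DemoRegionChecks, and the capacityBytes parameter are hypothetical names introduced for illustration, and the real capacity depends on the logical position and the product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): checks the documented constraints
// for a region of tileSize elements of type T that starts at byte address addr.
// capacityBytes is an assumed size of the physical memory behind the position.
template <typename T>
inline bool IsValidLocalRegion(uint32_t addr, uint32_t tileSize, uint64_t capacityBytes)
{
    constexpr uint32_t kAlignBytes = 32;  // the start address must be 32-byte aligned
    if (addr % kAlignBytes != 0) {
        return false;
    }
    if (addr >= capacityBytes) {          // addr must lie in [0, capacityBytes)
        return false;
    }
    // 64-bit arithmetic so that addr plus the size in bytes cannot wrap around.
    const uint64_t endByte =
        static_cast<uint64_t>(addr) + static_cast<uint64_t>(tileSize) * sizeof(T);
    return endByte <= capacityBytes;
}

// Usage with an assumed 1024-byte memory (the value is illustrative only).
inline void DemoRegionChecks()
{
    constexpr uint64_t kAssumedCapacity = 1024;
    assert(IsValidLocalRegion<float>(128, 32, kAssumedCapacity));   // 128 + 32 * 4 = 256 <= 1024
    assert(!IsValidLocalRegion<float>(100, 32, kAssumedCapacity));  // 100 is not 32-byte aligned
    assert(!IsValidLocalRegion<float>(960, 32, kAssumedCapacity));  // 960 + 128 = 1088 > 1024
}
```

The first check in DemoRegionChecks uses the same address (128) and element count (32) as the static tensor programming call in Example 20.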

Returns

None

Restrictions

None

Examples

The following examples show how to use the LocalTensor constructor and how to call its member functions.

// srcLen = 256, num = 100, M=50
// Example 1
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal.SetValue(i, num); // Assign num to the ith position in inputLocal.
}
// The result of example 1 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 2
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal.GetValue(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 2 is as follows:
// The value of each element is 100.

// Example 3
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal(i) = num; // Assign num to the ith position in inputLocal.
}
// The result of example 3 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 4
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal(i); // Obtain the value of the ith position in inputLocal.
}
// The result of example 4 is as follows:
// The value of each element is 100.

// Example 5
auto size = inputLocal.GetSize(); // Obtain the size of inputLocal, that is, the number of elements it contains.
// The result of example 5 is as follows:
// The size is srcLen, 256.

// Example 6
// Usage of operator[], in which inputLocal[16] indicates a new tensor with an offset of 16 starting from the start address.
AscendC::Add(outputLocal[16], inputLocal[16], inputLocal2[16], M);
// The result of example 6 is as follows:
// Input data (inputLocal): [100 100 100 ... 100]
// Input data (inputLocal2): [1 2 3 ... 66]
// Output data (outputLocal): [... 117 118 119 ... 166]

// Example 7
AscendC::TTagType tag = 10;
inputLocal.SetUserTag(tag); // Set the tag information for the LocalTensor.

// Example 8
AscendC::LocalTensor<half> tensor1 = que1.DeQue<half>();
AscendC::TTagType tag1 = tensor1.GetUserTag();
AscendC::LocalTensor<half> tensor2 = que2.DeQue<half>();
AscendC::TTagType tag2 = tensor2.GetUserTag();
AscendC::LocalTensor<half> tensor3 = que3.AllocTensor<half>();
/* Use tags to control the execution of conditional statements. */
if ((tag1 <= 10) && (tag2 >= 9)) {
    AscendC::Add(tensor3, tensor1, tensor2, TILE_LENGTH); // The addition operation can be performed only when the value of tag1 is less than or equal to 10 and the value of tag2 is greater than or equal to 9.
}

// Example 9
// inputLocal is of the int32_t type and contains 16 elements (64 bytes).
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}

// Call ReinterpretCast to reinterpret inputLocal as the int16_t type.
AscendC::LocalTensor<int16_t> interpreTensor = inputLocal.ReinterpretCast<int16_t>();
// The result of example 9 is as follows. Both tensors use the same address in physical memory and hold the same data. Only the type used to interpret the data differs.
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// interpreTensor:0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0

// Example 10
// Call GetPhyAddr() to return the address of the LocalTensor. The pointer type (T*) is returned on the CPU, and the physical storage address (uint64_t) is returned on the NPU.
#ifdef ASCEND_CPU_DEBUG
float *inputLocalCpuPtr = inputLocal.GetPhyAddr();
uint64_t realAddr = (uint64_t)inputLocalCpuPtr - (uint64_t)(GetTPipePtr()->GetBaseAddr(static_cast<int8_t>(AscendC::TPosition::VECCALC)));
#else
uint64_t realAddr = inputLocal.GetPhyAddr();
#endif

// Example 11
AscendC::TPosition srcPos = (AscendC::TPosition)inputLocal.GetPosition();
if (srcPos == AscendC::TPosition::VECCALC) {
    // Processing logic 1
} else if (srcPos == AscendC::TPosition::A1) {
    // Processing logic 2
} else {
    // Processing logic 3
}

// Example 12
// Obtain the length (in bytes) of inputLocal. The data type is int32_t. Therefore, the length is 16 × sizeof(int32_t).
uint32_t len = inputLocal.GetLength();
// inputLocal:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// len: 64

// Example 13: Set ShapeInfo of a tensor.
AscendC::LocalTensor<float> maxUb = softmaxMaxBuf.template Get<float>();
uint32_t shapeArray[] = {16, 1024};
maxUb.SetShapeInfo(AscendC::ShapeInfo(2, shapeArray, AscendC::DataFormat::ND));

// Example 14: Obtain ShapeInfo of a tensor.
AscendC::ShapeInfo maxShapeInfo = maxUb.GetShapeInfo();
uint32_t orgShape0 = maxShapeInfo.originalShape[0];
uint32_t orgShape1 = maxShapeInfo.originalShape[1];
uint32_t orgShape2 = maxShapeInfo.originalShape[2];
uint32_t orgShape3 = maxShapeInfo.originalShape[3];
uint32_t shape2 = maxShapeInfo.shape[2];

// Example 15: Use SetAddrWithOffset to obtain and define a tensor and specify the offset of the new tensor relative to the start address of the old tensor.
// Note that the offset length is the number of elements of the old tensor.
AscendC::LocalTensor<float> tmpBuffer1 = tempBmm2Queue.AllocTensor<float>();
AscendC::LocalTensor<half> tmpHalfBuffer;
tmpHalfBuffer.SetAddrWithOffset(tmpBuffer1, calcSize * 2);

// Example 16: Use SetBufferLen to change the length of the allocated tensor to 1024 (unit: byte).
AscendC::LocalTensor<float> tmpBuffer2 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer2.SetBufferLen(1024);

// Example 17: Use SetSize to change the length of the allocated tensor to 256 (unit: element).
AscendC::LocalTensor<float> tmpBuffer3 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer3.SetSize(256);

#ifdef ASCEND_CPU_DEBUG
// Example 18: Used only for CPU debugging. Dump LocalTensor data to a file that is stored in the execution directory for precision debugging.
AscendC::LocalTensor<float> tmpTensor = softmaxMaxBuf.template Get<float>();
tmpTensor.ToFile("tmpTensor.bin");

// Example 19: Used only for CPU debugging. Prints LocalTensor data in the debugging window for precision debugging. Each line contains one data block (32 bytes).
AscendC::LocalTensor<int32_t> inputLocal = softmaxMaxBuf.template Get<int32_t>();
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}
inputLocal.Print();
// 0000: 0 1 2 3 4 5 6 7
// 0008: 8 9 10 11 12 13 14 15
#endif

// Example 20: Used in static tensor programming. Construct a tensor object from the logical position VECIN, start address 128, number of elements 32, and data type float.
uint32_t addr = 128;
uint32_t tileSize = 32;
AscendC::LocalTensor<float> tensor1 = AscendC::LocalTensor<float>(AscendC::TPosition::VECIN, addr, tileSize);
// Construct a tensor object from the TensorTrait information and start address 128.
// Its logical position is VECIN, its data type is float, and its number of elements is 16 x 16 x 16.
template <uint32_t v>
using UIntImm = Std::integral_constant<uint32_t, v>;
...
auto shape = AscendC::MakeShape(UIntImm<16>{}, UIntImm<16>{}, UIntImm<16>{});
auto stride = AscendC::MakeStride(UIntImm<0>{}, UIntImm<0>{}, UIntImm<0>{});
auto layoutMake = AscendC::MakeLayout(shape, stride);
auto tensorTraitMake = AscendC::MakeTensorTrait<float, AscendC::TPosition::VECIN>(layoutMake);
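// Distinct names avoid redeclaring addr and tensor1 from the first part of this example.
uint32_t traitAddr = 128;
auto traitTensor = AscendC::LocalTensor<decltype(tensorTraitMake)>(traitAddr);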
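
The reinterpretation result in Example 9 and the byte length in Example 12 can be reproduced on the host with plain C++. This is a sketch only, not the Ascend C implementation: ReinterpretAsInt16 and MakeExampleInput are hypothetical helpers introduced for illustration, and the element order shown in Example 9 assumes a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side helper (not an Ascend C API): copies the bytes of an
// int32_t buffer into int16_t elements, mirroring what a reinterpreting view
// of the same memory would expose.
inline std::vector<int16_t> ReinterpretAsInt16(const std::vector<int32_t>& src)
{
    std::vector<int16_t> dst(src.size() * sizeof(int32_t) / sizeof(int16_t));
    // The total number of bytes is unchanged; only the element type differs.
    assert(dst.size() * sizeof(int16_t) == src.size() * sizeof(int32_t));
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(int32_t));
    }
    return dst;
}

// Builds the 16-element input used in Example 9: 0, 1, ..., 15.
inline std::vector<int32_t> MakeExampleInput()
{
    std::vector<int32_t> values(16);
    for (int32_t i = 0; i < 16; ++i) {
        values[static_cast<size_t>(i)] = i;
    }
    return values;
}
```

For the 16-element int32_t input, the length in bytes is 16 × sizeof(int32_t) = 64, and the int16_t result has 32 elements. On a little-endian machine each input value i appears as the pair (i, 0).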