LocalTensor Constructor

Product Support

Product	Supported (Pipe Framework)	Supported (Static Tensor Programming)
Atlas A3 training products / Atlas A3 inference products	√	√
Atlas A2 training products / Atlas A2 inference products	√	√
Atlas 200I/500 A2 inference products	√	x
Atlas inference products (AI Core)	√	√
Atlas inference products (Vector Core)	√	√
Atlas training products	√	x

Function

Constructs an object of class LocalTensor.

Prototype

This constructor applies to the Pipe programming framework and is generally not called directly by developers. It does not initialize the LocalTensor member variables, so their values are indeterminate.
```
__aicore__ inline LocalTensor<T>() {}
```

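The following constructors apply to static tensor programming. They return a tensor object constructed from the specified logical position, start address, and length. The form that takes only a start address obtains the remaining information from the TensorTrait type T.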

        
```
__aicore__ inline LocalTensor<T>(TPosition pos, uint32_t addr, uint32_t tileSize)
__aicore__ inline LocalTensor<T>(uint32_t addr)
```

Parameters

**Table 1** Template parameters
Parameter	Description
T	For the Pipe programming framework prototype, T supports basic data types and the TensorTrait type. For the static tensor programming prototypes, the supported types depend on the constructor: LocalTensor<T>(TPosition pos, uint32_t addr, uint32_t tileSize) supports only basic data types, and LocalTensor<T>(uint32_t addr) supports only the TensorTrait type.

**Table 2** Parameters
Parameter	Input/Output	Description
pos	Input	Logical position of the LocalTensor.
addr	Input	Start address of the LocalTensor. The value range is [0, maximum size of the corresponding physical memory). The start address must be 32-byte aligned.
tileSize	Input	Number of elements in the LocalTensor. addr plus the number of bytes occupied by tileSize elements (tileSize × sizeof(T)) cannot exceed the size of the corresponding physical memory.
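The constraints on addr and tileSize can be summarized as a single check. The following is a minimal sketch in standard C++, not part of the Ascend C API: `IsValidTile` is a hypothetical helper, and `memSizeBytes` (the size of the physical memory that corresponds to pos) is an assumed input that you would have to look up for your hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): checks the documented constraints
// for LocalTensor<T>(pos, addr, tileSize) against an assumed memory size.
template <typename T>
bool IsValidTile(uint32_t addr, uint32_t tileSize, uint64_t memSizeBytes)
{
    // addr must lie in [0, memSizeBytes).
    if (addr >= memSizeBytes) {
        return false;
    }
    // addr must be 32-byte aligned.
    if (addr % 32 != 0) {
        return false;
    }
    // addr plus the bytes occupied by tileSize elements must fit in the memory.
    uint64_t endBytes = static_cast<uint64_t>(addr) +
                        static_cast<uint64_t>(tileSize) * sizeof(T);
    return endBytes <= memSizeBytes;
}
```

For example, a float tile with addr = 128 and tileSize = 32 occupies bytes [128, 256), so it passes for any memory of at least 256 bytes, while addr = 100 is rejected because it is not 32-byte aligned.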

Returns

None

Restrictions

None

Example

This section provides examples of using the LocalTensor constructor and calling all its member functions.

Unless otherwise noted, the examples below use srcLen = 256, num = 100, and M = 50.
```
// Example 1
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal.SetValue(i, num); // Assign num to the ith position in inputLocal.
}
// The result of example 1 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 2
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal.GetValue(i); // Obtain the value at the ith position in inputLocal.
}
// The result of example 2 is as follows:
// The value of each element is 100.

// Example 3
for (int32_t i = 0; i < srcLen; ++i) {
    inputLocal(i) = num; // Assign num to the ith position in inputLocal.
}
// The result of example 3 is as follows:
// Data (inputLocal): [100 100 100... 100]

// Example 4
for (int32_t i = 0; i < srcLen; ++i) {
    auto element = inputLocal(i); // Obtain the value at the ith position in inputLocal.
}
// The result of example 4 is as follows:
// The value of each element is 100.

// Example 5
auto size = inputLocal.GetSize(); // Obtain the size of inputLocal, that is, the number of elements in inputLocal.
// The result of example 5 is as follows:
// size equals srcLen, that is, 256.
```

```
// Example 6
// Usage of operator[]: inputLocal[16] is a tensor that starts 16 elements after the start address of inputLocal.
// M = 50 elements are added, starting at index 16 of each operand.
AscendC::Add(outputLocal[16], inputLocal[16], inputLocal2[16], M);
// The result of example 6 is as follows:
// Input data (inputLocal): [100 100 100... 100]
// Input data (inputLocal2): [1 2 3... 66]
// Output data (outputLocal): [... 117 118 119... 166]
```
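To make the offsets in Example 6 concrete, the sketch below models the same computation on host arrays in standard C++. `AddWithOffset` is a hypothetical stand-in for AscendC::Add applied to tensors obtained with operator[]; it is not the Ascend C implementation. Like the example, it assumes that inputLocal holds 100 everywhere and that inputLocal2 holds i + 1 at index i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of
//   AscendC::Add(outputLocal[offset], inputLocal[offset], inputLocal2[offset], count);
// Each operand view starts `offset` elements after its tensor's start address,
// and `count` elements are processed. Elements outside the range are untouched.
void AddWithOffset(std::vector<int>& dst, const std::vector<int>& src0,
                   const std::vector<int>& src1, std::size_t offset, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        dst[offset + i] = src0[offset + i] + src1[offset + i];
    }
}
```

With offset 16 and count M = 50, indices 16 through 65 are written (values 117 through 166), which matches the result listed for Example 6.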

```
// Example 7
AscendC::TTagType tag = 10;
inputLocal.SetUserTag(tag); // Set tag information for the LocalTensor.

// Example 8
AscendC::LocalTensor<half> tensor1 = que1.DeQue<half>();
AscendC::TTagType tag1 = tensor1.GetUserTag();
AscendC::LocalTensor<half> tensor2 = que2.DeQue<half>();
AscendC::TTagType tag2 = tensor2.GetUserTag();
AscendC::LocalTensor<half> tensor3 = que3.AllocTensor<half>();
// Use the tags to decide whether the addition is executed.
if ((tag1 <= 10) && (tag2 >= 9)) {
    AscendC::Add(tensor3, tensor1, tensor2, TILE_LENGTH); // Performed only when tag1 <= 10 and tag2 >= 9.
}
```
```
// Example 9
// inputLocal is of the int32_t type and contains 16 elements (64 bytes).
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}

// Call ReinterpretCast to reinterpret inputLocal as the int16_t type.
AscendC::LocalTensor<int16_t> interpreTensor = inputLocal.ReinterpretCast<int16_t>();
// The result of example 9 is as follows. Both tensors use the same address in physical memory and hold the same bytes;
// only the data type used to interpret those bytes differs.
// inputLocal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// interpreTensor: 0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0
```
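The int16_t values in Example 9 can be reproduced on a host by copying the bytes of an int32_t array with std::memcpy. This is a standard C++ sketch, not the Ascend C implementation: unlike ReinterpretCast, it copies the bytes instead of aliasing the same memory, and the interleaved zeros appear only on a little-endian byte order, where the low 16 bits of each int32_t come first. `ViewAsInt16` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: exposes the bytes of an int32_t array as int16_t values
// without converting them (as a copy, whereas ReinterpretCast shares the memory).
std::vector<int16_t> ViewAsInt16(const std::vector<int32_t>& src)
{
    std::vector<int16_t> dst(src.size() * sizeof(int32_t) / sizeof(int16_t));
    std::memcpy(dst.data(), src.data(), src.size() * sizeof(int32_t));
    return dst;
}
```

Sixteen int32_t values become 32 int16_t values; on a little-endian host, element 2k holds the original value k and element 2k + 1 holds 0.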

```
// Example 10
// Call GetPhyAddr() to obtain the LocalTensor address. On the CPU (ASCEND_CPU_DEBUG builds), a pointer (T*) is returned;
// on the NPU, a physical storage address (uint64_t) is returned.
#ifdef ASCEND_CPU_DEBUG
float *inputLocalCpuPtr = inputLocal.GetPhyAddr();
// Convert the host pointer into an address relative to the VECCALC base address.
uint64_t realAddr = (uint64_t)inputLocalCpuPtr - (uint64_t)(GetTPipePtr()->GetBaseAddr(static_cast<int8_t>(AscendC::TPosition::VECCALC)));
#else
uint64_t realAddr = inputLocal.GetPhyAddr();
#endif
```
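In the CPU branch of Example 10, the buffer-relative address is the byte distance between two host pointers. The sketch below shows that arithmetic in standard C++; `base` plays the role of the address returned by GetBaseAddr, and `OffsetFromBase` is a hypothetical helper, not an Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset of `ptr` from `base`, computed the same way
// as the CPU-debug branch of Example 10 (integer difference of the two addresses).
uint64_t OffsetFromBase(const void* ptr, const void* base)
{
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr)) -
           static_cast<uint64_t>(reinterpret_cast<uintptr_t>(base));
}
```

For a float buffer, the element at index 32 lies 32 × sizeof(float) = 128 bytes after the base.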

```
// Example 11
AscendC::TPosition srcPos = (AscendC::TPosition)inputLocal.GetPosition();
if (srcPos == AscendC::TPosition::VECCALC) {
    // Processing logic 1
} else if (srcPos == AscendC::TPosition::A1) {
    // Processing logic 2
} else {
    // Processing logic 3
}
```

```
// Example 12
// Obtain the length of inputLocal in bytes. inputLocal holds 16 elements of the int32_t type,
// so the length is 16 × sizeof(int32_t) = 64.
uint32_t len = inputLocal.GetLength();
// inputLocal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// len: 64
```
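Examples 5 and 12 describe a tensor in different units: GetSize counts elements, while GetLength counts bytes. The relationship is length = size × sizeof(T), as the following standard C++ sketch shows. `LengthInBytes` is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts an element count (as returned by GetSize)
// into a byte length (as returned by GetLength) for element type T.
template <typename T>
constexpr uint32_t LengthInBytes(uint32_t sizeInElements)
{
    return sizeInElements * static_cast<uint32_t>(sizeof(T));
}

static_assert(LengthInBytes<int32_t>(16) == 64, "16 int32_t elements occupy 64 bytes");
```

By the same arithmetic, a float tensor of 256 elements (the SetSize(256) case in Example 17) occupies 1024 bytes (the SetBufferLen(1024) case in Example 16).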

```
// Example 13: Set ShapeInfo of a tensor.
AscendC::LocalTensor<float> maxUb = softmaxMaxBuf.template Get<float>();
uint32_t shapeArray[] = {16, 1024};
maxUb.SetShapeInfo(AscendC::ShapeInfo(2, shapeArray, AscendC::DataFormat::ND));

// Example 14: Obtain ShapeInfo of a tensor.
AscendC::ShapeInfo maxShapeInfo = maxUb.GetShapeInfo();
uint32_t orgShape0 = maxShapeInfo.originalShape[0];
uint32_t orgShape1 = maxShapeInfo.originalShape[1];
uint32_t orgShape2 = maxShapeInfo.originalShape[2];
uint32_t orgShape3 = maxShapeInfo.originalShape[3];
uint32_t shape2 = maxShapeInfo.shape[2];
```

```
// Example 15: Use SetAddrWithOffset to define a new tensor whose start address is offset from the start address of an existing tensor.
// Note that the offset is counted in elements of the existing (source) tensor, here float.
AscendC::LocalTensor<float> tmpBuffer1 = tempBmm2Queue.AllocTensor<float>();
AscendC::LocalTensor<half> tmpHalfBuffer;
tmpHalfBuffer.SetAddrWithOffset(tmpBuffer1, calcSize * 2);

// Example 16: Use SetBufferLen to change the length of the allocated tensor to 1024 bytes.
AscendC::LocalTensor<float> tmpBuffer2 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer2.SetBufferLen(1024);

// Example 17: Use SetSize to change the size of the allocated tensor to 256 elements.
AscendC::LocalTensor<float> tmpBuffer3 = tempBmm2Queue.AllocTensor<float>();
tmpBuffer3.SetSize(256);
```
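Because the offset passed to SetAddrWithOffset in Example 15 is counted in elements of the source tensor (float), tmpHalfBuffer starts calcSize × 2 × sizeof(float) bytes after tmpBuffer1. The sketch below spells out that conversion in standard C++. `ByteOffset` and `OffsetInDstElements` are hypothetical helpers, and a 2-byte `uint16_t` stands in for half, which standard C++ does not provide; calcSize = 64 in the usage below is an arbitrary illustrative value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts an offset counted in elements of the source
// tensor's type (SrcT) into a byte offset from the source tensor's start address.
template <typename SrcT>
constexpr uint64_t ByteOffset(uint64_t offsetInSrcElements)
{
    return offsetInSrcElements * sizeof(SrcT);
}

// Hypothetical helper: the same offset expressed in elements of the new
// tensor's type (DstT), e.g. how many half values precede tmpHalfBuffer.
template <typename SrcT, typename DstT>
constexpr uint64_t OffsetInDstElements(uint64_t offsetInSrcElements)
{
    return ByteOffset<SrcT>(offsetInSrcElements) / sizeof(DstT);
}
```

With calcSize = 64, the offset of 128 float elements is 512 bytes, which corresponds to 256 half elements.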

```
#ifdef ASCEND_CPU_DEBUG
// Example 18: CPU debugging only. Dump LocalTensor data to a file in the execution directory for precision debugging.
AscendC::LocalTensor<float> tmpTensor = softmaxMaxBuf.template Get<float>();
tmpTensor.ToFile("tmpTensor.bin");

// Example 19: CPU debugging only. Print LocalTensor data in the debugging window for precision debugging.
// Each line contains one 32-byte data block, that is, 8 int32_t elements.
AscendC::LocalTensor<int32_t> inputLocal = softmaxMaxBuf.template Get<int32_t>();
for (int32_t i = 0; i < 16; ++i) {
    inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
}
inputLocal.Print();
// 0000: 0 1 2 3 4 5 6 7
// 0008: 8 9 10 11 12 13 14 15
#endif
```
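The line layout used by Print in Example 19 follows from the 32-byte block size: each line holds 32 / sizeof(T) elements (8 for int32_t) and is prefixed with the index of its first element. The standard C++ sketch below reproduces that layout for host data. `FormatBlocks` is hypothetical, and the prefix format (decimal element index, zero-padded to four digits) is an assumption based on the sample output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical formatter: prints `data` one 32-byte block per line, each line
// prefixed with the zero-padded index of its first element.
template <typename T>
std::string FormatBlocks(const std::vector<T>& data)
{
    const std::size_t perLine = 32 / sizeof(T);  // elements per 32-byte block
    std::string out;
    char prefix[16];
    for (std::size_t start = 0; start < data.size(); start += perLine) {
        std::snprintf(prefix, sizeof(prefix), "%04zu:", start);
        out += prefix;
        for (std::size_t i = start; i < data.size() && i < start + perLine; ++i) {
            out += ' ';
            out += std::to_string(data[i]);
        }
        out += '\n';
    }
    return out;
}
```

For 16 int32_t values 0 through 15, this produces two lines of 8 values each, starting at indices 0000 and 0008.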

```
// Example 20: In static tensor programming, construct a tensor object with logical position VECIN, start address 128, 32 elements, and data type float.
uint32_t addr = 128;
uint32_t tileSize = 32;
AscendC::LocalTensor<float> tensor1 = AscendC::LocalTensor<float>(AscendC::TPosition::VECIN, addr, tileSize);
// Construct a tensor object from TensorTrait information and start address 128 (addr).
// The logical position is VECIN, the data type is float, and the number of tensor elements is 16 × 16 × 16.
// The alias template below must be declared at namespace scope.
template <uint32_t v>
using UIntImm = Std::integral_constant<uint32_t, v>;
...
auto shape = AscendC::MakeShape(UIntImm<16>{}, UIntImm<16>{}, UIntImm<16>{});
auto stride = AscendC::MakeStride(UIntImm<0>{}, UIntImm<0>{}, UIntImm<0>{});
auto layoutMake = AscendC::MakeLayout(shape, stride);
auto tensorTraitMake = AscendC::MakeTensorTrait<float, AscendC::TPosition::VECIN>(layoutMake);
auto tensor2 = AscendC::LocalTensor<decltype(tensorTraitMake)>(addr);
```
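The element count 16 × 16 × 16 in Example 20 is known at compile time because each shape dimension is an integral_constant. The sketch below computes the same product with std::integral_constant from the C++ standard library. `UIntImm` mirrors the alias in the example but uses std:: instead of Std::, and `ElementCount` is a hypothetical helper, not AscendC::MakeShape or a TensorTrait member.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same shape as the alias in Example 20, but built on the standard library.
template <uint32_t v>
using UIntImm = std::integral_constant<uint32_t, v>;

// Hypothetical helper: product of compile-time dimensions.
template <typename... Dims>
constexpr uint32_t ElementCount(Dims...)
{
    // The leading 1u keeps the array non-empty when no dimensions are given.
    const uint32_t dims[] = {1u, Dims::value...};
    uint32_t count = 1;
    for (uint32_t d : dims) {
        count *= d;
    }
    return count;
}

static_assert(ElementCount(UIntImm<16>{}, UIntImm<16>{}, UIntImm<16>{}) == 4096,
              "16 x 16 x 16 elements");
```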

          

        

      
     
