LocalTensor Constructor
Supported Products
| Product | Supported (Pipe Framework) | Supported (Static Tensor Programming) |
|---|---|---|
| | √ | √ |
| | √ | √ |
| | √ | x |
| | √ | √ |
| | √ | √ |
| | √ | x |
Functions
LocalTensor constructor.
Prototype
- This function is applicable to the Pipe programming framework. You generally do not need to call it directly. It does not initialize the member variables of the LocalTensor, so their initial values are indeterminate.

      __aicore__ inline LocalTensor<T>() {}

- These functions are applicable to static tensor programming. They return a tensor object: the first overload takes the logical position, start address, and number of elements; the second takes only the start address.

      __aicore__ inline LocalTensor<T>(TPosition pos, uint32_t addr, uint32_t tileSize)
      __aicore__ inline LocalTensor<T>(uint32_t addr)
Parameters
| Parameter | Description |
|---|---|
| T | |

| Parameter | Input/Output | Description |
|---|---|---|
| pos | Input | Logical position of the LocalTensor. |
| addr | Input | Start address of the LocalTensor. The value range is [0, maximum size of the corresponding physical memory). The start address must be 32-byte aligned. |
| tileSize | Input | Number of elements in the LocalTensor. The sum of addr and tileSize (converted into the number of bytes) cannot exceed the range of the corresponding physical memory. |
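The constraints on addr and tileSize can be checked on the host before a kernel is written. The following is a minimal sketch in plain C++, not part of Ascend C: the helper name `IsValidLocalTensorRange` and the `capacityBytes` argument are hypothetical, and the capacity of the physical memory behind a given logical position must be taken from the documentation of your product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper (not an Ascend C API). It checks the
// constraints listed for addr and tileSize in the parameter table.
// capacityBytes is an assumed size, in bytes, of the physical memory that
// backs the chosen logical position.
template <typename T>
inline bool IsValidLocalTensorRange(uint32_t addr, uint32_t tileSize, uint64_t capacityBytes)
{
    const uint32_t kAlignBytes = 32;  // The start address must be 32-byte aligned.
    if (addr % kAlignBytes != 0) {
        return false;
    }
    if (addr >= capacityBytes) {      // addr must lie in [0, capacity).
        return false;
    }
    // Convert tileSize from elements to bytes and use 64-bit arithmetic so
    // that the sum cannot wrap around.
    const uint64_t endByte = static_cast<uint64_t>(addr) +
                             static_cast<uint64_t>(tileSize) * sizeof(T);
    return endByte <= capacityBytes;
}
```

For example, with an assumed capacity of 256 bytes, `IsValidLocalTensorRange<float>(128, 32, 256)` accepts the range because 128 + 32 × 4 = 256, while an addr of 100 is rejected because it is not 32-byte aligned.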
Returns
None
Restrictions
None
Examples
The following examples show how to use the LocalTensor constructor and how to call its member functions.

    // srcLen = 256, num = 100, M = 50

    // Example 1
    for (int32_t i = 0; i < srcLen; ++i) {
        inputLocal.SetValue(i, num); // Assign num to the ith position in inputLocal.
    }
    // The result of example 1 is as follows:
    // Data (inputLocal): [100 100 100 ... 100]

    // Example 2
    for (int32_t i = 0; i < srcLen; ++i) {
        auto element = inputLocal.GetValue(i); // Obtain the value of the ith position in inputLocal.
    }
    // The result of example 2 is as follows:
    // The value of each element is 100.

    // Example 3
    for (int32_t i = 0; i < srcLen; ++i) {
        inputLocal(i) = num; // Assign num to the ith position in inputLocal.
    }
    // The result of example 3 is as follows:
    // Data (inputLocal): [100 100 100 ... 100]

    // Example 4
    for (int32_t i = 0; i < srcLen; ++i) {
        auto element = inputLocal(i); // Obtain the value of the ith position in inputLocal.
    }
    // The result of example 4 is as follows:
    // The value of each element is 100.

    // Example 5
    auto size = inputLocal.GetSize(); // Obtain the length of inputLocal. The size is the number of elements in inputLocal.
    // The result of example 5 is as follows:
    // The size is srcLen, 256.

    // Example 6
    // Usage of operator[]: inputLocal[16] is a new tensor that starts at an offset of 16 elements from the start address.
    AscendC::Add(outputLocal[16], inputLocal[16], inputLocal2[16], M);
    // The result of example 6 is as follows:
    // Input data (inputLocal): [100 100 100 ... 100]
    // Input data (inputLocal2): [1 2 3 ... 66]
    // Output data (outputLocal): [... 117 118 119 ... 166]

    // Example 7
    AscendC::TTagType tag = 10;
    inputLocal.SetUserTag(tag); // Set the tag information for the LocalTensor.

    // Example 8
    AscendC::LocalTensor<half> tensor1 = que1.DeQue<half>();
    AscendC::TTagType tag1 = tensor1.GetUserTag();
    AscendC::LocalTensor<half> tensor2 = que2.DeQue<half>();
    AscendC::TTagType tag2 = tensor2.GetUserTag();
    AscendC::LocalTensor<half> tensor3 = que3.AllocTensor<half>();
    /* Use tags to control the execution of conditional statements. */
    if ((tag1 <= 10) && (tag2 >= 9)) {
        // The addition is performed only when tag1 is less than or equal to 10 and tag2 is greater than or equal to 9.
        AscendC::Add(tensor3, tensor1, tensor2, TILE_LENGTH);
    }

    // Example 9
    // inputLocal is of the int32_t type and contains 16 elements (64 bytes).
    for (int32_t i = 0; i < 16; ++i) {
        inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
    }
    // Call ReinterpretCast to reinterpret inputLocal as the int16_t type.
    AscendC::LocalTensor<int16_t> interpreTensor = inputLocal.ReinterpretCast<int16_t>();
    // The result of example 9 is as follows. Both tensors hold the same data at the same address in the physical memory.
    // The data is only interpreted according to different types.
    // inputLocal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    // interpreTensor: 0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0

    // Example 10
    // Call GetPhyAddr() to return the address of the LocalTensor.
    // A pointer (T*) is returned on the CPU, and the physical storage address (uint64_t) is returned on the NPU.
    #ifdef ASCEND_CPU_DEBUG
    float *inputLocalCpuPtr = inputLocal.GetPhyAddr();
    uint64_t realAddr = (uint64_t)inputLocalCpuPtr - (uint64_t)(GetTPipePtr()->GetBaseAddr(static_cast<int8_t>(AscendC::TPosition::VECCALC)));
    #else
    uint64_t realAddr = inputLocal.GetPhyAddr();
    #endif

    // Example 11
    AscendC::TPosition srcPos = (AscendC::TPosition)inputLocal.GetPosition();
    if (srcPos == AscendC::TPosition::VECCALC) {
        // Processing logic 1
    } else if (srcPos == AscendC::TPosition::A1) {
        // Processing logic 2
    } else {
        // Processing logic 3
    }

    // Example 12
    // Obtain the length (in bytes) of inputLocal. The data type is int32_t, so the length is 16 × sizeof(int32_t).
    uint32_t len = inputLocal.GetLength();
    // inputLocal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    // len: 64

    // Example 13: Set ShapeInfo of a tensor.
    AscendC::LocalTensor<float> maxUb = softmaxMaxBuf.template Get<float>();
    uint32_t shapeArray[] = {16, 1024};
    maxUb.SetShapeInfo(AscendC::ShapeInfo(2, shapeArray, AscendC::DataFormat::ND));

    // Example 14: Obtain ShapeInfo of a tensor.
    AscendC::ShapeInfo maxShapeInfo = maxUb.GetShapeInfo();
    uint32_t orgShape0 = maxShapeInfo.originalShape[0];
    uint32_t orgShape1 = maxShapeInfo.originalShape[1];
    uint32_t orgShape2 = maxShapeInfo.originalShape[2];
    uint32_t orgShape3 = maxShapeInfo.originalShape[3];
    uint32_t shape2 = maxShapeInfo.shape[2];

    // Example 15: Use SetAddrWithOffset to define a new tensor at an offset from the start address of an existing tensor.
    // Note that the offset is counted in elements of the existing tensor.
    AscendC::LocalTensor<float> tmpBuffer1 = tempBmm2Queue.AllocTensor<float>();
    AscendC::LocalTensor<half> tmpHalfBuffer;
    tmpHalfBuffer.SetAddrWithOffset(tmpBuffer1, calcSize * 2);

    // Example 16: Use SetBufferLen to change the length of the allocated tensor to 1024 (unit: byte).
    AscendC::LocalTensor<float> tmpBuffer2 = tempBmm2Queue.AllocTensor<float>();
    tmpBuffer2.SetBufferLen(1024);

    // Example 17: Use SetSize to change the length of the allocated tensor to 256 (unit: element).
    AscendC::LocalTensor<float> tmpBuffer3 = tempBmm2Queue.AllocTensor<float>();
    tmpBuffer3.SetSize(256);

    #ifdef ASCEND_CPU_DEBUG
    // Example 18: Used only for CPU debugging. Dump LocalTensor data to a file in the execution directory for precision debugging.
    AscendC::LocalTensor<float> tmpTensor = softmaxMaxBuf.template Get<float>();
    tmpTensor.ToFile("tmpTensor.bin");

    // Example 19: Used only for CPU debugging. Print LocalTensor data in the debugging window for precision debugging.
    // Each line contains one data block (32 bytes).
    AscendC::LocalTensor<int32_t> inputLocal = softmaxMaxBuf.template Get<int32_t>();
    for (int32_t i = 0; i < 16; ++i) {
        inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
    }
    inputLocal.Print();
    // 0000: 0 1 2 3 4 5 6 7
    // 0008: 8 9 10 11 12 13 14 15
    #endif

    // Example 20: Used in static tensor programming.
    // Construct a tensor object from the logical position VECIN, start address 128, number of elements 32, and data type float.
    uint32_t addr = 128;
    uint32_t tileSize = 32;
    AscendC::LocalTensor<float> tensor1 = AscendC::LocalTensor<float>(AscendC::TPosition::VECIN, addr, tileSize);

    // Construct a tensor object from TensorTrait information and start address 128.
    // Its logical position is VECIN, the data type is float, and the number of tensor elements is 16 x 16 x 16.
    template <uint32_t v>
    using UIntImm = Std::integral_constant<uint32_t, v>;
    ...
    auto shape = AscendC::MakeShape(UIntImm<16>{}, UIntImm<16>{}, UIntImm<16>{});
    auto stride = AscendC::MakeStride(UIntImm<0>{}, UIntImm<0>{}, UIntImm<0>{});
    auto layoutMake = AscendC::MakeLayout(shape, stride);
    auto tensorTraitMake = AscendC::MakeTensorTrait<float, AscendC::TPosition::VECIN>(layoutMake);
    uint32_t addr = 128;
    auto tensor1 = AscendC::LocalTensor<decltype(tensorTraitMake)>(addr);
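The output of Example 9 can be reproduced on the host to see why each int32_t value turns into a pair of int16_t values. The following is a minimal sketch in plain C++, not Ascend C: the helper name `ReinterpretAsInt16` is hypothetical, a little-endian host is assumed, and the bytes are copied into a new buffer, whereas ReinterpretCast on a LocalTensor reuses the same address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side model of Example 9 (not an Ascend C API).
// The bytes of an int32_t buffer are read back as int16_t values; the bytes
// themselves are unchanged. Unlike ReinterpretCast, this sketch copies the
// data instead of sharing the address.
inline std::vector<int16_t> ReinterpretAsInt16(const std::vector<int32_t>& src)
{
    // Same number of bytes, so twice as many 2-byte elements.
    std::vector<int16_t> dst(src.size() * sizeof(int32_t) / sizeof(int16_t));
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(int32_t));
    }
    return dst;
}
```

On a little-endian host, the low 16 bits of each int32_t come first, so a source value i in the range 0 to 15 is read back as the pair (i, 0). This matches the interpreTensor sequence listed in Example 9, and the element count doubles from 16 to 32 while the length in bytes stays at 64.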