AllocTensor

Product Support

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	√

Function

Allocates a tensor from the queue. The size of the allocated tensor equals the length of each buffer configured in the InitBuffer call.
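
As a rough illustration of this sizing rule, the following host-side sketch (plain C++, not Ascend C kernel code) computes how many elements of a given type fit into one buffer of a given byte length. The helper name ElementsPerBuffer is hypothetical, and uint16_t is used only as a 2-byte stand-in for half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Ascend C): number of elements of type T
// that fit into one buffer of bufferLenBytes bytes.
template <typename T>
constexpr std::size_t ElementsPerBuffer(std::size_t bufferLenBytes)
{
    return bufferLenBytes / sizeof(T);
}

// uint16_t is a 2-byte stand-in for half; float is assumed to be 4 bytes.
static_assert(ElementsPerBuffer<std::uint16_t>(1024) == 512,
              "a 1024-byte buffer holds 512 two-byte elements");
static_assert(ElementsPerBuffer<float>(1024) == 256,
              "a 1024-byte buffer holds 256 four-byte elements");
```

Under these assumptions, a queue initialized with 1024-byte buffers would yield tensors of 512 half elements per allocation.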

Prototype

Non-inplace API: constructs a new tensor as the object for memory management.

template <typename T>
__aicore__ inline LocalTensor<T> AllocTensor()

Inplace API: directly uses the input tensor as the object for memory management, reducing the overhead of repeatedly creating tensors. For details about how to use the API, see How to Improve Operator Performance Through Inplace Tensor Operations.

template <typename T>
__aicore__ inline void AllocTensor(LocalTensor<T>& tensor)

Parameters

**Table 1** Template parameters
Parameter	Description
T	Data type of the tensor.

**Table 2** Parameters
Parameter	Input/Output	Description
tensor	Input	LocalTensor object to be used as the object for memory management. Applies to the inplace API only.

Restrictions

The number of tensors that can be allocated consecutively (that is, held at the same time) by calling AllocTensor on all queues of the same TPosition is limited. The limit depends on the AI processor model and must be observed during buffer allocation:

For Atlas training products, the maximum number is 4.

For the Atlas inference product's AI Core, the maximum number is 8.

For the Atlas inference product's Vector Core, the maximum number is 8.

For Atlas A2 training products / Atlas A2 inference products, the maximum number is 8.

For Atlas A3 training products / Atlas A3 inference products, the maximum number is 8.

For Atlas 200I/500 A2 inference products, the maximum number is 8.

The content of a tensor allocated by the non-inplace API is not initialized and may contain random values.

For the non-inplace API, the depth template parameter of TQueBind must be set to a non-zero value. For the inplace API, the depth template parameter of TQueBind must be set to 0.
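
The per-TPosition limit can be pictured with a small host-side model. This is only a sketch in plain C++ and not Ascend C code: the class AllocTracker and its methods are hypothetical, positions are identified by plain strings, and it assumes that freeing a tensor makes its slot available again.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical host-side model (not Ascend C): counts how many tensors are
// currently allocated, and not yet freed, on each TPosition.
class AllocTracker {
public:
    explicit AllocTracker(int maxPerPosition) : maxPerPosition_(maxPerPosition) {}

    // Returns true if one more tensor may be allocated on `position`,
    // false if the limit for that position has already been reached.
    bool Alloc(const std::string& position)
    {
        int& live = live_[position];
        if (live >= maxPerPosition_) {
            return false;
        }
        ++live;
        return true;
    }

    // Marks one tensor on `position` as freed (assumption: this releases a slot).
    void Free(const std::string& position)
    {
        int& live = live_[position];
        assert(live > 0 && "Free without a matching Alloc");
        --live;
    }

private:
    int maxPerPosition_;
    std::map<std::string, int> live_;
};
```

With a limit of 4, the fifth call to Alloc("VECIN") in this model returns false, while an allocation on a different position such as "VECOUT" still succeeds because the count is kept per position.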

Returns

The return value of the non-inplace API is a LocalTensor object, and the inplace API has no return value.
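
The difference between the two call shapes can be sketched with mock types. The code below is plain host-side C++ and does not use the Ascend C library: MockTensor and MockQueue are hypothetical stand-ins, and the mock queue owns a single buffer, so both calls hand out a view of the same memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in (not an Ascend C type): a tensor is modeled as a
// typed view onto a byte buffer owned by the queue.
template <typename T>
struct MockTensor {
    T* data = nullptr;
    std::size_t size = 0;  // number of elements of type T
};

// Hypothetical stand-in for a queue that owns exactly one buffer.
class MockQueue {
public:
    explicit MockQueue(std::size_t bufferLenBytes) : storage_(bufferLenBytes) {}

    // Non-inplace shape: constructs a new tensor object and returns it.
    template <typename T>
    MockTensor<T> AllocTensor()
    {
        MockTensor<T> tensor;
        AllocTensor(tensor);
        return tensor;
    }

    // Inplace shape: fills in a tensor object supplied by the caller and
    // returns nothing.
    template <typename T>
    void AllocTensor(MockTensor<T>& tensor)
    {
        tensor.data = reinterpret_cast<T*>(storage_.data());
        tensor.size = storage_.size() / sizeof(T);
    }

private:
    std::vector<unsigned char> storage_;
};
```

In this model the caller of the inplace overload can keep reusing one tensor object across iterations, which mirrors the motivation stated for the inplace API.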

Example

Example 1

// Use AllocTensor to allocate tensors.
AscendC::TPipe pipe;
AscendC::TQue<AscendC::TPosition::VECOUT, 2> que;
int num = 2;
int len = 1024;
pipe.InitBuffer(que, num, len); // Two buffers are allocated by InitBuffer, and the size of each buffer is 1024 bytes.
AscendC::LocalTensor<half> tensor1 = que.AllocTensor<half>(); // The length of the tensor allocated by AllocTensor is 1024 bytes.

Example 2

// Use AllocTensor consecutively.
// pipe and len are assumed to be declared as in Example 1; T is the element type of the operator.
AscendC::TQue<AscendC::TPosition::VECIN, 1> que0;
AscendC::TQue<AscendC::TPosition::VECIN, 1> que1;
AscendC::TQue<AscendC::TPosition::VECIN, 1> que2;
AscendC::TQue<AscendC::TPosition::VECIN, 1> que3;
AscendC::TQue<AscendC::TPosition::VECIN, 1> que4;
AscendC::TQue<AscendC::TPosition::VECIN, 1> que5;
// Not recommended:
// Suppose the operator has six inputs, so six buffers need to be allocated.
// One buffer is allocated for each of the six queues que0 to que5, so the total number of buffers on the VECIN TPosition is 6.
// Assume that at most 4 tensors can be allocated consecutively on the same TPosition. If this limit is exceeded, resource allocation may fail when AllocTensor or FreeTensor is called.
// On the NPU, abnormal behavior such as a hang may occur. In the CPU debugging scenario, an error is reported.
pipe.InitBuffer(que0, 1, len);
pipe.InitBuffer(que1, 1, len);
pipe.InitBuffer(que2, 1, len);
pipe.InitBuffer(que3, 1, len);
pipe.InitBuffer(que4, 1, len);
pipe.InitBuffer(que5, 1, len);

AscendC::LocalTensor<T> local1 = que0.AllocTensor<T>();
AscendC::LocalTensor<T> local2 = que1.AllocTensor<T>();
AscendC::LocalTensor<T> local3 = que2.AllocTensor<T>();
AscendC::LocalTensor<T> local4 = que3.AllocTensor<T>();
// The fifth AllocTensor call fails to allocate resources because more than 4 tensors would be allocated on the same TPosition at the same time.
AscendC::LocalTensor<T> local5 = que4.AllocTensor<T>();

// Recommended:
// When multiple buffers are needed, merge them into one larger buffer and access each part through an offset.
pipe.InitBuffer(que0, 1, len * 3);
pipe.InitBuffer(que1, 1, len * 3);
/*
 * Three local tensors share one allocation. The address of local1 is the start address of the buffer in que0.
 * The address of local2 is the address of local1 offset by len, and the address of local3 is the address
 * of local1 offset by len * 2.
 */
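// Assumption: operator[] of LocalTensor takes an offset in elements of T, so the byte length len is converted to an element count.
int32_t offset1 = len / sizeof(T);
int32_t offset2 = len * 2 / sizeof(T);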
AscendC::LocalTensor<T> local1 = que0.AllocTensor<T>();
AscendC::LocalTensor<T> local2 = local1[offset1];
AscendC::LocalTensor<T> local3 = local1[offset2];
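
The offset arithmetic used when merging buffers can be checked on the host with a short sketch. This is plain C++ rather than Ascend C code; the helper ElementOffsets is hypothetical, and it assumes that offsets into a tensor are expressed in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Ascend C): given the byte length of each logical
// sub-buffer, returns the element offset at which each one starts inside a
// single merged buffer whose elements are elemSize bytes wide.
inline std::vector<std::size_t> ElementOffsets(const std::vector<std::size_t>& lensBytes,
                                               std::size_t elemSize)
{
    std::vector<std::size_t> offsets;
    std::size_t cursorBytes = 0;
    for (std::size_t len : lensBytes) {
        assert(len % elemSize == 0 && "each sub-buffer must hold whole elements");
        offsets.push_back(cursorBytes / elemSize);
        cursorBytes += len;
    }
    return offsets;
}
```

For three sub-buffers of 1024 bytes each and 2-byte elements, this yields element offsets 0, 512, and 1024, and the merged buffer needs 3072 bytes in total.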


Example 3: inplace API

AscendC::TPipe pipe;
AscendC::TQue<AscendC::QuePosition::VECIN, 0> que;
int num = 2;
int len = 1024;
pipe.InitBuffer(que, num, len); // Two buffers are allocated by InitBuffer, and the size of each buffer is 1024 bytes.
AscendC::LocalTensor<half> tensor1;
que.AllocTensor<half>(tensor1); // The length of the tensor allocated by AllocTensor is 1024 bytes.
