GlobalTensor

Function Usage

GlobalTensor holds data that resides in global memory (external storage).

Prototype

template <typename T> class GlobalTensor : public BaseGlobalTensor<T> {
public:
    // When T is a TensorTrait type, PrimT extracts its basic data type (LiteType).
    using PrimType = PrimT<T>;
    __aicore__ inline GlobalTensor<T>() {}
    __aicore__ inline void SetGlobalBuffer(__gm__ PrimType* buffer, uint64_t bufferSize);  // Initialize the GlobalTensor from a pointer to the global data and a buffer size, in elements.
    __aicore__ inline void SetGlobalBuffer(__gm__ PrimType* buffer);  // Initialize the GlobalTensor from a pointer to the global data only. No buffer size is set, so GetSize returns an unspecified (random) value.
    __aicore__ inline const __gm__ PrimType* GetPhyAddr() const;  // Return the address of the global data.
    __aicore__ inline __gm__ PrimType* GetPhyAddr(const uint64_t offset) const;  // Return the address of the global data, advanced by offset elements.
    __aicore__ inline __inout_pipe__(S) PrimType GetValue(const uint64_t offset) const;  // Return the value of the element at offset.
    __aicore__ inline __inout_pipe__(S) __gm__ PrimType& operator()(const uint64_t offset) const;  // Return a reference to the element at offset.
    __aicore__ inline void SetValue(const uint64_t offset, PrimType value);  // Set the element at offset to value.
    __aicore__ inline uint64_t GetSize() const;  // Return the number of elements in the tensor.
    __aicore__ inline GlobalTensor operator[](const uint64_t offset) const;  // Return a GlobalTensor that starts offset elements into this one.
    __aicore__ inline void SetShapeInfo(const ShapeInfo& shapeInfo);  // Set the shape information of the tensor.
    __aicore__ inline ShapeInfo GetShapeInfo() const;  // Return the shape information of the tensor.
    template<CacheRwMode rwMode = CacheRwMode::RW>
    __aicore__ inline void SetL2CacheHint(CacheMode mode);  // Set whether the tensor's data may be written to the L2 cache.

    ...
};

Function Description

T can be a basic data type or a TensorTrait type. It must also be a data type supported by the instructions that use the GlobalTensor.

Table 1 Function description

Function Name

Input Parameter

Description

GetValue

offset: offset value, in elements

Returns the element at the given offset in the GlobalTensor. The element is returned by value, as type T.

Note:

The global memory that GetValue reads may be overwritten externally. If that can happen, call DataCacheCleanAndInvalid before calling GetValue so that the data cache is consistent with global memory.

SetValue

offset: offset value, in elements

value: value to set. Its type is the element type of the tensor.

Sets a value in the GlobalTensor.

  • Because of the hardware implementation, SetValue behaves differently from a scalar assignment on a general-purpose CPU. The value is first written to the data cache of the AI Core, not directly to global memory. Data is later written back to global memory in units of cache lines (64 bytes). Make sure that you understand the data cache structure and the cache consistency principles (see DataCacheCleanAndInvalid) before using this API; otherwise it is easy to misuse.
  • To write the data to global memory immediately, call DataCacheCleanAndInvalid after SetValue so that the data cache and global memory are consistent.
  • When multiple cores access global memory, the addresses accessed by different cores must be at least one cache line apart (the offset parameter is an element offset, which can be converted into an address). Otherwise, data from different cores overwrites each other unpredictably. The addresses must also be 64-byte aligned. For details, see Example.

SetGlobalBuffer

buffer: global data pointer passed from the host

bufferSize: number of elements of type T. The value must not exceed the actual data length.

Sets the storage position of the GlobalTensor. buffer points to the start address of the external storage, and bufferSize is the number of elements that the tensor occupies there. For example, if the external storage holds 256 consecutive int32_t elements, bufferSize is 256.

GetPhyAddr

offset (optional): offset value, in elements

Returns the address of the GlobalTensor. If offset is passed in, the returned address is advanced by offset elements.

GetSize

-

Returns the number of elements in the GlobalTensor.

operator[]

offset: offset value, in elements

Returns a new GlobalTensor that starts at the given offset within this one.

SetShapeInfo

shapeInfo: ShapeInfo structure

Sets shapeInfo of the GlobalTensor.

GetShapeInfo

-

Returns shapeInfo of the GlobalTensor. Note: shapeInfo has no default value. GetShapeInfo returns valid shape information only after SetShapeInfo has been called.
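
The multi-core rule in the SetValue notes (different cores stay at least one cache line apart, with 64-byte alignment) comes down to simple offset arithmetic. The following is a minimal sketch in plain host-side C++, not Ascend C kernel code. The helper names CoreElementOffset, ElementsPerCacheLine, and IsCacheLineAligned are hypothetical and are not part of the GlobalTensor API. The sketch assumes the 64-byte cache line stated above and a base address that is itself 64-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption taken from the SetValue notes: one data cache line is 64 bytes.
constexpr std::size_t kCacheLineBytes = 64;

// Number of elements of type T that fit in one cache line.
template <typename T>
constexpr std::uint64_t ElementsPerCacheLine() {
    static_assert(kCacheLineBytes % sizeof(T) == 0,
                  "element size must divide the cache line size");
    return kCacheLineBytes / sizeof(T);
}

// Hypothetical helper: element offset at which core `coreIdx` starts when each
// core owns `elemsPerCore` elements. The per-core region is rounded up to a
// whole number of cache lines so that no two cores share a cache line.
template <typename T>
std::uint64_t CoreElementOffset(std::uint64_t coreIdx, std::uint64_t elemsPerCore) {
    assert(elemsPerCore > 0);
    const std::uint64_t perLine = ElementsPerCacheLine<T>();
    const std::uint64_t paddedElems = (elemsPerCore + perLine - 1) / perLine * perLine;
    return coreIdx * paddedElems;
}

// Hypothetical helper: true when a byte address is 64-byte aligned.
inline bool IsCacheLineAligned(std::uintptr_t address) {
    return address % kCacheLineBytes == 0;
}
```

With int32_t elements, 16 elements fill one cache line, so a core that owns 10 elements still reserves 16: core 0 starts at element 0, core 1 at element 16, and core 2 at element 32. An offset computed this way could then be passed as the offset argument of SetValue or operator[].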

Example

void Init(__gm__ uint8_t *src_gm, __gm__ uint8_t *dst_gm)
{
    uint64_t dataSize = 256; // Number of int32_t elements in inputGlobal.

    AscendC::GlobalTensor<int32_t> inputGlobal; // The element type is int32_t.
    // Set the start address of the source operand in global memory to src_gm,
    // and the size of the external storage it occupies to 256 int32_t elements.
    inputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t *>(src_gm), dataSize);

    // inQueueX is a queue defined elsewhere in the operator (not shown here).
    AscendC::LocalTensor<int32_t> inputLocal = inQueueX.AllocTensor<int32_t>();
    AscendC::DataCopy(inputLocal, inputGlobal, dataSize); // Copy inputGlobal from global memory to inputLocal in local memory.
    ...

}
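
In the example above, 256 is an element count, not a byte count, even though src_gm is declared as a byte pointer. The following is a minimal sketch in plain host-side C++, not Ascend C kernel code, of the conversion between the two units. ElementCount and ByteOffset are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of whole T elements in a buffer of `byteLength`
// bytes. This is the unit that bufferSize expects.
template <typename T>
std::uint64_t ElementCount(std::uint64_t byteLength) {
    assert(byteLength % sizeof(T) == 0);  // the buffer must hold whole elements
    return byteLength / sizeof(T);
}

// Hypothetical helper: byte distance that corresponds to an element offset,
// such as the offset passed to GetPhyAddr, GetValue, SetValue, or operator[].
template <typename T>
std::uint64_t ByteOffset(std::uint64_t elementOffset) {
    return elementOffset * sizeof(T);
}
```

For the example, a 1024-byte region viewed as int32_t gives ElementCount<int32_t>(1024) == 256, which is the value passed as bufferSize.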