SetL2CacheHint

Product Support

Product                                                   Supported
Atlas A3 training products/Atlas A3 inference products    √
Atlas A2 training products/Atlas A2 inference products    √
Atlas 200I/500 A2 inference products                      x
AI Core of Atlas inference products                       x
Vector Core of Atlas inference products                   x
Atlas training products                                   x

Function

Sets whether to enable the L2 cache for the GlobalTensor. By default, the L2 cache is enabled.

Prototype

template<CacheRwMode rwMode = CacheRwMode::RW>
__aicore__ inline void SetL2CacheHint(CacheMode mode);

Parameters

Table 1 Template parameters

Parameter

Description

rwMode

Read/write mode of the L2 cache.

enum CacheRwMode {
    READ = 1,
    WRITE = 2,
    RW = 3
};

This parameter is reserved for future use. Keep the default value, CacheRwMode::RW.

Table 2 Parameters

Parameter

Input/Output

Description

mode

Input

L2 cache mode specified by the user.

enum class CacheMode : uint8_t {
    CACHE_MODE_DISABLE = 0, // Disable the L2 cache.
    CACHE_MODE_NORMAL = 1,  // Enable the L2 cache.
};

If, while developing an operator, you find that enabling the L2 cache for a GlobalTensor yields lower performance than disabling it, you can manually disable the L2 cache for that GlobalTensor. For example, if an operator reads a GlobalTensor only once, loading the data into the L2 cache brings no benefit, because the cached data is never reused, and the extra data movement into the L2 cache can reduce performance. In this case, consider disabling the L2 cache for the GlobalTensor.

If this API is not called, the default value CacheMode::CACHE_MODE_NORMAL is used, meaning that the L2 cache is enabled for the GlobalTensor.
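The default behavior and the read-once guidance above can be sketched on the host as follows. This is a minimal illustration, not the Ascend C implementation: MockGlobalTensor, Mode(), ChooseCacheMode, and ConfigureReadOnceInput are hypothetical names introduced only for this sketch, and the "read at most once" threshold is an assumption that should be confirmed by profiling the actual operator. Only the two enums and the SetL2CacheHint signature are taken from this page.

```cpp
#include <cassert>
#include <cstdint>

// Enum definitions copied from the parameter tables on this page.
enum CacheRwMode { READ = 1, WRITE = 2, RW = 3 };
enum class CacheMode : uint8_t {
    CACHE_MODE_DISABLE = 0, // Disable the L2 cache.
    CACHE_MODE_NORMAL = 1,  // Enable the L2 cache.
};

// Hypothetical host-side stand-in for AscendC::GlobalTensor.
// It only records the hint; the real class runs on the device.
class MockGlobalTensor {
public:
    // Same shape as the documented prototype; rwMode is reserved and unused.
    template <CacheRwMode rwMode = CacheRwMode::RW>
    void SetL2CacheHint(CacheMode mode) { mode_ = mode; }

    // Hypothetical accessor, present only so the sketch can be checked.
    CacheMode Mode() const { return mode_; }

private:
    // Documented default when SetL2CacheHint is never called.
    CacheMode mode_ = CacheMode::CACHE_MODE_NORMAL;
};

// Hypothetical heuristic: data that is read at most once is never reused,
// so caching it in L2 is assumed to bring no benefit.
constexpr CacheMode ChooseCacheMode(uint32_t expectedReads) {
    return expectedReads <= 1 ? CacheMode::CACHE_MODE_DISABLE
                              : CacheMode::CACHE_MODE_NORMAL;
}

// Usage sketch: apply the heuristic to an input that is read only once.
inline void ConfigureReadOnceInput(MockGlobalTensor &input) {
    input.SetL2CacheHint(ChooseCacheMode(1));
    assert(input.Mode() == CacheMode::CACHE_MODE_DISABLE);
}
```

In a real operator, the same decision would be applied by calling SetL2CacheHint on the AscendC::GlobalTensor after SetGlobalBuffer; a tensor on which the API is never called stays in CacheMode::CACHE_MODE_NORMAL.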

Returns

None

Restrictions

Currently, this API can be used only in custom operator projects and is not supported in kernel direct debugging projects.

Example

void Init(__gm__ uint8_t *src_gm, __gm__ uint8_t *dst_gm)
{
    uint64_t dataSize = 256; // Number of int32_t elements in inputGlobal.

    AscendC::GlobalTensor<int32_t> inputGlobal; // The element type is int32_t.
    // Bind inputGlobal to the global memory that starts at src_gm and holds 256 int32_t elements.
    inputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t *>(src_gm), dataSize);
    // Disable the L2 cache for inputGlobal, so that its data is not loaded into the L2 cache.
    inputGlobal.SetL2CacheHint(AscendC::CacheMode::CACHE_MODE_DISABLE);

    // inQueueX is assumed to be a queue that is declared and initialized elsewhere.
    AscendC::LocalTensor<int32_t> inputLocal = inQueueX.AllocTensor<int32_t>();
    // Copy inputGlobal from the global memory to inputLocal in the local memory.
    AscendC::DataCopy(inputLocal, inputGlobal, dataSize);
    ...
}