SetL2CacheHint
Product Support
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | x |
| | x |
| | x |
| | x |
Function
Sets whether to enable the L2 cache for the GlobalTensor. By default, the L2 cache is enabled.
Prototype
    template<CacheRwMode rwMode = CacheRwMode::RW>
    __aicore__ inline void SetL2CacheHint(CacheMode mode);
Parameters
| Parameter | Description |
|---|---|
| rwMode | Read/write mode of the L2 cache. Reserved parameter for future use. You can use the default value. |
| Parameter | Input/Output | Description |
|---|---|---|
| mode | Input | L2 cache mode specified by the user. If enabling the L2 cache for a GlobalTensor makes an operator slower than disabling it, you can manually disable the L2 cache for that GlobalTensor. For example, when an operator reads a GlobalTensor only once, loading the data into the L2 cache brings no benefit, and the frequent data movement into the L2 cache may degrade performance. If this API is not called, the default value CacheMode::CACHE_MODE_NORMAL applies, meaning that the L2 cache is enabled for the GlobalTensor. |
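The rule of thumb in the mode description can be sketched as a small decision helper. This is an illustrative sketch only: the `sketch` namespace, the stand-in `CacheMode` enum, the `ChooseCacheMode` helper, and the "read at most once" threshold are assumptions made for this example and are not part of Ascend C. The stand-in enum mirrors only the two enumerators named on this page so that the snippet compiles without the Ascend C headers.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for AscendC::CacheMode so this compiles without the
// Ascend C headers. Only the two enumerators named on this page are mirrored.
enum class CacheMode { CACHE_MODE_NORMAL, CACHE_MODE_DISABLE };

// Hypothetical helper (not part of Ascend C): pick a hint from how many times
// the kernel expects to read the same GlobalTensor.
constexpr CacheMode ChooseCacheMode(std::uint32_t expectedReads)
{
    // Data read at most once gains nothing from being placed in the L2 cache.
    return expectedReads <= 1 ? CacheMode::CACHE_MODE_DISABLE
                              : CacheMode::CACHE_MODE_NORMAL;
}

// Compile-time checks of the rule above.
static_assert(ChooseCacheMode(1) == CacheMode::CACHE_MODE_DISABLE,
              "read-once data bypasses the L2 cache");
static_assert(ChooseCacheMode(4) == CacheMode::CACHE_MODE_NORMAL,
              "reused data keeps the L2 cache enabled");

}  // namespace sketch
```

In a real kernel, the corresponding `AscendC::CacheMode` value would be passed to `SetL2CacheHint` on the GlobalTensor. The read-count threshold is only a starting point; the deciding factor remains the measured performance with the L2 cache enabled versus disabled.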
Returns
None
Restrictions
Currently, this API can be used only in custom operator projects and is not supported in kernel direct debugging projects.
Example
    void Init(__gm__ uint8_t *src_gm, __gm__ uint8_t *dst_gm)
    {
        uint64_t dataSize = 256; // Number of int32_t elements in inputGlobal.
        AscendC::GlobalTensor<int32_t> inputGlobal; // The element type is int32_t.
        // Set the start address of the source operand in the global memory to src_gm
        // and its size to 256 int32_t data elements.
        inputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t *>(src_gm), dataSize);
        // Specify that the GlobalTensor will not be written to the L2 cache.
        inputGlobal.SetL2CacheHint(AscendC::CacheMode::CACHE_MODE_DISABLE);
        AscendC::LocalTensor<int32_t> inputLocal = inQueueX.AllocTensor<int32_t>();
        // Copy inputGlobal from the global memory to inputLocal in the local memory.
        AscendC::DataCopy(inputLocal, inputGlobal, dataSize);
        ...
    }