Reading/Writing Data by the Scalar Unit
The Scalar Unit in the AI Core performs scalar computation and controls program flow. By hardware design, it can read and write only the global memory (GM) and the Unified Buffer; it cannot access other storage such as the L1 Buffer, L0A Buffer, L0B Buffer, or L0C Buffer. This section describes how the Scalar Unit reads and writes the GM and the Unified Buffer, and the synchronization mechanisms these operations involve.
Reading/Writing Global Memory by the Scalar Unit

As shown in the preceding figure, the Scalar Unit reads and writes GM data through DataCache, which improves the execution efficiency of scalar memory access instructions. Each AIC/AIV core has an independent DataCache. The following example describes the working mechanism of DataCache.
globalTensor1 is a tensor stored in the GM.
- After GetValue(0) is executed, the first eight elements of globalTensor1 are loaded into the DataCache. Subsequent calls to GetValue(1) through GetValue(7) can directly read data from the cache line of the DataCache without accessing the GM, thereby improving the efficiency of continuous scalar access.
- After SetValue(8, val) is executed, elements with indexes 8 to 15 of globalTensor1 are loaded into the DataCache. SetValue only modifies the cache line data in the DataCache and sets the cache line status to Dirty, indicating that the data in the cache line is inconsistent with that in the GM.
    AscendC::GlobalTensor<int64_t> globalTensor1;
    globalTensor1.SetGlobalBuffer((__gm__ int64_t *)input);

    // There are eight elements of int64_t type, from index 0 to 7. The cache line length of the DataCache is 64 bytes.
    // After GetValue(0) is executed, subsequent calls to GetValue(1) through GetValue(7) can directly read data from the cache line without accessing the GM.
    globalTensor1.GetValue(0);
    globalTensor1.GetValue(1);
    globalTensor1.GetValue(2);
    globalTensor1.GetValue(3);
    globalTensor1.GetValue(4);
    globalTensor1.GetValue(5);
    globalTensor1.GetValue(6);
    globalTensor1.GetValue(7);

    // After SetValue(8, val) is executed, the data in the GM is not modified, and only the cache line data in the DataCache is modified.
    // The cache line status is set to dirty, indicating that the data in the cache line of the DataCache is inconsistent with that in the GM.
    int64_t val = 32;
    globalTensor1.SetValue(8, val);
    globalTensor1.GetValue(8);
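The figure of eight elements per access follows from the 64-byte cache line and the 8-byte element size. A minimal host-side sketch of that arithmetic is given here, assuming that element 0 of the tensor starts on a cache line boundary. The names `kCacheLineBytes`, `ElementsPerCacheLine`, and `CacheLineIndex` are hypothetical helpers for illustration and are not Ascend C APIs.

```cpp
#include <cstddef>
#include <cstdint>

// Cache line length of the DataCache, as stated in the example above.
constexpr std::size_t kCacheLineBytes = 64;

// Number of elements of type T that fit in one cache line.
template <typename T>
constexpr std::size_t ElementsPerCacheLine() {
    return kCacheLineBytes / sizeof(T);
}

// Index of the cache line that holds element `index`,
// assuming element 0 starts on a cache line boundary (an assumption of this sketch).
template <typename T>
constexpr std::size_t CacheLineIndex(std::size_t index) {
    return index / ElementsPerCacheLine<T>();
}

static_assert(ElementsPerCacheLine<std::int64_t>() == 8, "eight int64_t values per line");
static_assert(CacheLineIndex<std::int64_t>(7) == 0, "indexes 0 to 7 share the first line");
static_assert(CacheLineIndex<std::int64_t>(8) == 1, "index 8 starts the second line");
```

Under the same assumption, a 4-byte element type would place 16 elements in each cache line, so the number of scalar reads served by one GM access depends on the element size.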
Because of this working mechanism (as shown in the following figure), data inconsistency may occur when multiple cores access globalTensor1: a value written by one core may still reside only in that core's DataCache. To make the updated data visible to other cores, you should manually call DataCacheCleanAndInvalid.
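The inconsistency can be reproduced with a toy host-side model of a per-core write-back cache. This is only a sketch of the behavior described above, not the Ascend C API: `ToyDataCache` and its members are hypothetical names, the model holds a single cache line, and writing a dirty line back when it is evicted is an assumption of the model rather than a documented hardware guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Elements per 64-byte cache line for int64_t data.
constexpr std::size_t kLineElems = 64 / sizeof(std::int64_t);

// Toy model of one core's DataCache, holding a single cache line (hypothetical).
class ToyDataCache {
public:
    explicit ToyDataCache(std::vector<std::int64_t>& gm) : gm_(gm) {}

    // Scalar read: served from the cached line, which is loaded on a miss.
    std::int64_t GetValue(std::size_t index) {
        Load(index / kLineElems);
        return line_[index % kLineElems];
    }

    // Scalar write: updates only the cached line and marks it dirty.
    void SetValue(std::size_t index, std::int64_t value) {
        Load(index / kLineElems);
        line_[index % kLineElems] = value;
        dirty_ = true;
    }

    // Write a dirty line back to GM, then drop it so the next access reloads from GM.
    void CleanAndInvalidate() {
        if (valid_ && dirty_) {
            WriteBack();
        }
        valid_ = false;
        dirty_ = false;
    }

private:
    void Load(std::size_t lineIndex) {
        if (valid_ && lineIndex == lineIndex_) {
            return;  // cache hit, GM is not accessed
        }
        if (valid_ && dirty_) {
            WriteBack();  // model assumption: evicting a dirty line writes it back
        }
        assert((lineIndex + 1) * kLineElems <= gm_.size());
        for (std::size_t i = 0; i < kLineElems; ++i) {
            line_[i] = gm_[lineIndex * kLineElems + i];
        }
        lineIndex_ = lineIndex;
        valid_ = true;
        dirty_ = false;
    }

    void WriteBack() {
        for (std::size_t i = 0; i < kLineElems; ++i) {
            gm_[lineIndex_ * kLineElems + i] = line_[i];
        }
    }

    std::vector<std::int64_t>& gm_;                 // shared "global memory"
    std::array<std::int64_t, kLineElems> line_{};   // the cached line
    std::size_t lineIndex_ = 0;
    bool valid_ = false;
    bool dirty_ = false;
};
```

In this model, if two `ToyDataCache` objects share one GM vector and the first calls `SetValue(8, 32)`, the GM vector and the second cache still hold the old value. After the first cache calls `CleanAndInvalidate()`, the GM vector holds 32, but a second cache that already loaded the line keeps returning the old value until it invalidates its own copy as well.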

Reading/Writing Unified Buffer by the Scalar Unit
The Scalar Unit reads and writes the Unified Buffer through the SetValue and GetValue APIs of LocalTensor. The following is an example:
    for (int32_t i = 0; i < 16; ++i) {
        inputLocal.SetValue(i, i); // Assign i to the ith position in inputLocal.
    }
    for (int32_t i = 0; i < srcLen; ++i) {
        auto element = inputLocal.GetValue(i); // Obtain the value of the ith position in inputLocal.
    }
Synchronization During Data Reading/Writing by the Scalar Unit
Scalar Unit reads and writes of the GM and Unified Buffer are executed as PIPE_S (scalar pipeline) operations. If automatic synchronization is enabled for the operator project, you do not need to insert synchronization events manually when calling the SetValue or GetValue API.
If automatic synchronization is disabled for the operator project, you must insert synchronization events manually, as in the following example:
    // GetValue is a Scalar operation and has data dependency with the subsequent Duplicate operation.
    // Therefore, the Vector pipeline needs to wait until the Scalar operation is complete.
    float inputVal = srcLocal.GetValue(0);
    SetFlag<HardEvent::S_V>(eventID1);
    WaitFlag<HardEvent::S_V>(eventID1);
    AscendC::Duplicate(dstLocal, inputVal, srcDataSize);

    // SetValue is a Scalar operation and has data dependency with the subsequent data movement operation.
    // Therefore, the MTE3 pipeline needs to wait until the Scalar operation is complete.
    srcLocal.SetValue(0, value);
    SetFlag<HardEvent::S_MTE3>(eventID2);
    WaitFlag<HardEvent::S_MTE3>(eventID2);
    AscendC::DataCopy(dstGlobal, srcLocal, srcDataSize);
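Why the flag pair is needed can be illustrated with a toy single-threaded model of two pipelines that advance independently. This is a sketch only: `Op`, `MakeWork`, `MakeSet`, `MakeWait`, `RunPipes`, and `ObservedValue` are hypothetical names, not part of the Ascend C runtime, and the scheduler deliberately favors the downstream pipeline to expose the worst case for a data dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One instruction in a pipeline queue (hypothetical model type).
struct Op {
    enum Kind { kWork, kSetFlag, kWaitFlag } kind;
    int eventId;                 // used by kSetFlag and kWaitFlag
    std::function<void()> work;  // used by kWork
};

inline Op MakeWork(std::function<void()> fn) { return Op{Op::kWork, 0, std::move(fn)}; }
inline Op MakeSet(int id) { return Op{Op::kSetFlag, id, {}}; }
inline Op MakeWait(int id) { return Op{Op::kWaitFlag, id, {}}; }

// A wait op is blocked until its event flag has been set.
inline bool Blocked(const Op& op, const std::vector<bool>& flags) {
    return op.kind == Op::kWaitFlag && !flags[static_cast<std::size_t>(op.eventId)];
}

inline void Execute(const Op& op, std::vector<bool>& flags) {
    switch (op.kind) {
        case Op::kWork:     op.work(); break;
        case Op::kSetFlag:  flags[static_cast<std::size_t>(op.eventId)] = true; break;
        case Op::kWaitFlag: flags[static_cast<std::size_t>(op.eventId)] = false; break;  // consume the event
    }
}

// Runs both queues to completion. The downstream pipe is always tried first.
inline void RunPipes(const std::vector<Op>& scalar, const std::vector<Op>& downstream) {
    std::vector<bool> flags(8, false);
    std::size_t s = 0, d = 0;
    while (s < scalar.size() || d < downstream.size()) {
        if (d < downstream.size() && !Blocked(downstream[d], flags)) {
            Execute(downstream[d++], flags);
        } else if (s < scalar.size() && !Blocked(scalar[s], flags)) {
            Execute(scalar[s++], flags);
        } else {
            assert(false && "deadlock: every remaining op waits on an unset flag");
            return;
        }
    }
}

// The scalar pipe writes a value that the downstream pipe then reads.
// Returns the value the downstream pipe observed.
inline float ObservedValue(bool withSync) {
    float ub = 1.0f;    // stale content before the scalar write
    float seen = 0.0f;
    std::vector<Op> scalar = {MakeWork([&] { ub = 5.0f; }), MakeSet(0)};
    std::vector<Op> downstream;
    if (withSync) {
        downstream.push_back(MakeWait(0));
    }
    downstream.push_back(MakeWork([&] { seen = ub; }));
    RunPipes(scalar, downstream);
    return seen;
}
```

In this model, `ObservedValue(false)` returns the stale value because the downstream pipe runs before the scalar write, while `ObservedValue(true)` returns the written value because the wait op holds the downstream pipe until the scalar pipe has set the flag.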