Control Units

The control units issue and coordinate instructions for the entire computing process and drive the operation of the whole AI Core. Figure 1 shows the control units of an AI Core, and Table 1 describes each module.

Figure 1 Control units

**Table 1** Control units and related instruction queues (pipes)
| Control Unit/Instruction Queue | Description |
|---|---|
| Scalar Unit | Scalar compute unit. |
| Cube Queue | Cube instruction queue. Instructions in the same queue are executed in sequence, and instructions in different queues can be executed in parallel. |
| Vector Queue | Vector instruction queue. Instructions in the same queue are executed in sequence, and instructions in different queues can be executed in parallel. |
| MTE Queue | MTE instruction queue. Instructions in the same queue are executed in sequence, and instructions in different queues can be executed in parallel. |
| Event Sync | Controls dependencies and synchronization between instructions in different queues. |

Instructions are fetched from system memory over the bus into the instruction cache of the AI Core. What happens next depends on the instruction type:

- Scalar instructions are executed immediately by the Scalar Unit.
- All other instructions are dispatched to five independent queues (Vector Queue, Cube Queue, and the MTE1, MTE2, and MTE3 Queues) and then assigned to an execution unit.
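The dispatch rule above can be sketched as a small C++ model. It assumes a hypothetical `InstrKind` tag on each instruction and represents each queue as a list kept in enqueue order; `InstrKind`, `Instr`, and `Dispatcher` are illustrative names, not Ascend C APIs, and the model ignores the instruction cache and the bus.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: these types are illustrative, not an Ascend C API.
enum class InstrKind { Scalar, Vector, Cube, Mte1, Mte2, Mte3 };

struct Instr {
    InstrKind kind;
    std::string name;  // label used only for tracing
};

// Indices of the five independent instruction queues.
enum QueueId : std::size_t { kVectorQ = 0, kCubeQ, kMte1Q, kMte2Q, kMte3Q, kNumQueues };

struct Dispatcher {
    std::vector<std::string> scalar_executed;                 // run at once by the Scalar Unit
    std::array<std::vector<std::string>, kNumQueues> queues;  // each list keeps enqueue order

    // Scalar instructions execute directly; everything else is enqueued by type.
    void dispatch(const Instr& in) {
        switch (in.kind) {
            case InstrKind::Scalar: scalar_executed.push_back(in.name); return;
            case InstrKind::Vector: queues[kVectorQ].push_back(in.name); return;
            case InstrKind::Cube:   queues[kCubeQ].push_back(in.name); return;
            case InstrKind::Mte1:   queues[kMte1Q].push_back(in.name); return;
            case InstrKind::Mte2:   queues[kMte2Q].push_back(in.name); return;
            case InstrKind::Mte3:   queues[kMte3Q].push_back(in.name); return;
        }
    }
};
```

Dispatching a mix such as an MTE2 copy, a scalar address calculation, and two vector operations leaves the scalar instruction in `scalar_executed` and the two vector operations in the Vector Queue in the order they arrived.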

Instructions in the same queue are executed in the order in which they were enqueued, while instructions in different queues can be executed in parallel, which improves overall execution efficiency. To handle data dependencies that can arise during parallel execution, the Event Sync module inserts synchronization instructions that keep the pipelines synchronized. Two kinds of synchronization instructions, PipeBarrier and SetFlag/WaitFlag, ensure that instructions within a queue and across queues execute in the order their logical relationships require.
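To illustrate why cross-queue parallelism pays off and why dependencies must still be respected, the following sketch computes an as-soon-as-possible timeline. It is a simplified model under stated assumptions: each instruction has a fixed, made-up duration, starts only after the previous instruction in its own queue finishes, and may additionally wait for one instruction in another queue (the kind of dependency that synchronization enforces). `Op`, `schedule`, and `makespan` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical timing model; durations are made-up units, not hardware data.
struct Op {
    int duration;
    int dep_queue = -1;  // queue holding the producer, or -1 for no dependency
    int dep_index = -1;  // position of the producer within that queue
};

// As-soon-as-possible schedule. Returns finish[q][i] for op i of queue q,
// or an empty result if the cross-queue dependencies can never be satisfied.
std::vector<std::vector<int>> schedule(const std::vector<std::vector<Op>>& queues) {
    const std::size_t n = queues.size();
    std::vector<std::vector<int>> finish(n);
    std::vector<std::size_t> next(n, 0);  // next op to issue in each queue
    std::vector<int> busy_until(n, 0);    // when each queue's previous op finishes
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t q = 0; q < n; ++q) {
            while (next[q] < queues[q].size()) {
                const Op& op = queues[q][next[q]];
                int start = busy_until[q];  // in-order within a queue
                if (op.dep_queue >= 0) {
                    const std::vector<int>& done = finish[op.dep_queue];
                    if (op.dep_index < 0 ||
                        static_cast<std::size_t>(op.dep_index) >= done.size()) {
                        break;  // producer not scheduled yet: this queue waits
                    }
                    start = std::max(start, done[op.dep_index]);
                }
                busy_until[q] = start + op.duration;
                finish[q].push_back(busy_until[q]);
                ++next[q];
                progressed = true;
            }
        }
    }
    for (std::size_t q = 0; q < n; ++q) {
        if (next[q] < queues[q].size()) return {};  // unsatisfiable dependencies
    }
    return finish;
}

// Time at which the last op in any queue finishes.
int makespan(const std::vector<std::vector<int>>& finish) {
    int end = 0;
    for (const auto& q : finish)
        for (int t : q) end = std::max(end, t);
    return end;
}
```

For example, with a copy-in queue holding two operations of 4 units, a vector queue holding two operations of 3 units (each waiting for its copy-in), and a copy-out queue holding two operations of 2 units (each waiting for its computation), the timeline ends at time 13 rather than the 18 units a fully serial order would need, because copying in, computing, and copying out different data blocks overlap.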

- PipeBarrier synchronizes instructions within a single queue: instructions after the barrier cannot be issued until all instructions before the barrier have completed.
- SetFlag and WaitFlag are a pair of instructions that synchronize across queues:
  - SetFlag: After all read and write operations of the current instruction are complete, the corresponding flag bit in hardware is set to 1.
  - WaitFlag: When this instruction is executed, if the corresponding flag bit is 0, subsequent instructions in the queue are blocked; if the flag bit is 1, it is reset to 0 and subsequent instructions proceed.
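The flag-bit rules for SetFlag and WaitFlag can be illustrated with a small round-robin interpreter. This is a hypothetical model, not the Ascend C API: instructions complete instantly, so the condition that all read and write operations are complete holds as soon as SetFlag is reached, and flag names such as `"MTE2_V"` are arbitrary labels. PipeBarrier is not modeled, because in-order, instant completion already satisfies it here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the flag-bit semantics; not an Ascend C API.
enum class Kind { Exec, SetFlag, WaitFlag };

struct Cmd {
    Kind kind;
    std::string arg;  // instruction name for Exec, flag name otherwise
};

// Steps every queue round-robin and returns the order in which Exec
// instructions ran. Stops early if every remaining head is a blocked WaitFlag.
std::vector<std::string> run(const std::vector<std::vector<Cmd>>& queues) {
    std::map<std::string, int> flag;                // hardware flag bits, initially 0
    std::vector<std::size_t> pc(queues.size(), 0);  // next command in each queue
    std::vector<std::string> trace;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t q = 0; q < queues.size(); ++q) {
            if (pc[q] >= queues[q].size()) continue;
            const Cmd& c = queues[q][pc[q]];
            if (c.kind == Kind::WaitFlag) {
                if (flag[c.arg] == 0) continue;  // bit is 0: this queue stays blocked
                flag[c.arg] = 0;                 // bit is 1: reset it and proceed
            } else if (c.kind == Kind::SetFlag) {
                flag[c.arg] = 1;  // in this model, earlier commands have already completed
            } else {
                trace.push_back(c.arg);
            }
            ++pc[q];
            progressed = true;
        }
    }
    return trace;
}
```

Placing WaitFlag before `vadd` in the vector queue and SetFlag after `copy_in` in the MTE2 queue forces `copy_in` to run first even though the vector queue is stepped first; without the pair, `vadd` would run before its input is loaded. Because WaitFlag resets the bit to 0, a second WaitFlag on the same flag blocks until another SetFlag is executed.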

Ascend C provides APIs for synchronization control. In general, however, you do not need to handle synchronization when you program according to the programming model and paradigm described in Programming Model, because the programming model implements synchronization control for you. Following the programming model and paradigm is recommended; manual synchronization control makes programming more complex.

It is still worth understanding the basic principles of synchronization, because they help you understand and design parallel computing programs. In a few cases, you need to insert synchronization manually. For details, see When Do I Need to Manually Insert Synchronization.
