Programming Model Design Principles

In the Ascend C programming model, the parallel programming paradigm rests on three elements: a group of parallel computing tasks, queue-based synchronization between those tasks, and developer-controlled scheduling of the tasks and their resources. This section describes the design principles behind the model. It explains the model's design and advantages and lays the groundwork for more in-depth development.

Each stage of a parallel task follows the same paradigm:

  1. Obtain a buffer in local memory: call AllocTensor to allocate a new one, or call DeQue to dequeue a data slice from the upstream queue.
  2. Complete computation or data movement.
  3. Call EnQue to enqueue the data processed in the previous step.
  4. Call FreeTensor to free the memory that is no longer needed.
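The four steps can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not Ascend C code. `ToyQue` and `AddTiles` are hypothetical names, and a queue is modeled as a fixed pool of buffer slots plus a FIFO of filled slots. The copy-in, compute, and copy-out stages run one after another on a single thread, so no set/wait synchronization is modeled. Only the method names AllocTensor, EnQue, DeQue, and FreeTensor come from this section.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, single-threaded model of a queue that owns `depth` local buffers.
// It only mirrors the call order of the paradigm; it is not the Ascend C queue class.
class ToyQue {
public:
    ToyQue(std::size_t depth, std::size_t len) : pool_(depth, std::vector<float>(len)) {
        for (std::size_t i = 0; i < depth; ++i) free_.push_back(i);
    }
    std::size_t AllocTensor() {               // step 1: take a free buffer slot
        assert(!free_.empty());
        std::size_t id = free_.front();
        free_.pop_front();
        return id;
    }
    void EnQue(std::size_t id) { ready_.push_back(id); }  // step 3: pass the buffer downstream
    std::size_t DeQue() {                     // step 1 (downstream): take the oldest filled buffer
        assert(!ready_.empty());
        std::size_t id = ready_.front();
        ready_.pop_front();
        return id;
    }
    void FreeTensor(std::size_t id) { free_.push_back(id); } // step 4: return the slot to the pool
    std::vector<float>& At(std::size_t id) { return pool_[id]; }

private:
    std::vector<std::vector<float>> pool_;
    std::deque<std::size_t> free_, ready_;
};

// z = x + y, processed tile by tile through three stages (copy-in, compute, copy-out).
// Assumes x.size() == y.size() and that the size is a multiple of `len`.
std::vector<float> AddTiles(const std::vector<float>& x, const std::vector<float>& y,
                            std::size_t len) {
    assert(len > 0 && x.size() == y.size() && x.size() % len == 0);
    ToyQue inX(2, len), inY(2, len), outZ(2, len);
    std::vector<float> z(x.size());
    for (std::size_t base = 0; base < x.size(); base += len) {
        // Copy-in stage: allocate, move data in, enqueue.
        std::size_t bx = inX.AllocTensor(), by = inY.AllocTensor();
        for (std::size_t k = 0; k < len; ++k) {
            inX.At(bx)[k] = x[base + k];
            inY.At(by)[k] = y[base + k];
        }
        inX.EnQue(bx);
        inY.EnQue(by);
        // Compute stage: dequeue inputs, allocate output, compute, enqueue, free inputs.
        std::size_t ax = inX.DeQue(), ay = inY.DeQue(), bz = outZ.AllocTensor();
        for (std::size_t k = 0; k < len; ++k) outZ.At(bz)[k] = inX.At(ax)[k] + inY.At(ay)[k];
        outZ.EnQue(bz);
        inX.FreeTensor(ax);
        inY.FreeTensor(ay);
        // Copy-out stage: dequeue the result, move data out, free.
        std::size_t az = outZ.DeQue();
        for (std::size_t k = 0; k < len; ++k) z[base + k] = outZ.At(az)[k];
        outZ.FreeTensor(az);
    }
    return z;
}
```

For example, `AddTiles({1, 2, 3, 4}, {10, 20, 30, 40}, 2)` returns `{11, 22, 33, 44}`. Each stage either allocates or dequeues a buffer, works on it, enqueues its result, and frees what it no longer needs. On real hardware the stages run on different execution units, which is where the synchronization described below comes in.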

Take the simplest vector programming paradigm as an example. When these APIs are called, the corresponding instructions are dispatched to the instruction queues of the execution units, as shown in the following figure.

Figure 1 Vector programming paradigm instruction queue
  • EnQue/DeQue process:
    1. The Scalar Unit reads the operator instruction sequence.
    2. These instructions are sent to the instruction queue of the corresponding execution unit.
    3. Execution units execute these instructions in parallel.
    4. EnQue/DeQue resolves the read-after-write (RAW) hazard on memory:
      • When EnQue is called, a set synchronization instruction is issued. It raises a signal that releases the matching wait.
      • When DeQue is called, a wait synchronization instruction is issued. It blocks until the data write has finished.
      • A wait instruction can execute only after its matching set instruction has executed; until then, it is blocked.

    In short, EnQue/DeQue synchronizes read-after-write access between parallel execution units when a data dependency exists.

  • AllocTensor/FreeTensor process:
    1. The Scalar Unit reads the operator instruction sequence.
    2. These instructions are sent to the instruction queue of the corresponding execution unit.
    3. Execution units execute these instructions in parallel.
    4. AllocTensor/FreeTensor resolves the write-after-read (WAR) hazard on memory:
      • When AllocTensor is called, a wait synchronization instruction is issued. It blocks until the previous reads of that memory have finished.
      • When FreeTensor is called, a set synchronization instruction is issued. It signals that the memory has been released and may be overwritten.
      • A wait instruction can execute only after its matching set instruction has executed; until then, it is blocked.

    In short, AllocTensor/FreeTensor synchronizes write-after-read access between parallel execution units when a data dependency exists.
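The set/wait pairing described above can be modeled on a host CPU. The following C++ sketch is a simulation under stated assumptions, not Ascend C code. Each hardware flag is treated as a counting semaphore (a hypothetical `Flag` class built on `std::mutex` and `std::condition_variable`). Two `std::thread`s stand in for a producing and a consuming execution unit, and they share a ring of `depth` buffers. `RunPipeline` is likewise a hypothetical name. The `dataReady` flag plays the EnQue/DeQue role (RAW), and `bufferFree` plays the FreeTensor/AllocTensor role (WAR).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a hardware set/wait flag, modeled as a counting semaphore:
// Set() records one signal; Wait() blocks until a signal is available, then consumes it.
class Flag {
public:
    explicit Flag(int initial = 0) : count_(initial) {}
    void Set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Two simulated execution units share a ring of `depth` local buffers.
// The producer writes tile values 1..tiles; the consumer reads and sums them.
// With correct synchronization the result is 1 + 2 + ... + tiles.
long long RunPipeline(int tiles, int depth) {
    assert(tiles >= 0 && depth > 0);
    std::vector<long long> ring(depth, 0);
    Flag dataReady(0);       // RAW: EnQue issues set, DeQue issues wait
    Flag bufferFree(depth);  // WAR: FreeTensor issues set, AllocTensor issues wait; all slots start free
    long long sum = 0;       // touched only by the consumer thread until join()

    std::thread producer([&] {
        for (int i = 0; i < tiles; ++i) {
            bufferFree.Wait();        // "AllocTensor": wait until the slot's previous read is done
            ring[i % depth] = i + 1;  // write the tile
            dataReady.Set();          // "EnQue": signal that the write has finished
        }
    });
    std::thread consumer([&] {
        for (int i = 0; i < tiles; ++i) {
            dataReady.Wait();         // "DeQue": wait until the write has finished
            sum += ring[i % depth];   // read the tile
            bufferFree.Set();         // "FreeTensor": the slot may now be overwritten
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

With both flags in place, `RunPipeline(1000, 2)` returns 500500 regardless of thread timing. Initializing `bufferFree` to `depth` reflects that every buffer starts out free. With `depth = 1` the two units strictly alternate, while larger values let the producer run up to `depth` tiles ahead of the consumer.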

In conclusion, asynchronous parallel programs require complex synchronization control. The Ascend C programming model encapsulates this control in EnQue/DeQue/AllocTensor/FreeTensor, which simplifies programming and makes programs easier to understand.