Overview

This section uses the Add operator as an example to help you quickly build an Ascend C vector operator program and learn the typical scenarios and processing methods of vector operator development. The following scenarios are involved:

  • Basic vector operators: Develop a simple Add vector operator.
  • Using TBuf: Use temporary space to store intermediate results of operator computation.
  • Multi-core tiling: The operator runs on multiple cores of the AI Processor. Every core computes the same amount of data, and that amount is 32-byte aligned.
  • Tail block tiling: The operator runs on multiple cores of the AI Processor, and every core computes the same amount of data. Within each core, all data blocks are the same size except the last one (the tail block), so each core must also process a tail block.
  • Tail core tiling: The operator runs on multiple cores of the AI Processor, and data cannot be evenly allocated across cores. The cores are divided into whole cores and tail cores. All whole cores process the same amount of data as one another, and all tail cores process the same amount of data as one another.
  • Tail cores and tail blocks: The operator runs on multiple cores of the AI Processor. Data cannot be evenly allocated across cores, nor can it be evenly divided into data blocks within each core. Within each core, all data blocks are the same size except the last one (the tail block), and each core must process its tail block separately.
  • DoubleBuffer scenario: Double buffering is enabled so that multiple pipelines in the operator can execute in parallel.
  • Broadcast scenario: Two inputs in the operator have different shapes. The shape of one input needs to be broadcast before computation.
  • Non-alignment scenario: Several solutions are available for cases where the data is not 32-byte aligned.
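
The tail core and tail block scenarios listed above come down to integer division with a remainder. The following is a minimal host-side sketch in plain C++, not Ascend C API code. The names CoreSplit, TileSplit, SplitAcrossCores, and SplitIntoTiles are hypothetical, and the sketch assumes that data is distributed in units of 32-byte blocks, that the total size is already a multiple of 32 bytes, and that each whole core takes exactly one block more than each tail core.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockBytes = 32;  // distribution unit assumed by this sketch

// Hypothetical result types; the field names are illustrative only.
struct CoreSplit {
    uint32_t wholeCoreNum;    // cores that take one extra 32-byte block
    uint32_t tailCoreNum;     // remaining cores
    uint32_t wholeCoreElems;  // elements handled by each whole core
    uint32_t tailCoreElems;   // elements handled by each tail core
};

struct TileSplit {
    uint32_t fullTileNum;     // number of full-size data blocks on a core
    uint32_t tailTileElems;   // elements in the tail block (0 if none)
};

// Distribute totalElems elements of elemBytes bytes each across coreNum cores.
// Assumption: totalElems * elemBytes is a multiple of 32 bytes.
inline CoreSplit SplitAcrossCores(uint32_t totalElems, uint32_t elemBytes,
                                  uint32_t coreNum) {
    assert(coreNum > 0 && elemBytes > 0 && kBlockBytes % elemBytes == 0);
    const uint32_t elemsPerBlock = kBlockBytes / elemBytes;
    assert(totalElems % elemsPerBlock == 0);

    const uint32_t totalBlocks = totalElems / elemsPerBlock;
    const uint32_t baseBlocks = totalBlocks / coreNum;   // blocks every core gets
    const uint32_t extraBlocks = totalBlocks % coreNum;  // leftover blocks

    CoreSplit split{};
    split.wholeCoreNum = extraBlocks;            // one leftover block per whole core
    split.tailCoreNum = coreNum - extraBlocks;
    split.wholeCoreElems = (baseBlocks + 1) * elemsPerBlock;
    split.tailCoreElems = baseBlocks * elemsPerBlock;
    return split;
}

// Split the data of one core into full data blocks of tileElems elements
// plus an optional tail block holding the remainder.
inline TileSplit SplitIntoTiles(uint32_t coreElems, uint32_t tileElems) {
    assert(tileElems > 0);
    return TileSplit{coreElems / tileElems, coreElems % tileElems};
}
```

For example, 1600 two-byte elements form 100 blocks of 32 bytes. On 8 cores, this sketch yields 4 whole cores with 208 elements each and 4 tail cores with 192 elements each. Splitting the 208 elements of a whole core into data blocks of 64 elements leaves a 16-element tail block. When the block count divides evenly, wholeCoreNum is 0 and the result matches the plain multi-core tiling case.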

During data movement and vector computation, the length of the moved data and the start address of the operand must meet the following alignment requirements:

  • When the DataCopy API is used to move data, the length of the moved data and the start address of the operand (on Unified Buffer) must be 32-byte aligned.
  • Generally, during vector computation, the start address of the operand must be 32-byte aligned, and the basic unit for the computation is 32 bytes.
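
The 32-byte requirements above can be checked on the host before data is moved. The following is a minimal sketch in plain C++, not part of the Ascend C API. The helper names IsAligned32, AlignUp32, and PaddedElemCount are hypothetical, and the sketch assumes that the element size divides 32 bytes evenly.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kAlignBytes = 32;  // alignment unit stated in the requirements

// True if a byte length or byte offset is a multiple of 32.
constexpr bool IsAligned32(uint64_t bytes) {
    return bytes % kAlignBytes == 0;
}

// Round a byte length up to the next multiple of 32.
constexpr uint64_t AlignUp32(uint64_t bytes) {
    return (bytes + kAlignBytes - 1) / kAlignBytes * kAlignBytes;
}

// Number of elements needed so that the buffer length is 32-byte aligned.
// Assumption: elemBytes divides 32 (for example 1, 2, 4, or 8).
inline uint32_t PaddedElemCount(uint32_t elemCount, uint32_t elemBytes) {
    assert(elemBytes > 0 && kAlignBytes % elemBytes == 0);
    const uint64_t paddedBytes = AlignUp32(static_cast<uint64_t>(elemCount) * elemBytes);
    return static_cast<uint32_t>(paddedBytes / elemBytes);
}
```

For example, 11 two-byte elements occupy 22 bytes, which is not 32-byte aligned. Rounding up gives 32 bytes, or 16 elements. How the padding elements are then handled depends on which non-alignment solution is chosen.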