PipeBarrier(ISASI)

Supported Products

Product                                                   Supported/Unsupported
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference product's AI Core
Atlas inference product's Vector Core                     x
Atlas training products

Function Usage

Blocks a pipeline. Insert this synchronization operation between instructions that run on the same pipeline and have a data dependency.

Prototype

template <pipe_t pipe>
__aicore__ inline void PipeBarrier()

Parameters

Table 1 Parameters in the template

Parameter

Description

pipe

Template parameter that specifies the pipeline to block.

For details about supported pipelines, see Pipelines.

To block all pipelines regardless of type, pass PIPE_ALL.

Returns

None

Constraints

The hardware automatically ensures synchronization between Scalar pipelines, so no manual barrier is needed. Do not call PipeBarrier<PIPE_S>(); doing so causes a hardware error.
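One way to catch this mistake early is a compile-time check. The following is a minimal host-side sketch, not part of the Ascend C API: CheckedPipeBarrier and IsBarrierAllowed are hypothetical helpers, and the pipe_t enum and PipeBarrier stub are stand-ins that model only PIPE_S, PIPE_V, and PIPE_ALL so that the snippet compiles without the toolkit headers.

```cpp
// Stand-ins so the sketch compiles on a host without the Ascend C headers.
// Only the three pipe names mentioned in this section are modeled.
enum pipe_t { PIPE_S, PIPE_V, PIPE_ALL };

// Host-side counter, used only to observe that the stub was called.
inline int& BarrierCalls() {
    static int calls = 0;
    return calls;
}

// Stub with the same shape as the documented prototype (without __aicore__).
template <pipe_t pipe>
inline void PipeBarrier() {
    ++BarrierCalls();
}

// Hypothetical helper: every pipe except PIPE_S may be blocked manually.
constexpr bool IsBarrierAllowed(pipe_t pipe) {
    return pipe != PIPE_S;
}

// Hypothetical wrapper: rejects PIPE_S at compile time, then forwards.
template <pipe_t pipe>
inline void CheckedPipeBarrier() {
    static_assert(IsBarrierAllowed(pipe),
                  "PIPE_S is synchronized by hardware; do not call PipeBarrier<PIPE_S>()");
    PipeBarrier<pipe>();
}
```

With a wrapper like this, CheckedPipeBarrier<PIPE_S>() fails to compile, while CheckedPipeBarrier<PIPE_V>() and CheckedPipeBarrier<PIPE_ALL>() forward to the barrier unchanged.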

Example

In the following example, the input dst0Local of the Mul instruction is the output of the Add instruction, so the Mul instruction depends on the Add instruction. Both are vector operation instructions that run on the same pipeline. Therefore, PipeBarrier must be inserted between them to ensure that they execute in order.

Note: This example is for reference only. When automatic synchronization is enabled (the default in kernel direct scheduling operator projects and custom operator development projects), the compiler inserts PIPE_V synchronization automatically, and you do not need to insert it manually.

Figure 1 The Mul instruction and the Add instruction are in a serial relationship. The Mul instruction can be executed only after the Add instruction is executed.
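// Tensor declarations only; allocation is omitted in this example.
AscendC::LocalTensor<half> src0Local;
AscendC::LocalTensor<half> src1Local;
AscendC::LocalTensor<half> src2Local;
AscendC::LocalTensor<half> dst0Local;
AscendC::LocalTensor<half> dst1Local;

// dst0Local = src0Local + src1Local over 512 elements.
AscendC::Add(dst0Local, src0Local, src1Local, 512);
// Block PIPE_V so that Add finishes writing dst0Local before Mul reads it.
AscendC::PipeBarrier<PIPE_V>();
// dst1Local = dst0Local * src2Local over 512 elements.
AscendC::Mul(dst1Local, dst0Local, src2Local, 512);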
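The ordering hazard in the Add/Mul example can be illustrated with a toy host-side model. This is only a sketch under stated assumptions and does not describe how the hardware is implemented: ToyVectorPipe, its Add, Mul, and Barrier methods, and AddThenMul are hypothetical names, and the model simply assumes that an instruction reads its inputs when it is issued but that its result becomes visible only at the next barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;

// Toy model, NOT Ascend C: sources are read at issue time, but the write to
// the destination is deferred until Barrier() is called.
class ToyVectorPipe {
public:
    void Add(Vec& dst, const Vec& a, const Vec& b) {
        assert(a.size() == b.size());
        Vec result(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) result[i] = a[i] + b[i];
        pending_.push_back([&dst, result]() { dst = result; });
    }

    void Mul(Vec& dst, const Vec& a, const Vec& b) {
        assert(a.size() == b.size());
        Vec result(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) result[i] = a[i] * b[i];
        pending_.push_back([&dst, result]() { dst = result; });
    }

    // Hypothetical analogue of PipeBarrier<PIPE_V>(): commit all pending writes.
    void Barrier() {
        for (auto& write : pending_) write();
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Computes (src0 + src1) * src2, with or without a barrier between the steps.
Vec AddThenMul(bool useBarrier) {
    Vec src0{1.0f, 2.0f}, src1{3.0f, 4.0f}, src2{10.0f, 10.0f};
    Vec dst0(2, 0.0f), dst1(2, 0.0f);

    ToyVectorPipe pipe;
    pipe.Add(dst0, src0, src1);
    if (useBarrier) pipe.Barrier();  // make dst0 visible before Mul reads it
    pipe.Mul(dst1, dst0, src2);
    pipe.Barrier();                  // drain the pipe before reading dst1
    return dst1;
}
```

In this model, AddThenMul(true) returns {40, 60}, whereas AddThenMul(false) returns {0, 0} because Mul reads dst0 before the result of Add has been committed.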