IBWait

Supported Products

| Product | Supported/Unsupported |
| --- | --- |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | x |
| Atlas inference product's AI Core | √ |
| Atlas inference product's Vector Core | x |
| Atlas training products | √ |

Function Usage

When different AI Cores operate on the same global memory block, call this function to synchronize the cores and avoid data dependency problems such as write-after-read, read-after-write, and write-after-write. IBWait and IBSet are used in pairs. IBWait is the inter-core wait instruction: the calling core waits until the specified core has completed its operation.
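The pairing described above can be illustrated with a host-side analogy. The sketch below is not Ascend C code and does not reflect how IBSet or IBWait are implemented. It uses two `std::thread` objects as stand-ins for two cores and a `std::atomic` flag as a stand-in for one status slot. All names (`SharedBlock`, `SetFlag`, `WaitFlag`, `RunProducerConsumer`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical host-side stand-ins, for illustration only.
struct SharedBlock {
    int data = 0;              // stands in for the shared global memory block
    std::atomic<int> flag{0};  // stands in for one status slot
};

// Analogy for the "set" side: publish that the write has completed.
inline void SetFlag(SharedBlock& block) {
    block.flag.store(1, std::memory_order_release);
}

// Analogy for the "wait" side: spin until the other side has set the flag.
inline void WaitFlag(SharedBlock& block) {
    while (block.flag.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
}

// The consumer reads the shared block only after the producer has signalled,
// which avoids the read-after-write hazard.
inline int RunProducerConsumer() {
    SharedBlock block;
    int observed = -1;
    std::thread consumer([&] {
        WaitFlag(block);        // wait for the producer to finish
        observed = block.data;  // safe to read now
    });
    std::thread producer([&] {
        block.data = 42;        // write the shared block
        SetFlag(block);         // then signal completion
    });
    producer.join();
    consumer.join();
    return observed;
}
```

In this analogy, calling `RunProducerConsumer()` returns the value written by the producer, because the consumer cannot read before the flag is set.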

Prototype

template <bool isAIVOnly = true>
__aicore__ inline void IBWait(const GlobalTensor<int32_t>& gmWorkspace, const LocalTensor<int32_t>& ubWorkspace, int32_t blockIdx, int32_t eventID)

Parameters

Table 1 Parameters in the template

| Parameter | Description |
| --- | --- |
| isAIVOnly | Indicates whether the AIVOnly mode is used. The default value is true. |

Table 2 Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| gmWorkspace | Output | Public buffer that stores the status of external cores. The type is GlobalTensor. For details about the definition of the GlobalTensor data structure, see GlobalTensor. |
| ubWorkspace | Input | Public cache of the current core. The type is LocalTensor, and the supported TPosition is VECIN/VECCALC/VECOUT. |
| blockIdx | Input | ID of the core to wait for. The value range is [0, Number of cores - 1], excluding the blockIdx of the current core. |
| eventID | Input | Controls the set and wait events of the current core. |
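The blockIdx rule in Table 2 can be expressed as a small host-side check. This is only a sketch of the stated range rule. `IsValidWaitTarget` is a hypothetical helper, not part of the Ascend C API, and its parameter names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when targetBlockIdx lies in
// [0, coreNum - 1] and differs from the current core's own blockIdx.
constexpr bool IsValidWaitTarget(std::int32_t targetBlockIdx,
                                 std::int32_t currentBlockIdx,
                                 std::int32_t coreNum) {
    return targetBlockIdx >= 0 &&
           targetBlockIdx < coreNum &&
           targetBlockIdx != currentBlockIdx;
}

// Core 1 of 8 may wait for core 0, but not for itself or for core 8.
static_assert(IsValidWaitTarget(0, 1, 8), "core 0 is a valid target for core 1");
static_assert(!IsValidWaitTarget(1, 1, 8), "a core cannot wait for itself");
static_assert(!IsValidWaitTarget(8, 1, 8), "index 8 is out of range for 8 cores");
```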

Returns

None

Constraints

  • The minimum space allocated for gmWorkspace is as follows: Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes. (eventID_max and blockIdx_max indicate the maximum values of eventID and blockIdx, respectively.)
  • The minimum size of ubWorkspace is 32 bytes.
  • When this API is used for multi-core control, the logical blockDim specified when the operator is invoked must be less than or equal to the number of cores that run the operator. Otherwise, the framework inserts abnormal synchronization during multi-round scheduling, and the kernel stops responding.
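The gmWorkspace size formula in the first constraint can be worked through on the host. The sketch below only evaluates that formula. `MinGmWorkspaceBytes` and `MinGmWorkspaceElems` are hypothetical helper names, and the core counts used are example values rather than values for any specific product.

```cpp
#include <cassert>
#include <cstdint>

// 32-byte unit used by the size formula in the constraint above.
constexpr std::int64_t kSlotBytes = 32;

// Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes
constexpr std::int64_t MinGmWorkspaceBytes(std::int64_t coreNum,
                                           std::int64_t eventIdMax,
                                           std::int64_t blockIdxMax) {
    return coreNum * kSlotBytes * eventIdMax + blockIdxMax * kSlotBytes + kSlotBytes;
}

// The same size expressed as a count of int32_t elements,
// since gmWorkspace is a GlobalTensor<int32_t>.
constexpr std::int64_t MinGmWorkspaceElems(std::int64_t coreNum,
                                           std::int64_t eventIdMax,
                                           std::int64_t blockIdxMax) {
    return MinGmWorkspaceBytes(coreNum, eventIdMax, blockIdxMax) /
           static_cast<std::int64_t>(sizeof(std::int32_t));
}

// Example: 8 cores, eventID_max = 1, blockIdx_max = 7
// 8 * 32 * 1 + 7 * 32 + 32 = 256 + 224 + 32 = 512 bytes
static_assert(MinGmWorkspaceBytes(8, 1, 7) == 512, "worked example");
static_assert(MinGmWorkspaceElems(8, 1, 7) == 128, "512 bytes / 4 bytes per element");
```

With 2 cores, eventID_max = 0, and blockIdx_max = 1, the same formula gives 0 + 32 + 32 = 64 bytes, or 16 int32_t elements.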

Examples

For details about the calling examples, see Example.