IBWait

Product Support

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product's AI Core	√
Atlas inference product's Vector Core	x
Atlas training products	√

Function

When different cores operate on the same global memory, data dependencies such as read-after-write, write-after-read, or write-after-write can occur. Call this function to insert inter-core synchronization that prevents the read/write errors these dependencies would otherwise cause. IBWait and IBSet are used in pairs. IBWait is the waiting half of the pair: it makes the calling core wait until the operation on another core has completed.
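The read-after-write case described above can be illustrated with a host-side analogy. The sketch below is not Ascend C code and does not reflect how IBSet or IBWait are implemented. It assumes two ordinary threads standing in for two cores, and a single atomic flag standing in for a status slot in gmWorkspace. All names (SharedState, ProducerCore, ConsumerCore, RunReadAfterWriteDemo) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Host-side analogy only: the flag stands in for a status slot in gmWorkspace,
// and data stands in for the global memory shared by both cores.
struct SharedState {
    std::atomic<int32_t> flag{0};  // 0 = not ready, 1 = writer has finished
    int32_t data = 0;
};

// Plays the role of the core that sets the event after finishing its write.
inline void ProducerCore(SharedState& s, int32_t value) {
    s.data = value;                              // write the shared data
    s.flag.store(1, std::memory_order_release);  // "set": publish completion
}

// Plays the role of the core that waits before reading.
inline int32_t ConsumerCore(SharedState& s) {
    while (s.flag.load(std::memory_order_acquire) != 1) {
        std::this_thread::yield();               // "wait": spin until the flag is set
    }
    return s.data;                               // the read is ordered after the write
}

// Runs both roles concurrently and returns the value the consumer observed.
inline int32_t RunReadAfterWriteDemo(int32_t value) {
    SharedState s;
    int32_t seen = 0;
    std::thread consumer([&] { seen = ConsumerCore(s); });
    std::thread producer([&] { ProducerCore(s, value); });
    producer.join();
    consumer.join();
    assert(s.flag.load() == 1);  // both threads are done, so the flag must be set
    return seen;
}
```

Without the wait loop, the consumer could read data before the producer has written it. Pairing a set on the writing side with a wait on the reading side is what removes that hazard.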

Prototype

template <bool isAIVOnly = true>
__aicore__ inline void IBWait(const GlobalTensor<int32_t>& gmWorkspace, const LocalTensor<int32_t>& ubWorkspace, int32_t blockIdx, int32_t eventID)

Parameters

**Table 1** Template parameters
Parameter	Description
isAIVOnly	Indicates whether the AIVOnly mode is used. The default value is true.

**Table 2** Parameters
Parameter	Input/Output	Description
gmWorkspace	Output	Shared buffer in global memory that stores the status of other cores. The type is GlobalTensor. For the definition of the GlobalTensor data structure, see GlobalTensor.
ubWorkspace	Input	Shared buffer of the current core. The type is LocalTensor. The supported TPosition values are VECIN, VECCALC, and VECOUT.
blockIdx	Input	Index of the core to wait for. The value range is [0, Number of cores - 1], excluding the blockIdx of the current core.
eventID	Input	Event ID that controls the set and wait events of the current core.

Returns

None

Restrictions

The minimum size of gmWorkspace is: Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes, where eventID_max and blockIdx_max are the maximum values of eventID and blockIdx, respectively.
The minimum size of ubWorkspace is 32 bytes.
When this API is used for multi-core control, the logical blockDim specified when the operator is called must be less than or equal to the number of cores that actually run the operator. Otherwise, the framework inserts abnormal synchronization during multi-round scheduling, and the kernel stops responding.
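As a worked instance of the gmWorkspace sizing formula above, the following host-side sketch evaluates it for one configuration. The helper names (MinGmWorkspaceBytes, MinGmWorkspaceElems) and the sample values (8 cores, eventID_max = 3, blockIdx_max = 7) are assumptions for illustration. They are not part of the API.

```cpp
#include <cassert>
#include <cstdint>

// 32-byte unit used by every term of the sizing formula.
constexpr int64_t kUnitBytes = 32;

// Minimum gmWorkspace size in bytes:
//   numCores * 32 * eventIdMax + blockIdxMax * 32 + 32
constexpr int64_t MinGmWorkspaceBytes(int64_t numCores, int64_t eventIdMax,
                                      int64_t blockIdxMax) {
    return numCores * kUnitBytes * eventIdMax + blockIdxMax * kUnitBytes + kUnitBytes;
}

// Same minimum expressed as a count of int32_t elements, since gmWorkspace
// is a GlobalTensor<int32_t>.
constexpr int64_t MinGmWorkspaceElems(int64_t numCores, int64_t eventIdMax,
                                      int64_t blockIdxMax) {
    return MinGmWorkspaceBytes(numCores, eventIdMax, blockIdxMax) /
           static_cast<int64_t>(sizeof(int32_t));
}

// Sample values: 8 * 32 * 3 = 768, 7 * 32 = 224, plus 32 gives 1024 bytes.
static_assert(MinGmWorkspaceBytes(8, 3, 7) == 1024, "worked example in bytes");
static_assert(MinGmWorkspaceElems(8, 3, 7) == 256, "worked example in int32_t elements");
```

For these sample values the buffer needs at least 1024 bytes, or 256 int32_t elements. By comparison, the 32-byte minimum for ubWorkspace corresponds to 8 int32_t elements.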

Example

For a calling example, see Example.
