IBWait

Function Usage

When multiple AI Cores operate on the same block of global memory, call this function to synchronize the cores and avoid data dependency problems such as write-after-read, read-after-write, and write-after-write. IBWait is used in a pair with IBSet. IBWait is the inter-core wait instruction: it blocks the current core until the operation on the specified core is complete.

Prototype

template<bool isAIVOnly = true>
__aicore__ inline void IBWait(const GlobalTensor<int32_t>& gmWorkspace, const LocalTensor<int32_t>& ubWorkspace, int32_t blockIdx, int32_t eventID)

Parameters

Table 1 Parameters

| Parameter | Input/Output | Description |
|---|---|---|
| gmWorkspace | Output | Shared buffer that stores the status of the other cores. The type is GlobalTensor. For details about the GlobalTensor data structure, see GlobalTensor. |
| ubWorkspace | Input | Shared buffer of the current core. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| blockIdx | Input | Index of the core to wait for. The value ranges from 0 to the number of cores minus 1, excluding the blockIdx of the current (waiting) core. |
| eventID | Input | Controls the set and wait events of the current core. |
| isAIVOnly | Input | Template parameter that indicates whether the AIVOnly mode is used. The default value is true. |

Returns

None

Availability

Atlas Training Series Product

Constraints

  • The minimum space allocated for gmWorkspace is as follows: Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes. (eventID_max and blockIdx_max indicate the maximum values of eventID and blockIdx, respectively.)
  • The minimum size of ubWorkspace is 32 bytes.
  • When this API is used for multi-core control, the logical blockDim specified when the operator is called must be less than or equal to the number of cores that actually run the operator. Otherwise, the framework inserts abnormal synchronization during multi-round scheduling, which causes the kernel to stop responding.
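
The size constraints above can be worked through with a small host-side calculation. The following is a minimal sketch in plain C++, not part of the Ascend C API: the helper names (kSlotBytes, MinGmWorkspaceBytes, MinGmWorkspaceElems, CheckGmWorkspace) are hypothetical and exist only for illustration. It assumes that both workspaces are counted in int32_t elements, as the tensor types in the prototype suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in Ascend C.

// The constraints are expressed in units of 32 bytes.
constexpr std::size_t kSlotBytes = 32;

// Minimum gmWorkspace size in bytes, following the documented formula:
// number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes.
constexpr std::size_t MinGmWorkspaceBytes(std::size_t coreNum,
                                          std::size_t eventIdMax,
                                          std::size_t blockIdxMax)
{
    return coreNum * kSlotBytes * eventIdMax + blockIdxMax * kSlotBytes + kSlotBytes;
}

// The same minimum expressed as a count of int32_t elements,
// since gmWorkspace is a GlobalTensor<int32_t>.
constexpr std::size_t MinGmWorkspaceElems(std::size_t coreNum,
                                          std::size_t eventIdMax,
                                          std::size_t blockIdxMax)
{
    return MinGmWorkspaceBytes(coreNum, eventIdMax, blockIdxMax) / sizeof(std::int32_t);
}

// Minimum ubWorkspace size: 32 bytes, that is, 8 int32_t elements.
constexpr std::size_t kMinUbWorkspaceElems = kSlotBytes / sizeof(std::int32_t);

// Debug-time check that an allocation satisfies the gmWorkspace constraint.
inline void CheckGmWorkspace(std::size_t allocatedBytes,
                             std::size_t coreNum,
                             std::size_t eventIdMax,
                             std::size_t blockIdxMax)
{
    assert(allocatedBytes >= MinGmWorkspaceBytes(coreNum, eventIdMax, blockIdxMax));
}

// Worked example: 8 cores, eventID_max = 3, blockIdx_max = 7.
// 8 * 32 * 3 + 7 * 32 + 32 = 768 + 224 + 32 = 1024 bytes.
static_assert(MinGmWorkspaceBytes(8, 3, 7) == 1024, "formula check");
static_assert(MinGmWorkspaceElems(8, 3, 7) == 256, "element-count check");
static_assert(kMinUbWorkspaceElems == 8, "ubWorkspace check");
```

With these example values, the host side would need to allocate at least 1024 bytes (256 int32_t elements) for gmWorkspace and at least 8 int32_t elements for ubWorkspace.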

Example

For a calling example, see Example.