IBWait
Function Usage
When different AI Cores operate on the same global memory block, call this function to synchronize the cores and avoid data dependency problems such as write-after-read, read-after-write, and write-after-write. IBWait and IBSet are used in pairs. IBWait is the inter-core synchronous wait instruction: it waits for the completion of an operation on a given core.
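The set/wait pairing can be pictured with a host-side analogy. The sketch below is not Ascend C code and does not call IBSet or IBWait. It assumes two `std::thread`s standing in for two AI Cores, a plain `int` standing in for the shared global memory block, and a `std::atomic<int>` standing in for a status slot in the workspace. The name `RunHandshakeDemo` is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Host-side analogy only: threads stand in for AI Cores, and the atomic flag
// stands in for a core-status slot. None of these names are Ascend C APIs.
int RunHandshakeDemo() {
    int sharedData = 0;        // stands in for the shared global memory block
    std::atomic<int> flag{0};  // stands in for the status slot of one core
    int observed = 0;

    // "Setting" side: finish the write first, then publish the flag.
    std::thread producer([&] {
        sharedData = 42;
        flag.store(1, std::memory_order_release);
    });

    // "Waiting" side: block until the flag is published, then read the data.
    std::thread consumer([&] {
        while (flag.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        observed = sharedData;  // read-after-write is now safe
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Without the wait, the reading side could observe the block before the write lands, which is the read-after-write hazard described above.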
Prototype
template<bool isAIVOnly = true>
__aicore__ inline void IBWait(const GlobalTensor<int32_t>& gmWorkspace, const LocalTensor<int32_t>& ubWorkspace, int32_t blockIdx, int32_t eventID)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| gmWorkspace | Output | Public buffer for storing the external core status. The type is GlobalTensor. For details about the definition of the GlobalTensor data structure, see GlobalTensor. |
| ubWorkspace | Input | Public buffer of the current core. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. |
| blockIdx | Input | Index (idx) of the waiting core. The value ranges from 0 to the number of cores minus 1, excluding the blockIdx of the waiting core. |
| eventID | Input | Controls the set and wait events of the current core. |
| isAIVOnly | Input | Template parameter that indicates whether the AIVOnly mode is used. The default value is true. |
Returns
None
Availability
Constraints
- The minimum space allocated for gmWorkspace is as follows: Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes. (eventID_max and blockIdx_max indicate the maximum values of eventID and blockIdx, respectively.)
- The minimum size of ubWorkspace is 32 bytes.
- When this API is used for multi-core control, the logical blockDim specified during operator calling must be less than or equal to the number of cores for running the operator. Otherwise, the framework inserts abnormal synchronization during multi-round scheduling, causing the kernel to stop responding.
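The size and blockDim constraints above can be checked on the host before launching the operator. The following is a minimal sketch, not part of the Ascend C API: the helper names `MinGmWorkspaceBytes`, `MinGmWorkspaceElems`, and `IsBlockDimValid` are hypothetical, and the 8-core figures are illustrative only.

```cpp
#include <cstdint>

// Hypothetical helpers that restate the constraints above as arithmetic.
constexpr std::int64_t kSlotBytes = 32;  // the 32-byte unit used in the formula

// Number of cores * 32 bytes * eventID_max + blockIdx_max * 32 bytes + 32 bytes
constexpr std::int64_t MinGmWorkspaceBytes(std::int64_t coreNum,
                                           std::int64_t eventIdMax,
                                           std::int64_t blockIdxMax) {
    return coreNum * kSlotBytes * eventIdMax + blockIdxMax * kSlotBytes + kSlotBytes;
}

// Same minimum expressed in int32_t elements, since gmWorkspace is a
// GlobalTensor<int32_t>.
constexpr std::int64_t MinGmWorkspaceElems(std::int64_t coreNum,
                                           std::int64_t eventIdMax,
                                           std::int64_t blockIdxMax) {
    return MinGmWorkspaceBytes(coreNum, eventIdMax, blockIdxMax) /
           static_cast<std::int64_t>(sizeof(std::int32_t));
}

// The logical blockDim must not exceed the number of cores running the operator.
constexpr bool IsBlockDimValid(std::int64_t blockDim, std::int64_t coreNum) {
    return blockDim <= coreNum;
}

// Illustrative case: 8 cores, eventID_max = 1, blockIdx_max = 7.
// 8 * 32 * 1 + 7 * 32 + 32 = 512 bytes, that is 128 int32_t elements.
static_assert(MinGmWorkspaceBytes(8, 1, 7) == 512, "workspace size in bytes");
static_assert(MinGmWorkspaceElems(8, 1, 7) == 128, "workspace size in elements");
static_assert(IsBlockDimValid(8, 8), "blockDim equal to core count is allowed");
```

The ubWorkspace minimum of 32 bytes corresponds to 8 `int32_t` elements under the same element size.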
Example
For details about the calling examples, see Example.