IterateAll
Applicability
| Product | Supported |
|---|---|
|  | √ |
|  | √ |
|  | √ |
|  | √ |
|  | x |
|  | x |
Function
Each call to IterateAll computes a complete matrix C of size singleCoreM × singleCoreN. The iteration order can be adjusted with the tiling parameter iterateOrder.
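To make the role of the iteration order concrete, the following host-side sketch lists the base blocks of a singleCoreM × singleCoreN result in the order a full pass might visit them. It is plain C++ and not the Ascend C API. The enum `IterateOrderSketch` and the function `BlockVisitOrder` are hypothetical names introduced for illustration, and the two orders shown are an assumption about what iterateOrder selects, since this section does not give its encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the iterateOrder tiling parameter; the real
// parameter's values are not specified in this section.
enum class IterateOrderSketch { MFirst, NFirst };

// Ceiling division for positive integers.
inline uint32_t CeilDiv(uint32_t a, uint32_t b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Returns the (row block, column block) index of every base block of a
// singleCoreM x singleCoreN result, in the order one full pass visits them.
std::vector<std::pair<uint32_t, uint32_t>> BlockVisitOrder(
    uint32_t singleCoreM, uint32_t singleCoreN,
    uint32_t baseM, uint32_t baseN, IterateOrderSketch order) {
    const uint32_t mBlocks = CeilDiv(singleCoreM, baseM);
    const uint32_t nBlocks = CeilDiv(singleCoreN, baseN);
    std::vector<std::pair<uint32_t, uint32_t>> visits;
    visits.reserve(static_cast<std::size_t>(mBlocks) * nBlocks);
    if (order == IterateOrderSketch::MFirst) {
        // Step along M in the inner loop, then advance along N.
        for (uint32_t n = 0; n < nBlocks; ++n) {
            for (uint32_t m = 0; m < mBlocks; ++m) {
                visits.emplace_back(m, n);
            }
        }
    } else {
        // Step along N in the inner loop, then advance along M.
        for (uint32_t m = 0; m < mBlocks; ++m) {
            for (uint32_t n = 0; n < nBlocks; ++n) {
                visits.emplace_back(m, n);
            }
        }
    }
    return visits;
}
```

Either order covers every base block exactly once, so the final matrix C is the same; only the sequence in which blocks are produced differs.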
Prototype

    template <bool sync = true>
    __aicore__ inline void IterateAll(const GlobalTensor<DstT>& gm, uint8_t enAtomic = 0, bool enSequentialWrite = false, bool waitIterateAll = false, bool fakeMsg = false)

    template <bool sync = true>
    __aicore__ inline void IterateAll(const LocalTensor<DstT>& ubCmatrix, uint8_t enAtomic = 0)
Parameters
| Parameter | Description |
|---|---|
| sync | Selects whether matrix C is obtained in synchronous or asynchronous mode. true (default): synchronous mode. false: asynchronous mode. |
| Parameter | Input/Output | Description |
|---|---|---|
| gm | Output | Matrix C. The type is GlobalTensor. |
| ubCmatrix | Output | Matrix C. The type is LocalTensor, and TPosition can be set to TSCM. |
| enAtomic | Input | Specifies whether to enable an atomic operation. 0 (default): atomic operation disabled. 1: AtomicAdd (accumulation). 2: AtomicMax (maximum value). 3: AtomicMin (minimum value). |
| enSequentialWrite | Input | Specifies whether to enable continuous write mode. true: continuous write, results are written to [baseM, baseN]. false (default): discontinuous write, results are written to [singleCoreM, singleCoreN]. |
| waitIterateAll | Input | Used only in asynchronous scenarios. Specifies whether WaitIterateAll is used to wait for IterateAll to complete. true: WaitIterateAll is used. false: WaitIterateAll is not used, and the developer handles the wait. |
| fakeMsg | Input | Used only in the IBShare scenario (doIBShareNorm enabled in the template parameters) and the IntraBlockPartSum scenario (intraBlockPartSum enabled in the template parameters). |
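The four enAtomic values can be read as rules for combining a newly computed element with the value already stored in the destination. The sketch below is a host-side model in plain C++, intended only to illustrate that reading; it does not call the Ascend C API, and `CombineWithDestination` and `WriteBack` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models how a freshly computed value is combined with the value already in
// the destination, for each enAtomic setting listed in the parameter table.
inline float CombineWithDestination(float existing, float computed, uint8_t enAtomic) {
    switch (enAtomic) {
        case 0: return computed;                      // atomic disabled: overwrite
        case 1: return existing + computed;           // AtomicAdd: accumulate
        case 2: return std::max(existing, computed);  // AtomicMax: keep the larger
        case 3: return std::min(existing, computed);  // AtomicMin: keep the smaller
        default:
            assert(false && "enAtomic must be 0, 1, 2, or 3");
            return computed;
    }
}

// Applies the rule element-wise to a destination buffer.
inline void WriteBack(std::vector<float>& dst, const std::vector<float>& result, uint8_t enAtomic) {
    assert(dst.size() >= result.size());
    for (std::size_t i = 0; i < result.size(); ++i) {
        dst[i] = CombineWithDestination(dst[i], result[i], enAtomic);
    }
}
```

Under this model, setting enAtomic to 1 lets several calls that target the same destination accumulate their partial results instead of overwriting one another.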
Returns
None
Restrictions
Ensure that the address space of the matrix C passed in can hold at least singleCoreM × singleCoreN elements.
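One way to guard against an undersized destination is to check the element count before launching the computation. The following is a minimal host-side sketch in plain C++; the helper names are hypothetical and are not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of elements one IterateAll call produces for the given tiling sizes.
inline std::size_t RequiredElements(uint32_t singleCoreM, uint32_t singleCoreN) {
    // Widen before multiplying so large tiling sizes do not overflow 32 bits.
    return static_cast<std::size_t>(singleCoreM) * singleCoreN;
}

// True when a destination holding capacityElements elements meets the restriction.
inline bool DestinationIsLargeEnough(std::size_t capacityElements,
                                     uint32_t singleCoreM, uint32_t singleCoreN) {
    return capacityElements >= RequiredElements(singleCoreM, singleCoreN);
}

// Debug-time guard: aborts if the destination for matrix C is too small.
inline void AssertDestinationFits(std::size_t capacityElements,
                                  uint32_t singleCoreM, uint32_t singleCoreN) {
    assert(DestinationIsLargeEnough(capacityElements, singleCoreM, singleCoreN));
}
```

For example, with singleCoreM = 128 and singleCoreN = 256 the destination needs room for 32768 elements.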
Example
The following is an example of calling the IterateAll API. For more operator examples in asynchronous scenarios, see matrix multiplication in IterateAll asynchronous scenarios.

    REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);
    mm.SetTensorA(gm_a);
    mm.SetTensorB(gm_b);
    mm.SetBias(gm_bias);
    mm.IterateAll(gm_c); // Computation
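When validating the output of the example above, a small host-side reference can be useful. The sketch below computes C = A × B + bias in plain C++. It assumes row-major storage, float data, and a bias vector with one entry per column of C; these are assumptions made for illustration, and `ReferenceMatmulBias` is a hypothetical helper rather than part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major host reference: C[m][n] = sum over k of A[m][k] * B[k][n], plus bias[n].
// Assumes A is M x K, B is K x N, and bias has N entries (one per column of C).
std::vector<float> ReferenceMatmulBias(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       const std::vector<float>& bias,
                                       std::size_t M, std::size_t K, std::size_t N) {
    assert(a.size() == M * K && b.size() == K * N && bias.size() == N);
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = bias[n];  // start from the bias for this column
            for (std::size_t k = 0; k < K; ++k) {
                acc += a[m * K + k] * b[k * N + n];
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

Comparing the device result against this reference, with a tolerance suited to the data type, is a simple way to confirm that the tensors and tiling were set up as intended.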