Iterate
Function Description
Each call to Iterate computes one baseM x baseN block of matrix C. The API tracks the iteration progress internally and, after each call, offsets the start addresses of matrices A and B. By default, iteration proceeds along the M axis first and then the N axis. Setting the tiling parameter iterateOrder changes this to the N axis first and then the M axis.
If the input data is not aligned and a remainder exists, the result for the remainder is output in the last iteration.
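The traversal described above can be pictured with a small host-side model. This is only a sketch under stated assumptions: `Tile`, `CeilDiv`, and `PlanIterations` are hypothetical names that are not part of the Matmul API, "M axis first" is assumed to mean that the M block index advances fastest, and the remainder is modeled as a smaller final block on each axis, which is one reading of the statement above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only; none of these names belong to the Matmul API.
struct Tile {
    int32_t mIdx;  // block index along the M axis
    int32_t nIdx;  // block index along the N axis
    int32_t rows;  // rows in this block (baseM, or the M remainder)
    int32_t cols;  // columns in this block (baseN, or the N remainder)
};

inline int32_t CeilDiv(int32_t a, int32_t b) { return (a + b - 1) / b; }

// Lists the blocks in the order successive Iterate() calls are assumed to
// visit them. mFirst == true models the default order (M index fastest).
inline std::vector<Tile> PlanIterations(int32_t singleM, int32_t singleN,
                                        int32_t baseM, int32_t baseN,
                                        bool mFirst = true)
{
    assert(singleM > 0 && singleN > 0 && baseM > 0 && baseN > 0);
    const int32_t mBlocks = CeilDiv(singleM, baseM);
    const int32_t nBlocks = CeilDiv(singleN, baseN);
    const int32_t outer = mFirst ? nBlocks : mBlocks;
    const int32_t inner = mFirst ? mBlocks : nBlocks;

    std::vector<Tile> plan;
    plan.reserve(static_cast<std::size_t>(mBlocks) * nBlocks);
    for (int32_t o = 0; o < outer; ++o) {
        for (int32_t i = 0; i < inner; ++i) {
            const int32_t m = mFirst ? i : o;
            const int32_t n = mFirst ? o : i;
            // The last block on an axis holds the remainder when the size
            // is not a multiple of the base block size.
            const int32_t rows = (m == mBlocks - 1) ? singleM - m * baseM : baseM;
            const int32_t cols = (n == nBlocks - 1) ? singleN - n * baseN : baseN;
            plan.push_back({m, n, rows, cols});
        }
    }
    return plan;
}
```

Under this model, the number of blocks is `CeilDiv(singleM, baseM) * CeilDiv(singleN, baseN)`, so a plan for singleM = 300, singleN = 256, baseM = baseN = 128 contains six entries, two of which have only 44 rows.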
Prototype
    template <bool sync = true> __aicore__ inline bool Iterate(bool enPartialSum = false)
Parameters
| Parameter | Description |
|---|---|
| sync | There are synchronous and asynchronous modes for iteratively obtaining the blocks of matrix C. This parameter selects the mode: true for synchronous and false for asynchronous. The synchronous mode is used by default. For details about the modes and how to use them, see GetTensorC. |
| Parameter | Input/Output | Description |
|---|---|---|
| enPartialSum | Input | Whether to accumulate the matrix multiplication result onto the data already in CO1. The default value is false. L0C accumulation is supported only when the matrix C produced by multiplying matrix A and matrix B satisfies singleM == baseM && singleN == baseN. |
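The effect of enPartialSum can be illustrated with a plain host-side stand-in. This is a minimal sketch, not the device implementation: `MultiplyBlock` is a hypothetical helper, the vector `c` plays the role of the block held in CO1, and the matrices are assumed to be row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for one block held in CO1 (L0C).
// Multiplies a (m x k) by b (k x n), both row-major, into c (m x n).
inline void MultiplyBlock(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::vector<float>& c,
                          std::size_t m, std::size_t k, std::size_t n,
                          bool enPartialSum = false)
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // With enPartialSum the existing value is kept as the starting
            // point; otherwise the block is overwritten.
            float acc = enPartialSum ? c[i * n + j] : 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Calling the helper once with the default and then again with `enPartialSum = true` on the same inputs doubles the stored result, which is the accumulation behavior the parameter description refers to.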
Returns
false: All data on a single core has been computed.
true: Iterative computation is still in progress.
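The return-value contract lends itself to a simple drain loop. The sketch below uses a hypothetical `MockMatmul` class that mimics only this contract on the host; it assumes that each true return corresponds to one block that is ready to be fetched, and it performs no actual computation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mock; it mimics only the return-value contract described above.
class MockMatmul {
public:
    explicit MockMatmul(int32_t totalBlocks) : total_(totalBlocks)
    {
        assert(totalBlocks >= 0);
    }

    // Returns true after producing one more block, and false once every
    // block has been produced.
    bool Iterate()
    {
        if (computed_ == total_) {
            return false;
        }
        ++computed_;
        return true;
    }

    int32_t Computed() const { return computed_; }

private:
    int32_t total_;
    int32_t computed_ = 0;
};

// Counts how often the loop body runs, that is, how many times a
// GetTensorC-style fetch would be issued in synchronous mode.
inline int32_t DrainAll(MockMatmul& mm)
{
    int32_t fetched = 0;
    while (mm.Iterate()) {
        ++fetched;
    }
    return fetched;
}
```

With six blocks the loop body runs six times, and any further call to `Iterate()` on the mock keeps returning false.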
Availability
Precautions
None
Example
    // Synchronous mode
    while (mm.Iterate()) {
        mm.GetTensorC(ubCmatrix);
    }
    // Asynchronous mode
    mm.template Iterate<false>();
    ......
    ......
    for (int i = 0; i < singleM/baseM*singleN/baseN; ++i) {
        mm.GetTensorC<false>(ubCmatrix);
    }