Iterate

Function Description

Each call to Iterate computes one baseM x baseN block of matrix C. The API tracks the iteration progress internally and, after each call, advances the start addresses of matrices A and B. By default, iteration proceeds along the M axis first and then the N axis; setting the tiling parameter iterateOrder switches it to the N axis first and then the M axis.

If the input data is not aligned and remainders exist, the result for the remainders is output in the last iteration.
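The iteration order described above can be modelled on the host with a few lines of plain C++. This is only a sketch, not the Matmul API: IterateModel, IterateOrder, CurrentBlock, and VisitOrder are hypothetical names introduced for illustration. It assumes that "M axis first" means the M block index advances fastest, and it covers only the aligned case (singleM and singleN divisible by baseM and baseN), so remainders are not modelled.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-side model of the block traversal; not part of the Matmul API.
enum class IterateOrder { MFirst, NFirst };

class IterateModel {
public:
    IterateModel(int singleM, int singleN, int baseM, int baseN, IterateOrder order)
        : order_(order) {
        // Assumption: aligned shapes only, so there are no remainder blocks.
        assert(baseM > 0 && baseN > 0);
        assert(singleM % baseM == 0 && singleN % baseN == 0);
        mBlocks_ = singleM / baseM;
        nBlocks_ = singleN / baseN;
    }

    // Mirrors the documented return convention: true while a block is
    // produced, false once every block on the core has been visited.
    bool Iterate() {
        if (step_ >= mBlocks_ * nBlocks_) {
            return false;
        }
        if (order_ == IterateOrder::MFirst) {
            mIdx_ = step_ % mBlocks_;  // M index advances fastest
            nIdx_ = step_ / mBlocks_;
        } else {
            nIdx_ = step_ % nBlocks_;  // N index advances fastest
            mIdx_ = step_ / nBlocks_;
        }
        ++step_;
        return true;
    }

    // (M block index, N block index) of the block produced by the last Iterate().
    std::pair<int, int> CurrentBlock() const { return {mIdx_, nIdx_}; }

private:
    IterateOrder order_;
    int mBlocks_ = 0;
    int nBlocks_ = 0;
    int step_ = 0;
    int mIdx_ = 0;
    int nIdx_ = 0;
};

// Collects the block indices in the order the model visits them.
std::vector<std::pair<int, int>> VisitOrder(int singleM, int singleN, int baseM,
                                            int baseN, IterateOrder order) {
    IterateModel mm(singleM, singleN, baseM, baseN, order);
    std::vector<std::pair<int, int>> visited;
    while (mm.Iterate()) {
        visited.push_back(mm.CurrentBlock());
    }
    return visited;
}
```

For example, with singleM = 4, singleN = 6, and baseM = baseN = 2, the model visits (singleM / baseM) * (singleN / baseN) = 6 blocks. Under the M-first assumption the second block is (1, 0); under N-first it is (0, 1).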

Prototype

template <bool sync = true> __aicore__ inline bool Iterate(bool enPartialSum = false)

Parameters

Table 1 Parameters in the template

Parameter

Description

sync

There are two modes for iteratively obtaining the slices of matrix C: synchronous and asynchronous. This parameter selects the mode: true for the synchronous mode and false for the asynchronous mode. The synchronous mode is used by default. For details about the modes and how to use them, see GetTensorC.

Table 2 Parameter description

Parameter

Input/Output

Description

enPartialSum

Input

Whether to accumulate the matrix multiplication result onto the data already in CO1. The default value is false. When accumulating in L0C, the matrix C produced by multiplying matrix A and matrix B must satisfy singleM == baseM && singleN == baseN.
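The difference between overwriting and accumulating can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the device implementation: BlockMatmul is a hypothetical helper, the vector c stands in for the CO1 data of a single base block, and all matrices are assumed to be row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one base-block multiplication: c is m x n, a is m x k,
// b is k x n, all row-major. The vector c plays the role of the CO1 data.
void BlockMatmul(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c, int m, int k, int n, bool enPartialSum) {
    assert(a.size() == static_cast<std::size_t>(m) * k);
    assert(b.size() == static_cast<std::size_t>(k) * n);
    assert(c.size() == static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            // enPartialSum == true: start from the existing value (accumulate).
            // enPartialSum == false: start from zero (overwrite).
            float acc = enPartialSum ? c[i * n + j] : 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

In this model, calling BlockMatmul with enPartialSum set to false discards whatever c held before, while a second call with enPartialSum set to true adds the new product to the previous result.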

Returns

false: All data on the single core has been computed.

true: Iteration is still in progress and more data remains to be computed.

Availability

Precautions

None

Example

// Synchronous mode: each Iterate() call computes one baseM x baseN block,
// which GetTensorC then retrieves.
while (mm.Iterate()) {
    mm.GetTensorC(ubCmatrix);
}

// Asynchronous mode
mm.template Iterate<false>();
...... ......
// Retrieve one block per pass: (singleM / baseM) * (singleN / baseN) blocks in total.
for (int i = 0; i < (singleM / baseM) * (singleN / baseN); ++i) {
    mm.GetTensorC<false>(ubCmatrix);
}