Asynchronous Scenario

The Iterate and IterateAll APIs of Matmul provide synchronous and asynchronous modes.

In synchronous mode, the next operation can be performed only after the preceding operation finishes. In asynchronous mode, the next operation can be performed without waiting for the preceding operation to finish.

Synchronous and Asynchronous Iterate and GetTensorC

Synchronous mode: After each iteration, GetTensorC is called to move the computed matrix C tile out. The next computation starts only after this data movement finishes.

As shown in the following figure, in matrix C, matrix block 2 is computed only after matrix block 1 is moved out, and matrix block 3 is computed only after matrix block 2 is moved out.

Sample code for the synchronous mode:

        
while (mm.Iterate()) {
    mm.GetTensorC(gm_c);
}

Asynchronous mode: Enable asynchronous mode by setting the template parameters. After Iterate is called, GetTensorC does not need to be called immediately for synchronization. Other operations can run first, and GetTensorC is called only when the result is needed. This reduces synchronization time and increases parallelism, which suits scenarios with high computing performance requirements.

In the asynchronous scenario, a temporary space must be reserved to cache the Iterate compute results; otherwise, the results will be overwritten. When GetTensorC is called, the tiles of matrix C are read from this temporary space. Set the temporary space by calling SetWorkspace, and call SetWorkspace before Iterate. For a complete example of the asynchronous Iterate and GetTensorC scenario, see the asynchronous scenario sample.

        
// workspace is the physical address of the temporary space. size is the memory
// occupied by a matrix C block of singleCoreM x singleCoreN elements, that is,
// singleCoreM * singleCoreN * sizeof(cDataType).
mm.SetWorkspace(workspace, size);

// Asynchronous mode
mm.template Iterate<false>();
... // Perform other operations.
for (int i = 0; i < singleCoreM / baseM * singleCoreN / baseN; ++i) {
    mm.GetTensorC<false>(gm_c);
}
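
The workspace size and the GetTensorC loop bound used above can be checked on the host with plain arithmetic. The following is a minimal sketch in standard C++, not Ascend C kernel code: WorkspaceBytes and TileCount are hypothetical helper names introduced here for illustration, and the sketch assumes that singleCoreM and singleCoreN are exact multiples of baseM and baseN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side helper, not part of the Matmul API.
// Bytes needed to cache one singleCoreM x singleCoreN block of matrix C,
// following the formula singleCoreM * singleCoreN * sizeof(cDataType).
template <typename CDataType>
constexpr std::size_t WorkspaceBytes(std::size_t singleCoreM, std::size_t singleCoreN) {
    return singleCoreM * singleCoreN * sizeof(CDataType);
}

// Hypothetical host-side helper, not part of the Matmul API.
// Number of baseM x baseN tiles in the block, that is, the number of
// GetTensorC calls in the asynchronous loop.
// Assumption: the single-core shape is an exact multiple of the base tile shape.
inline std::size_t TileCount(std::size_t singleCoreM, std::size_t singleCoreN,
                             std::size_t baseM, std::size_t baseN) {
    assert(baseM != 0 && baseN != 0);
    assert(singleCoreM % baseM == 0 && singleCoreN % baseN == 0);
    return (singleCoreM / baseM) * (singleCoreN / baseN);
}
```

For example, under these assumptions a 128 x 256 block of float results needs 131072 bytes of workspace, and with 64 x 64 base tiles the loop calls GetTensorC 8 times.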

Synchronous and Asynchronous IterateAll

Synchronous mode: Subsequent operations can be performed only after IterateAll finishes executing.

        
mm.SetTensorA(gm_a);    // Set the left matrix A.
mm.SetTensorB(gm_b);    // Set the right matrix B.
mm.SetBias(gm_bias);    // Set the bias.
mm.IterateAll(gm_c);
// Follow-up operations
...

Asynchronous mode: Subsequent operations do not need to wait for IterateAll to complete. When the result of IterateAll is required, call WaitIterateAll to wait for the asynchronous IterateAll call to finish.

        
matmul::Matmul<aType, bType, cType, biasType> mm;
mm.SetTensorA(queryGm[tensorACoreOffset]);
mm.SetTensorB(keyGm[tensorBCoreOffset + sInnerStart * singleProcessSInnerSize *
    tilingData->attentionScoreOffestStrideParams.matmulHead], true);
mm.SetTail(singleProcessSOuterSize, mmNNum);
mm.template IterateAll<false>(workspaceGm[tmp_block_idx * mmResUbSize * sInnerLoopTimes], false, true);
// Perform other computations.
mm.WaitIterateAll(); // Wait for IterateAll to complete.
DataCopy(dstUB, GM);  // Copy data from GM to UB.
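
The launch, overlap, and wait pattern of the asynchronous IterateAll can be illustrated on the host with standard C++. The following is only an analogy under stated assumptions, not Ascend C code: std::async and std::future stand in for IterateAll<false> and WaitIterateAll, and SumAll and LaunchOverlapWait are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Stand-in for the long-running matrix multiplication (hypothetical workload).
inline long SumAll(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0L);
}

// Launch the work asynchronously, run unrelated work, then wait for the result.
// otherWorkDone is incremented to represent the overlapped work.
inline long LaunchOverlapWait(const std::vector<int>& data, int& otherWorkDone) {
    // Plays the role of mm.template IterateAll<false>(...): the call returns at once.
    std::future<long> pending =
        std::async(std::launch::async, [&data] { return SumAll(data); });

    // Plays the role of the other computation that overlaps with the matmul.
    ++otherWorkDone;

    // Plays the role of mm.WaitIterateAll(): block until the result is ready.
    assert(pending.valid());
    return pending.get();
}
```

As with WaitIterateAll, the result is read only after the wait returns. Depending on the toolchain, building this sketch may require a threading flag such as -pthread.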
