Enabling the Asynchronous Iterate or IterateAll API to Avoid AIC/AIV Synchronization Dependency
[Priority] High
[Description] In hybrid programming of AI Cube (AIC) and AI Vector (AIV) cores, each call to the Matmul Iterate or IterateAll API makes the AIV send a message to the AIC to start the Matmul computation. In synchronous mode (Iterate<true>), as shown in Figure 1, every call triggers a message. In asynchronous mode (Iterate<false>), as shown in Figure 2, a message is sent only on the first call, and no further messages are needed. This reduces the interaction between AICs and AIVs and lowers inter-core communication overhead. Therefore, the asynchronous Iterate<false>() or IterateAll<false>() API is recommended in hybrid programming. (Note: When using the asynchronous API, you must set the workspace.)
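To make the difference in message count concrete, the following is a minimal host-side C++ model, not Ascend C kernel code. It assumes that each Iterate call produces one baseM x baseN block of matrix C (consistent with the baseM*baseN element count passed to Muls in the examples), so a core computing a singleCoreM x singleCoreN result calls Iterate ceil(singleCoreM/baseM) x ceil(singleCoreN/baseN) times. The names IterateMode, NumBaseBlocks, and CountAivToAicMessages are hypothetical and do not exist in the Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

// Conceptual model only (hypothetical names, NOT Ascend C APIs).
// Sync corresponds to Iterate<true>, Async to Iterate<false>.
enum class IterateMode { Sync, Async };

// Number of baseM x baseN blocks in one core's singleCoreM x singleCoreN
// result, i.e. how many Iterate calls are needed (assumption: one block per call).
std::uint32_t NumBaseBlocks(std::uint32_t singleCoreM, std::uint32_t singleCoreN,
                            std::uint32_t baseM, std::uint32_t baseN) {
    assert(baseM > 0 && baseN > 0);
    const std::uint32_t mBlocks = (singleCoreM + baseM - 1) / baseM;  // ceil division
    const std::uint32_t nBlocks = (singleCoreN + baseN - 1) / baseN;  // ceil division
    return mBlocks * nBlocks;
}

// AIV-to-AIC "start Matmul" messages for `calls` Iterate calls:
// one per call in synchronous mode, only the first call in asynchronous mode.
std::uint32_t CountAivToAicMessages(IterateMode mode, std::uint32_t calls) {
    if (calls == 0) {
        return 0;
    }
    return mode == IterateMode::Sync ? calls : 1;
}
```

For example, with hypothetical tiling values singleCoreM = singleCoreN = 256, baseM = 128, and baseN = 64, NumBaseBlocks returns 2 x 4 = 8, so the synchronous mode sends 8 messages while the asynchronous mode sends only 1. The saving grows with the number of base blocks per core.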
[Negative Example]
The synchronous Iterate API is used in hybrid programming.
```cpp
TQueBind<TPosition::CO2, TPosition::VECIN> qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT> qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
float scalar = 2.0f;  // same element type as the float tensors passed to Muls
// Synchronous mode (Iterate<true> by default): every call sends a message to the AIC.
while (mm.template Iterate()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM * baseN);
    qVecIn.FreeTensor(cInUB);
    // ...
}
```
[Positive Example]
The asynchronous Iterate API is used in hybrid programming.
```cpp
TQueBind<TPosition::CO2, TPosition::VECIN> qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT> qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
// workspace: physical address of the temporary space.
// size: memory occupied by matrix C, i.e. singleCoreM * singleCoreN * sizeof(float).
mm.SetWorkspace(workspace, size);
float scalar = 2.0f;  // same element type as the float tensors passed to Muls
// Asynchronous mode: only the first call sends a message to the AIC.
while (mm.template Iterate<false>()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM * baseN);
    qVecIn.FreeTensor(cInUB);
    // ...
}
```
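The size passed to SetWorkspace in the positive example is simple arithmetic: the workspace must hold one core's full singleCoreM x singleCoreN block of C as float. The sketch below computes it on the host side. MatmulAsyncWorkspaceBytes is a hypothetical helper, not an Ascend C API, and it covers a single core only; how the per-core buffers are laid out across cores is outside the scope of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (NOT an Ascend C API): bytes of workspace needed to hold
// one core's singleCoreM x singleCoreN block of matrix C stored as float.
std::size_t MatmulAsyncWorkspaceBytes(std::uint32_t singleCoreM, std::uint32_t singleCoreN) {
    assert(singleCoreM > 0 && singleCoreN > 0);
    // Widen before multiplying to avoid 32-bit overflow for large tiles.
    return static_cast<std::size_t>(singleCoreM) * singleCoreN * sizeof(float);
}
```

For example, with hypothetical values singleCoreM = singleCoreN = 256 and a 4-byte float, the result is 256 x 256 x 4 = 262,144 bytes (256 KiB).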

