Enabling Asynchronous Iterate APIs to Avoid AIC/AIV Synchronization Dependency
[Priority] High
[Description] In hybrid AI Cube (AIC) and AI Vector (AIV) programming, calling the Matmul Iterate or IterateAll API makes AIV send a message to AIC to start the Matmul computation. In synchronous mode, Iterate<sync=true>, every call sends a message, as shown in Figure 1. In asynchronous mode, Iterate<sync=false>, only the first call sends a message and subsequent calls send none, as shown in Figure 2. This reduces the interaction between the Cube and Vector cores and lowers the inter-core communication overhead. The asynchronous Iterate<false> or IterateAll<false> API is therefore recommended in mixed AIC/AIV (mix) scenarios.
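The message savings described above can be estimated with a small host-side model. This is a minimal sketch, not Ascend C code: CountIterateCalls and CountStartMessages are hypothetical helpers, and the sketch assumes that each successful Iterate call produces one baseM*baseN block of C, that synchronous mode sends one message per call, and that asynchronous mode sends a single message in total.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model only: it counts AIV-to-AIC start messages and calls no Ascend C API.

// Hypothetical helper: number of successful Iterate calls needed to cover a
// singleCoreM x singleCoreN block of C with baseM x baseN tiles (ceiling division).
inline std::uint32_t CountIterateCalls(std::uint32_t singleCoreM, std::uint32_t singleCoreN,
                                       std::uint32_t baseM, std::uint32_t baseN)
{
    assert(baseM > 0 && baseN > 0);
    const std::uint32_t mTiles = (singleCoreM + baseM - 1) / baseM;
    const std::uint32_t nTiles = (singleCoreN + baseN - 1) / baseN;
    return mTiles * nTiles;
}

// Hypothetical helper: messages sent under the assumption stated in the description,
// one per call in synchronous mode, one in total in asynchronous mode.
inline std::uint32_t CountStartMessages(std::uint32_t iterateCalls, bool sync)
{
    if (iterateCalls == 0) {
        return 0;
    }
    return sync ? iterateCalls : 1;
}
```

For example, under these assumptions a 256*256 single-core block with 128*128 base tiles needs 4 Iterate calls, so the model gives 4 messages in synchronous mode and 1 in asynchronous mode.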
[Negative Example]
TQueBind<TPosition::CO2, TPosition::VECIN> qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT> qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
float scalar = 2.0f; // scalar type matches the float element type of the tensors passed to Muls
// Synchronous mode (sync defaults to true): every Iterate call sends a message from AIV to AIC.
while (mm.template Iterate()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM*baseN);
    qVecIn.FreeTensor(cInUB);
    ...
}
[Positive Example]
TQueBind<TPosition::CO2, TPosition::VECIN> qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT> qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
// workspace is the physical address of the temporary space. size is the memory occupied by
// the singleCoreM*singleCoreN matrix C, that is, singleCoreM*singleCoreN*sizeof(float).
mm.SetWorkspace(workspace, size);
float scalar = 2.0f; // scalar type matches the float element type of the tensors passed to Muls
// Asynchronous mode: only the first Iterate<false> call sends a message from AIV to AIC.
while (mm.template Iterate<false>()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM*baseN);
    qVecIn.FreeTensor(cInUB);
    ...
}
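The size argument passed to SetWorkspace in the positive example follows the formula given in its comment, singleCoreM*singleCoreN*sizeof(float). A minimal host-side sketch of that arithmetic is shown below. WorkspaceBytesForC is a hypothetical helper name, not part of the Matmul API, and the sketch assumes matrix C holds float elements as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to hold one singleCoreM x singleCoreN block of
// float results, following the formula in the SetWorkspace comment of the example.
inline std::size_t WorkspaceBytesForC(std::uint32_t singleCoreM, std::uint32_t singleCoreN)
{
    assert(singleCoreM > 0 && singleCoreN > 0);
    // Widen before multiplying so large tile shapes do not overflow 32-bit arithmetic.
    return static_cast<std::size_t>(singleCoreM) * singleCoreN * sizeof(float);
}
```

The result would be computed on the host and passed as size, with workspace pointing to a buffer of at least that many bytes.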

