Enabling Asynchronous Iterate APIs to Avoid AIC/AIV Synchronization Dependency

[Priority] High

[Description] In hybrid AI Cube (AIC) and AI Vector (AIV) programming, calling the Matmul Iterate or IterateAll API causes AIV to send a message to AIC to start the Matmul computation. In synchronous mode, Iterate<sync=true> (Figure 1), every call sends a message. In asynchronous mode, Iterate<sync=false> (Figure 2), a message is sent only on the first call, and subsequent calls send none. This reduces the interaction between the Cube and Vector cores and lowers the inter-core communication overhead. The asynchronous Iterate<false> or IterateAll<false> API is therefore recommended in the mix scenario.

Figure 1 Message sending in synchronous mode

Figure 2 Message sending in asynchronous mode
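
The difference in message traffic described above can be illustrated with a small host-side model. This is only a sketch in plain C++, not Ascend C code: MessageCounter, CountSyncMessages, CountAsyncMessages, and MessagesSaved are hypothetical names introduced for illustration, and the model counts AIV-to-AIC messages only, assuming one message per call in synchronous mode and a single message on the first call in asynchronous mode.

```cpp
#include <cassert>
#include <cstdint>

// Toy host-side model (NOT the Ascend C API): counts AIV-to-AIC messages only.
struct MessageCounter {
    std::uint32_t sent = 0;
    void Send() { ++sent; }
};

// Synchronous mode: every Iterate call sends one message to AIC.
inline std::uint32_t CountSyncMessages(std::uint32_t iterations) {
    MessageCounter aivToAic;
    for (std::uint32_t i = 0; i < iterations; ++i) {
        aivToAic.Send();
    }
    return aivToAic.sent;
}

// Asynchronous mode: only the first Iterate call sends a message.
inline std::uint32_t CountAsyncMessages(std::uint32_t iterations) {
    MessageCounter aivToAic;
    for (std::uint32_t i = 0; i < iterations; ++i) {
        if (i == 0) {
            aivToAic.Send();
        }
    }
    return aivToAic.sent;
}

// Number of messages avoided by switching from synchronous to asynchronous mode.
inline std::uint32_t MessagesSaved(std::uint32_t iterations) {
    const std::uint32_t syncCount = CountSyncMessages(iterations);
    const std::uint32_t asyncCount = CountAsyncMessages(iterations);
    assert(syncCount >= asyncCount);
    return syncCount - asyncCount;
}
```

Under these assumptions, a loop of 8 iterations sends 8 messages in synchronous mode and 1 in asynchronous mode, so the saving grows with the number of base blocks iterated.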

[Negative Example]

TQueBind<TPosition::CO2, TPosition::VECIN>  qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT>  qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
float scalar = 2; // The scalar type matches the float element type of the tensors passed to Muls.

// Synchronous mode (default): each Iterate call sends a message from AIV to AIC.
while (mm.template Iterate()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM*baseN);
    qVecIn.FreeTensor(cInUB);
    ...
}

[Positive Example]

TQueBind<TPosition::CO2, TPosition::VECIN>  qVecIn;
TQueBind<TPosition::VECIN, TPosition::VECOUT>  qVecOut;
mm.SetTensorA(gmA);
mm.SetTensorB(gmB);
// workspace is the physical address of the temporary space. size is the memory occupied by
// a singleCoreM*singleCoreN matrix C, that is, singleCoreM*singleCoreN*sizeof(float).
mm.SetWorkspace(workspace, size);
float scalar = 2; // The scalar type matches the float element type of the tensors passed to Muls.

// Asynchronous mode: only the first Iterate<false> call sends a message from AIV to AIC.
while (mm.template Iterate<false>()) {
    auto cInUB = qVecIn.AllocTensor<float>();
    mm.GetTensorC(cInUB);
    qVecIn.EnQue(cInUB);
    cInUB = qVecIn.DeQue<float>();
    auto cOutUB = qVecOut.AllocTensor<float>();
    Muls(cOutUB, cInUB, scalar, baseM*baseN);
    qVecIn.FreeTensor(cInUB);
    ...
}
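
The size argument of SetWorkspace in the positive example follows the formula singleCoreM*singleCoreN*sizeof(float). A minimal sketch of that arithmetic in plain C++ is shown here. MatmulWorkspaceBytes is a hypothetical helper introduced for illustration and is not part of the Matmul API. It assumes matrix C holds float elements, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the Matmul API): bytes needed to hold a
// singleCoreM x singleCoreN matrix C of float elements.
inline std::size_t MatmulWorkspaceBytes(std::uint32_t singleCoreM,
                                        std::uint32_t singleCoreN) {
    assert(singleCoreM > 0 && singleCoreN > 0);
    // Widen before multiplying so large shapes do not overflow 32-bit arithmetic.
    return static_cast<std::size_t>(singleCoreM) * singleCoreN * sizeof(float);
}
```

For example, with singleCoreM = 128 and singleCoreN = 256, the helper returns 128*256*sizeof(float), which is 131072 bytes when float is 4 bytes.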
