CubeResGroupHandle Usage Description

CubeResGroupHandle groups AI Core compute resources in separated mode. After grouping, you can assign different compute tasks to different groups. An AI Core group can contain multiple AIVs and AICs, which process tasks in a client/server architecture. Each AIV acts as a client: it packs a Cube compute task into a message and sends the message to a message queue. Each AIC acts as a server: it traverses the messages in the message queues and executes the corresponding compute task based on the message type and content. One CubeResGroupHandle object can contain one or more AICs, and one AIC belongs to only one CubeResGroupHandle object. One AIV can belong to multiple CubeResGroupHandle objects.

As shown in the following figure, CubeResGroupHandle1 contains two AICs (Block0 and Block1) and 10 AIVs. Block0 communicates with Queue0, Queue1, Queue2, Queue3, and Queue4. Block1 communicates with Queue5, Queue6, Queue7, Queue8, and Queue9. Each message queue corresponds to one AIV. The message queue depth is fixed at 4, that is, each queue can hold at most four messages at a time. CubeResGroupHandle2 has 12 message queues and therefore 12 AIVs. The black arrows in CubeResGroupHandle2 show the order in which a CubeResGroupHandle processes messages.

Figure 1 Group communication of AI Core compute resources based on CubeResGroupHandle
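
The queue layout in the figure can be sketched on the host in plain C++. This is only an illustration: the helper name OwnerAic is hypothetical, and the even, contiguous split of queues across AICs is an assumption read from the figure, not an Ascend C API or a documented guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper (not an Ascend C API): returns the index of
// the AIC, relative to the first AIC of the group, that serves a queue.
// Assumes queues are split evenly and contiguously, as drawn in Figure 1,
// and that queueNum is a non-zero multiple of aicNum.
constexpr uint32_t OwnerAic(uint32_t queIdx, uint32_t aicNum, uint32_t queueNum)
{
    return queIdx / (queueNum / aicNum);
}

// CubeResGroupHandle1 in the figure: 2 AICs and 10 message queues.
static_assert(OwnerAic(0, 2, 10) == 0, "Queue0 is served by Block0");
static_assert(OwnerAic(4, 2, 10) == 0, "Queue4 is served by Block0");
static_assert(OwnerAic(5, 2, 10) == 1, "Queue5 is served by Block1");
static_assert(OwnerAic(9, 2, 10) == 1, "Queue9 is served by Block1");
```

The static_assert lines restate the Block0 and Block1 assignments listed above, so the sketch fails to compile if the assumed split does not match the figure.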

To group AI Core compute resources based on CubeResGroupHandle, perform the following steps:

  1. Create the compute object type required on the AIC.
  2. Create a communication area descriptor KfcWorkspace to record the address allocation of the communication message Msg.
  3. Customize a message structure for communication.
  4. Customize the callback computation structure and implement the Init and Call functions based on the actual service scenario.
  5. Create a CubeResGroupHandle object.
  6. Bind an AIV to the CubeResGroupHandle object.
  7. Send and receive messages.
  8. The AIV exits the message queue.
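
The message flow behind these steps can be modeled on the host with a small ring buffer. The sketch below is a simplified model under stated assumptions, not the Ascend C implementation: the types MsgState, Msg, and MsgQueue and the functions Post, Quit, and Serve are hypothetical names chosen for illustration, and an integer sum stands in for the Cube computation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model only: these names are hypothetical and merely mirror the
// message states described in this document.
enum class MsgState : uint8_t { FREE, VALID, QUIT, FAKE };

struct Msg {
    MsgState state = MsgState::FREE;
    int32_t payload = 0;
};

// One message queue per AIV. The depth is fixed at 4, as stated above.
struct MsgQueue {
    static constexpr int kDepth = 4;
    std::array<Msg, kDepth> slots{};
    int head = 0;  // next slot the client (AIV) fills
    int tail = 0;  // next slot the server (AIC) reads

    // Client side: returns the next slot if it is FREE, or nullptr if the
    // queue is full. The caller is expected to post before allocating again.
    Msg *Alloc()
    {
        Msg *m = &slots[head];
        if (m->state != MsgState::FREE) {
            return nullptr;
        }
        head = (head + 1) % kDepth;
        return m;
    }
};

// Client side: fill the slot and mark it ready for the server.
inline void Post(Msg *m, int32_t payload)
{
    m->payload = payload;
    m->state = MsgState::VALID;
}

// Client side: tell the server that this AIV is leaving.
inline void Quit(Msg *m)
{
    m->state = MsgState::QUIT;
}

// Server side: drain one queue. Returns true once a QUIT message is seen.
// sum stands in for the real Cube computation.
inline bool Serve(MsgQueue &q, int32_t &sum)
{
    while (q.slots[q.tail].state != MsgState::FREE) {
        Msg &m = q.slots[q.tail];
        const bool quit = (m.state == MsgState::QUIT);
        if (m.state == MsgState::VALID) {
            sum += m.payload;
        }
        m.state = MsgState::FREE;  // release the slot for reuse
        q.tail = (q.tail + 1) % MsgQueue::kDepth;
        if (quit) {
            return true;
        }
    }
    return false;
}
```

In this model a client that has four unprocessed messages in its queue gets nullptr from Alloc and has to wait for the server to free a slot, which is the back-pressure implied by the fixed queue depth.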

The following lists only code snippets. For details about a complete example, see CubeGroup sample.

  1. Create the compute object type required on the AIC.

    You can customize the compute object type required by the AIC or use the Matmul type provided by the advanced API as required. For example, create the Matmul types as follows. For details about A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, and CFG_NORM, see Matmul Template Parameters.

    // A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, and CFG_NORM are constructed based on the actual scenario.
    using MatmulApiType = MatmulImpl<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE, CFG_NORM>;
    
  2. Create a KfcWorkspace descriptor.
    Use KfcWorkspace to manage how the message communication area is divided among different CubeResGroupHandle objects.
    // Before creating a KfcWorkspace object, clear workspaceGM.
    KfcWorkspace desc(workspaceGM);
    
  3. Customize the message structure.
    You need to construct the CubeMsgBody message structure that the AIV uses to send messages to the AIC. The structure size must be 64-byte aligned, and a 2-byte CubeGroupMsgHead must be defined at the beginning of the structure so that the message sending and receiving mechanism works properly. For the definition of the CubeGroupMsgHead structure, see Table 2. Apart from the 2-byte CubeGroupMsgHead, define the other members based on service requirements.
    Table 1 CubeMsgBody message structure

    | Parameter | Description |
    |---|---|
    | CubeMsgBody | Custom message structure. The structure name can be customized, and the structure size must be 64-byte aligned. |

    // Example of a 64-byte aligned structure. Apart from CubeGroupMsgHead, you can define the number and types of the members as required.
    struct CubeMsgBody {
        // 2-byte header. It must be the first member and must be named head. Otherwise, a compilation error is reported.
        CubeGroupMsgHead head;
        uint8_t funcID;
        uint8_t skipCnt;
        uint32_t value;
        bool isTransA;
        bool isTransB;
        bool isAtomic;
        bool isLast;
        int32_t tailM;
        int32_t tailN;
        int32_t tailK;
        uint64_t aAddr;
        uint64_t bAddr;
        uint64_t cAddr;
        uint64_t aGap;
        uint64_t bGap;
    };
    
    Table 2 Parameters in the CubeGroupMsgHead structure

    | Parameter | Description |
    |---|---|
    | msgState | Message state of the position. The value is one of the CubeMsgState values listed below the table. |
    | aivID | Index of the AIV that sends the message. |

    Values of msgState:

    • CubeMsgState::FREE: indicates that no message is filled in the position. You can call AllocMessage.
    • CubeMsgState::VALID: indicates that the position contains the message sent by the AIV and is to be executed by the AIC.
    • CubeMsgState::QUIT: indicates that the message in this position is to notify the AIC that an AIV is about to exit the process.
    • CubeMsgState::FAKE: indicates that the message in this position is a fake message. In the message merging scenario, an AIV that skips processing tasks needs to send a fake message. For details about the message merging scenario, see the description in PostFakeMsg.
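
The size and layout rules for the message structure can be checked at compile time. The following is a minimal host-side sketch: the CubeMsgState and CubeGroupMsgHead definitions here are stand-ins written for this example (assumed to be one byte for msgState plus one byte for aivID, matching the stated 2-byte size), whereas in a real kernel both come from the Ascend C headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins so the layout can be checked off-device. These are assumptions
// for illustration, not the Ascend C definitions.
enum class CubeMsgState : uint8_t { FREE, VALID, QUIT, FAKE };

struct CubeGroupMsgHead {
    CubeMsgState msgState;
    uint8_t aivID;
};

// Same members as the example structure in this step.
struct CubeMsgBody {
    CubeGroupMsgHead head;  // must be the first member and must be named head
    uint8_t funcID;
    uint8_t skipCnt;
    uint32_t value;
    bool isTransA;
    bool isTransB;
    bool isAtomic;
    bool isLast;
    int32_t tailM;
    int32_t tailN;
    int32_t tailK;
    uint64_t aAddr;
    uint64_t bAddr;
    uint64_t cAddr;
    uint64_t aGap;
    uint64_t bGap;
};

// Compile-time checks for the rules stated in the text.
static_assert(sizeof(CubeGroupMsgHead) == 2, "the header must be 2 bytes");
static_assert(offsetof(CubeMsgBody, head) == 0, "head must be the first member");
static_assert(sizeof(CubeMsgBody) % 64 == 0, "the message size must be a multiple of 64 bytes");
```

Keeping such checks next to a custom message structure makes a later change that adds or reorders members fail at compile time instead of breaking communication at run time.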

  4. Customize the callback computation structure and implement the Init and Call functions based on the actual service scenario.
    template<class MatmulApiCfg, class CubeMsgBody>
    struct NormalCallbackFuncs {
        __aicore__ inline static void Call(MatmulApiCfg &mm, __gm__ CubeMsgBody *rcvMsg, CubeResGroupHandle<CubeMsgBody> &handle)
        {
            // The user implements the logic.
        }

        __aicore__ inline static void Init(NormalCallbackFuncs<MatmulApiCfg, CubeMsgBody> &foo, MatmulApiCfg &mm, GM_ADDR tilingGM)
        {
            // The user implements the logic.
        }
    };
    

    For details about the template parameters of the compute logic structure, see Table 3.

    Table 3 Parameters in the template

    | Parameter | Description |
    |---|---|
    | MatmulApiCfg | Custom data type of the object required for AIC computation. For details, see Step 1. This template parameter is mandatory. |
    | CubeMsgBody | Custom message structure. This template parameter is mandatory. |

    The custom callback computation structure must contain the Init and Call functions, whose names and parameters are fixed. The function prototypes are as follows. For details about the parameters of the Init function, see Table 4. For details about the parameters of the Call function, see Table 5.

    // The parameters and names of this function are in fixed format. The function is implemented based on the service logic.
    __aicore__ inline static void Init(MyCallbackFunc<MatmulApiCfg, CubeMsgBody> &myCallBack, MatmulApiCfg &mm, GM_ADDR tilingGM){
         // The user implements the internal logic.
    }
    
    Table 4 Init function parameters

    | Parameter | Input/Output | Description |
    |---|---|---|
    | myCallBack | Input | Custom callback compute structure with template parameters. |
    | mm | Input | Compute object on the AIC, which is usually a Matmul object. |
    | tilingGM | Input | Tiling pointer passed by the user. |

    // The parameters and names of this function are in fixed format. The function is implemented based on the service logic.
    __aicore__ inline static void Call(MatmulApiCfg &mm, __gm__ CubeMsgBody *rcvMsg, CubeResGroupHandle<CubeMsgBody> &handle){
            // The user implements the internal logic.
    }
    
    Table 5 Call function parameters

    | Parameter | Input/Output | Description |
    |---|---|---|
    | mm | Input | Compute object on the AIC, which is usually a Matmul object. |
    | rcvMsg | Input | Pointer to the custom message structure. |
    | handle | Input | Handle for managing group messages. You can use this handle to send, receive, and release messages. |

    The following is a code example of the callback computation structure of an operator:
    // Custom callback compute logic
    template<class MatmulApiCfg, typename CubeMsgBody>
    struct MyCallbackFunc
    {
        template<int32_t funcId>
        __aicore__ inline static typename IsEqual<funcId, 0>::Type CubeGroupCallBack(MatmulApiCfg &mm, __gm__ CubeMsgBody *rcvMsg, CubeResGroupHandle<CubeMsgBody> &handle)
        {
            GlobalTensor<int64_t> msgGlobal;
            msgGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int64_t *> (rcvMsg) + sizeof(int64_t));
            DataCacheCleanAndInvalid<int64_t, CacheLine::SINGLE_CACHE_LINE, DcciDst::CACHELINE_OUT> (msgGlobal);
            using SrcAT = typename MatmulApiCfg::AType::T;
            auto skipNum = 0;
            for (int i = 0; i < skipNum + 1; ++i)
            {
                auto tmpId = handle.FreeMessage(rcvMsg + i); // msgPtr process is complete
            }
            handle.SetSkipMsg(skipNum);
        }
        template<int32_t funcId>
        __aicore__ inline static typename IsEqual<funcId, 1>::Type CubeGroupCallBack(MatmulApiCfg &mm, __gm__ CubeMsgBody *rcvMsg, CubeResGroupHandle<CubeMsgBody> &handle)
        {
            GlobalTensor<int64_t> msgGlobal;
            msgGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int64_t *> (rcvMsg) + sizeof(int64_t));
            DataCacheCleanAndInvalid<int64_t, CacheLine::SINGLE_CACHE_LINE, DcciDst::CACHELINE_OUT> (msgGlobal);
            using SrcAT = typename MatmulApiCfg::AType::T;
            LocalTensor<SrcAT> tensor_temp;
            auto skipNum = 3;
            auto tmpId = handle.FreeMessage(rcvMsg, CubeMsgState::VALID);
            for (int i = 1; i < skipNum + 1; ++i)
            {
                auto tmpId = handle.FreeMessage(rcvMsg + i, CubeMsgState::FAKE);
            }
            handle.SetSkipMsg(skipNum); // notify the cube not to process
        }
        __aicore__ inline static void Call(MatmulApiCfg &mm, __gm__ CubeMsgBody *rcvMsg, CubeResGroupHandle<CubeMsgBody> &handle)
        {
            // Dispatch on the funcID member of the custom message structure.
            if (rcvMsg->funcID == 0)
            {
                CubeGroupCallBack<0> (mm, rcvMsg, handle);
            }
            else if (rcvMsg->funcID == 1)
            {
                CubeGroupCallBack<1> (mm, rcvMsg, handle);
            }
        }
        __aicore__ inline static void Init(MyCallbackFunc<MatmulApiCfg, CubeMsgBody> &foo, MatmulApiCfg &mm, GM_ADDR tilingGM)
        {
            auto tempTilingGM = (__gm__ uint32_t*)tilingGM;
            auto tempTiling = (uint32_t*)&(foo.tiling);
            for (int i = 0; i < sizeof(TCubeTiling) / sizeof(int32_t); ++i, ++tempTilingGM, ++tempTiling)
            {
                *tempTiling = *tempTilingGM;
            }
            mm.SetSubBlockIdx(0);
            mm.Init(&foo.tiling, GetTPipePtr());
        }
        TCubeTiling tiling;
    };
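
The two CubeGroupCallBack overloads in MyCallbackFunc are selected through their return type: each overload is usable for exactly one funcId value. The sketch below shows that dispatch pattern in plain host C++. It is an illustration under assumptions: the IsEqual stand-in defined here only guesses at how the real helper behaves (a Type member that exists only when both values are equal), the function names Handle and Dispatch are hypothetical, and the return type is int32_t instead of void so that the result can be observed.

```cpp
#include <cassert>
#include <cstdint>

// Assumed behavior of IsEqual: the nested Type exists only when A == B.
// This stand-in is not the Ascend C definition.
template <int32_t A, int32_t B>
struct IsEqual {};

template <int32_t A>
struct IsEqual<A, A> {
    using Type = int32_t;
};

// Enabled only for funcId == 0. For any other value the return type does not
// exist, so this overload is silently dropped from overload resolution.
template <int32_t funcId>
typename IsEqual<funcId, 0>::Type Handle(int32_t x)
{
    return x + 1;
}

// Enabled only for funcId == 1.
template <int32_t funcId>
typename IsEqual<funcId, 1>::Type Handle(int32_t x)
{
    return x * 2;
}

// Runtime entry point: maps the funcID carried in a message to the matching
// compile-time overload, in the same way as Call in the example above.
inline int32_t Dispatch(uint8_t funcID, int32_t x)
{
    if (funcID == 0) {
        return Handle<0>(x);
    }
    if (funcID == 1) {
        return Handle<1>(x);
    }
    return x;  // unknown funcID: nothing to do in this sketch
}
```

With this pattern, supporting another message type means adding one overload and one branch in the dispatcher, while the existing overloads stay untouched.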
    
  5. Create a CubeResGroupHandle object.
    Use CreateCubeResGroup to create one or more CubeResGroupHandle objects.
    /*
     * groupID is the custom group ID passed to CreateCubeResGroup (1 in this example).
     * MatmulApiType is the type of the compute object defined on the AIC.
     * MyCallbackFunc is the user-defined callback computation structure.
     * CubeMsgBody is the user-defined message structure.
     * desc is the initialized KfcWorkspace communication area descriptor.
     * The arguments after desc are blockStart (0), blockSize (12), msgQueueSize (48), and tilingGM,
     * a pointer to the tiling information that the user needs on the AIC.
     */
    auto handle = AscendC::CreateCubeResGroup<groupID, MatmulApiType, MyCallbackFunc, CubeMsgBody>(desc, 0, 12, 48, tilingGM);
    
  6. Bind an AIV to the CubeResGroupHandle object.
    Bind each AIV to a message queue index. The value of queIdx must be less than the total number of message queues managed by the CubeResGroupHandle object, and each AIV must pass in a different queIdx.
    // handle is the CubeResGroupHandle object created by CreateCubeResGroup in Step 5.
    handle.AssignQueue(queIdx);
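
The two constraints on queIdx (within range and unique per AIV) can be verified on the host before a kernel is launched. The following sketch assumes a hypothetical helper named IsValidAssignment that receives the queIdx planned for each AIV. It is not part of Ascend C.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side check (not an Ascend C API). queIdxPerAiv[i] is the
// queue index that AIV i would pass to AssignQueue, and queueNum is the total
// number of message queues managed by the handle.
inline bool IsValidAssignment(const std::vector<uint32_t> &queIdxPerAiv, uint32_t queueNum)
{
    std::vector<bool> used(queueNum, false);
    for (uint32_t queIdx : queIdxPerAiv) {
        if (queIdx >= queueNum) {
            return false;  // out of range
        }
        if (used[queIdx]) {
            return false;  // two AIVs share one queue
        }
        used[queIdx] = true;
    }
    return true;
}
```

For example, the assignment {0, 1, 2} passes for a handle with 48 queues, while {0, 0} (duplicate index) and {48} (out of range) are rejected.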
    
  7. The AIV sends a message.
    The AIV calls AllocMessage to obtain a message structure pointer and PostMessage to send the message. In the message merging scenario, an AIV that skips processing calls PostFakeMsg to send a fake message. After sending, the AIV calls Wait to wait until the AIC has processed the message. The following is an example:
    CubeGroupMsgHead head = {CubeMsgState::VALID, (uint8_t)queIdx};
    CubeMsgBody aCubeMsgBody {head, 0, 0, 0, false, false, false, false, 0, 0, 0, 0, 0, 0, 0, 0};
    CubeMsgBody bCubeMsgBody {head, 1, 0, 0, false, false, false, false, 0, 0, 0, 0, 0, 0, 0, 0};
    auto offset = 0;
    if (GetBlockIdx() == 0)
    {
        auto msgPtr = handle.template AllocMessage(); // Allocate a slot in the message queue.
        offset = handle.template PostMessage(msgPtr, bCubeMsgBody); // Post a real message.
        bool waitState = handle.template Wait<true> (offset); // Wait until the message is processed.
    }
    else if (GetBlockIdx() < 4)
    {
        auto msgPtr = handle.AllocMessage();
        offset = handle.PostFakeMsg(msgPtr); // Post a fake message.
        bool waitState = handle.template Wait<true> (offset); // Wait until the message is processed.
    }
    else
    {
        auto msgPtr = handle.template AllocMessage();
        offset = handle.template PostMessage(msgPtr, aCubeMsgBody);
        bool waitState = handle.template Wait<true> (offset); // Wait until the message is processed.
    }
    
  8. The AIV exits the message queue.
    After obtaining the message structure pointer by calling AllocMessage, call SetQuit to send an exit message and leave the current message queue.
    auto msgPtr = handle.AllocMessage();        // Obtain the message space pointer msgPtr.
    handle.SetQuit(msgPtr);              // Send an exit message.