InterHcclGroupSync
Applicability
| Product | Supported |
|---|---|
| | √ |
| | x |
| | x |
| | x |
| | x |
| | x |
Function
Waits until a cross-communicator communication task is complete. After this API is called, communication tasks subsequently delivered by the local communicator are executed only after the communication task identified by srcHandleID in the communicator identified by srcGroupID is complete.
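The ordering guarantee described above can be illustrated with a small host-side toy model. This is only a sketch in plain C++: `ToyTask` and `RunToySchedule` are hypothetical names invented for illustration, they are not part of Ascend C or HCCL, and the model says nothing about how the hardware actually schedules tasks. Each communicator is modeled as a FIFO queue, and a sync entry at the head of a queue blocks that queue until the referenced task in the other queue has completed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and not part of Ascend C or HCCL.
struct ToyTask {
    bool isSync;       // true: models an InterHcclGroupSync entry
    int srcGroup;      // sync entries: communicator that owns the awaited task
    int srcHandle;     // sync entries: handle of the awaited task
    int handle;        // communication tasks: this task's own handle
    std::string name;  // label recorded in the execution trace
};

// Drains one FIFO queue per communicator and returns the order in which
// communication tasks ran. A sync entry at the head of a queue blocks that
// queue until the (srcGroup, srcHandle) task has completed.
std::vector<std::string> RunToySchedule(std::vector<std::deque<ToyTask>> queues) {
    std::set<std::pair<int, int>> done;  // completed (group, handle) pairs
    std::vector<std::string> trace;
    bool progress = true;
    while (progress) {
        progress = false;
        // Higher-numbered groups are visited first on purpose, so any ordering
        // seen in the trace comes from the sync entry, not from visiting order.
        for (int g = static_cast<int>(queues.size()) - 1; g >= 0; --g) {
            std::deque<ToyTask>& q = queues[static_cast<std::size_t>(g)];
            while (!q.empty()) {
                const ToyTask t = q.front();
                if (t.isSync) {
                    if (done.count(std::make_pair(t.srcGroup, t.srcHandle)) == 0) {
                        break;  // awaited task not finished yet: queue stays blocked
                    }
                } else {
                    done.insert(std::make_pair(g, t.handle));
                    trace.push_back(t.name);
                }
                q.pop_front();
                progress = true;
            }
        }
    }
    return trace;
}
```

In this model, placing a sync entry on handle 0 of group 0 ahead of an AllGather in group 1 makes the AlltoAll in group 0 run first, even though group 1 is visited first. Without the sync entry, the same model runs the AllGather first.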
Prototype
```cpp
__aicore__ inline void InterHcclGroupSync(int8_t srcGroupID, HcclHandle srcHandleID);
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| srcGroupID | Input | Communicator ID. It is the ID of the communicator from which the waited communication task is delivered. |
| srcHandleID | Input | Communication task ID. It is the identifier HcclHandle of the waited communication task. |
Returns
None
Restrictions
- Before calling this API, ensure that the InitV2 and SetCcTilingV2 APIs have been called.
- When this API is called on the AIC or AIV core, the calling core must be the same as that of the corresponding Prepare API.
- The total number of times that all Prepare and InterHcclGroupSync APIs are called in a communicator cannot exceed 63.
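The call limit in the last restriction can be tracked with a simple counter while planning an operator. The following is a minimal sketch in plain host-side C++, assuming one counter per communicator: `ToyCallBudget` is a hypothetical helper invented for illustration, not an Ascend C or HCCL API. Only the value 63 comes from the restriction above.

```cpp
#include <cstdint>

// Hypothetical planning helper; not part of Ascend C or HCCL.
// Keep one instance per communicator.
class ToyCallBudget {
public:
    // Upper bound on Prepare plus InterHcclGroupSync calls per communicator,
    // taken from the restriction above.
    static constexpr std::uint32_t kMaxCalls = 63;

    // Records one Prepare-type call or one InterHcclGroupSync call.
    // Returns false and records nothing if the limit is already reached.
    bool TryRecord() {
        if (used_ >= kMaxCalls) {
            return false;
        }
        ++used_;
        return true;
    }

    // Number of calls that can still be made on this communicator.
    std::uint32_t Remaining() const { return kMaxCalls - used_; }

private:
    std::uint32_t used_ = 0;
};
```

For instance, a communicator that issues one InterHcclGroupSync call followed by one AllGather call consumes two entries of its budget, leaving 61.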
Example
```cpp
extern "C" __global__ __aicore__ void alltoall_allgather_custom(GM_ADDR xGM, GM_ADDR alltoallGM, GM_ADDR allgatherGM,
                                                                GM_ADDR workspaceGM, GM_ADDR tilingGM)
{
    // workspaceGM and tilingGM are assumed trailing kernel parameters;
    // tilingGM is read by GET_TILING_DATA_WITH_STRUCT below.
    // AlltoAllAllGatherCustomTilingData is a structure defined in the operator header file.
    REGISTER_TILING_DEFAULT(AlltoAllAllGatherCustomTilingData);
    GET_TILING_DATA_WITH_STRUCT(AlltoAllAllGatherCustomTilingData, tilingData, tilingGM);

    GM_ADDR contextGM0 = AscendC::GetHcclContext<0>();
    GM_ADDR contextGM1 = AscendC::GetHcclContext<1>();
    Hccl hccl0;
    Hccl hccl1;

    HcclDataType dtype = HcclDataType::HCCL_DATA_TYPE_FP16;
    const uint64_t dataCount = 10U;
    const uint64_t strideCount = 0U;
    const uint64_t rankNum = 4U;

    if (AscendC::g_coreType == AIV) {  // Use only the AIV core for communication.
        hccl0.InitV2(contextGM0, &tilingData);
        hccl1.InitV2(contextGM1, &tilingData);
        hccl0.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, alltoallTiling));
        hccl1.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, allgatherTiling));

        // Communicator 0 delivers one AlltoAll task.
        auto group0_handle = hccl0.AlltoAll(xGM, alltoallGM, dataCount, dtype, strideCount);

        // Communicator 1 delivers a cross-communicator dependency so that its subsequent
        // AllGather task is executed only after the AlltoAll task in communicator 0 is complete.
        hccl1.InterHcclGroupSync(0, group0_handle);

        // Communicator 1 delivers an AllGather task.
        // AllGather is assumed to take no reduce operation, so no HcclReduceOp is passed.
        auto group1_handle = hccl1.AllGather(alltoallGM, allgatherGM, dataCount, dtype, strideCount);

        hccl0.Commit(group0_handle);
        hccl1.Commit(group1_handle);

        hccl0.Wait(group0_handle);
        hccl1.Wait(group1_handle);

        // Synchronize all AIV cores before finalizing. If core 0 runs ahead and calls
        // Finalize() too early, other cores can hang in their Wait operations.
        AscendC::SyncAll<true>();
        hccl0.Finalize();
        hccl1.Finalize();
    }
}
```