InterHcclGroupSync

Applicability

Product                                                   Supported
Atlas A3 training products/Atlas A3 inference products    √
Atlas A2 training products/Atlas A2 inference products    x
Atlas 200I/500 A2 inference products                      x
Atlas inference product's AI Core                         x
Atlas inference product's Vector Core                     x
Atlas training products                                   x

Function

Waits until a communication task in another communicator is complete. After this API is called, communication tasks subsequently delivered by the local communicator are executed only after the task identified by srcHandleID in the communicator identified by srcGroupID is complete.
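
The ordering guarantee described above can be pictured with a toy host-side model in standard C++. This is only a sketch under stated assumptions, not the Ascend C runtime: ToyScheduler, Entry, and RunDemo are hypothetical names, each communicator is modelled as a FIFO queue, and a sync entry blocks its queue until the awaited task in the other queue has run.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every name here is hypothetical and unrelated to the real Hccl class.
struct Entry {
    bool isSync;       // true: cross-communicator wait; false: communication task
    int group;         // task: owning communicator; sync: source communicator
    int handle;        // task: its own handle; sync: the awaited handle
    std::string name;  // label recorded when a task runs
};

class ToyScheduler {
public:
    explicit ToyScheduler(std::size_t groups) : queues_(groups) {}

    // Queue a communication task on communicator `group`.
    void Task(int group, int handle, const std::string& name) {
        assert(group >= 0 && static_cast<std::size_t>(group) < queues_.size());
        queues_[static_cast<std::size_t>(group)].push_back(Entry{false, group, handle, name});
    }

    // Queue a wait on communicator `group` for task `srcHandle` of communicator `srcGroup`.
    void Sync(int group, int srcGroup, int srcHandle) {
        assert(group >= 0 && static_cast<std::size_t>(group) < queues_.size());
        queues_[static_cast<std::size_t>(group)].push_back(Entry{true, srcGroup, srcHandle, ""});
    }

    // Drain the queues; returns task names in the order they ran.
    std::vector<std::string> Run() {
        std::vector<std::string> order;
        bool progressed = true;
        while (progressed) {
            progressed = false;
            // Visit the highest-numbered communicator first, so any ordering
            // seen in the result comes from the sync entry, not from visit order.
            for (std::size_t g = queues_.size(); g-- > 0;) {
                std::deque<Entry>& q = queues_[g];
                if (q.empty()) continue;
                const Entry e = q.front();
                if (e.isSync) {
                    // Blocked until the awaited task has completed.
                    if (done_.count(std::make_pair(e.group, e.handle)) == 0) continue;
                } else {
                    order.push_back(e.name);
                    done_.insert(std::make_pair(e.group, e.handle));
                }
                q.pop_front();
                progressed = true;
            }
        }
        return order;
    }

private:
    std::vector<std::deque<Entry>> queues_;
    std::set<std::pair<int, int>> done_;
};

// Communicator 1 waits for task 0 of communicator 0 before its own task runs.
std::vector<std::string> RunDemo() {
    ToyScheduler s(2);
    s.Task(0, 0, "AlltoAll@0");
    s.Sync(1, 0, 0);
    s.Task(1, 0, "AllGather@1");
    return s.Run();
}
```

In this model, RunDemo() returns the AlltoAll task before the AllGather task even though communicator 1 is visited first; without the Sync entry the order would be reversed.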

Prototype

__aicore__ inline void InterHcclGroupSync(int8_t srcGroupID, HcclHandle srcHandleID);

Parameters

Parameter      Input/Output    Description
srcGroupID     Input           Communicator ID. It is the ID of the communicator from which the waited communication task is delivered.
srcHandleID    Input           Communication task ID. It is the identifier HcclHandle of the waited communication task.

Returns

None

Restrictions

  • Before calling this API, ensure that the InitV2 and SetCcTilingV2 APIs have been called.
  • When this API is called on the AIC or AIV core, the calling core must be the same as that of the corresponding Prepare API.
  • The total number of times that all Prepare and InterHcclGroupSync APIs are called in a communicator cannot exceed 63.
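
The call-count restriction above can be tracked with simple host-side bookkeeping. The following is a minimal sketch, assuming one counter per communicator; CallBudget and DemoRemaining are hypothetical helpers for illustration and are not part of the HCCL API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping helper: one instance per communicator.
class CallBudget {
public:
    // Limit taken from the restriction above: Prepare calls plus
    // InterHcclGroupSync calls in one communicator cannot exceed 63.
    static constexpr std::uint32_t kMaxCalls = 63;

    // Record one Prepare call or one InterHcclGroupSync call.
    // Returns false, and records nothing, if the limit is already reached.
    bool Reserve() {
        if (used_ >= kMaxCalls) return false;
        ++used_;
        return true;
    }

    // Number of calls that can still be made in this communicator.
    std::uint32_t Remaining() const { return kMaxCalls - used_; }

private:
    std::uint32_t used_ = 0;
};

// A communicator that issues one InterHcclGroupSync and one Prepare call.
std::uint32_t DemoRemaining() {
    CallBudget budget;
    bool ok = budget.Reserve();   // InterHcclGroupSync
    ok = budget.Reserve() && ok;  // one Prepare call
    assert(ok);
    return budget.Remaining();
}
```

Calling Reserve() before each delivery and checking its result makes it possible to detect an over-budget operator design on the host rather than at run time on the device.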

Example

This example builds a fused communication operator with one input, xGM, and two outputs, alltoallGM and allgatherGM. The operator uses two communicators. Communicator 0 performs AlltoAll communication on the input and writes the result to alltoallGM. Communicator 1 then uses that result as the input of AllGather communication and writes the result to allgatherGM.
// workspaceGM and tilingGM are assumed here, following the usual kernel signature; tilingGM is needed by GET_TILING_DATA_WITH_STRUCT below.
extern "C" __global__ __aicore__ void alltoall_allgather_custom(GM_ADDR xGM, GM_ADDR alltoallGM, GM_ADDR allgatherGM, GM_ADDR workspaceGM, GM_ADDR tilingGM) {
    REGISTER_TILING_DEFAULT(AlltoAllAllGatherCustomTilingData); // AlltoAllAllGatherCustomTilingData is a structure defined in the operator header file.
    GET_TILING_DATA_WITH_STRUCT(AlltoAllAllGatherCustomTilingData, tilingData, tilingGM);
    GM_ADDR contextGM0 = AscendC::GetHcclContext<0>();
    GM_ADDR contextGM1 = AscendC::GetHcclContext<1>();

    Hccl hccl0;
    Hccl hccl1;
    HcclDataType dtype = HcclDataType::HCCL_DATA_TYPE_FP16;
    const uint64_t dataCount = 10U;
    const uint64_t strideCount = 0U;
    const uint64_t rankNum = 4U;
    if (AscendC::g_coreType == AIV) {  // Use only the AIV Core for communication.
        hccl0.InitV2(contextGM0, &tilingData);
        hccl1.InitV2(contextGM1, &tilingData);
        hccl0.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, alltoallTiling));
        hccl1.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, allgatherTiling));

        // Communicator 0 delivers one AlltoAll task.
        auto group0_handle = hccl0.AlltoAll(xGM, alltoallGM, dataCount, dtype, strideCount);

        // Communicator 1 delivers a cross-communicator dependency so that its subsequent AllGather task is executed only after the AlltoAll task in communicator 0 is complete.
        hccl1.InterHcclGroupSync(0, group0_handle);
        // Communicator 1 delivers an AllGather task. AllGather takes no reduce operation.
        auto group1_handle = hccl1.AllGather(alltoallGM, allgatherGM, dataCount, dtype, strideCount);

        hccl0.Commit(group0_handle);
        hccl1.Commit(group1_handle);
        hccl0.Wait(group0_handle);
        hccl1.Wait(group1_handle);

        // Synchronize all AIV Cores so that core 0 does not run ahead. Calling Finalize() too early can leave other cores suspended in Wait().
        AscendC::SyncAll<true>();
        hccl0.Finalize();
        hccl1.Finalize();
    }
}