InterHcclGroupSync

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	x
Atlas 200I/500 A2 inference products	x
AI Core of Atlas inference products	x
Vector Core of Atlas inference products	x
Atlas training products	x

Function

Makes the local communicator wait until a communication task in another communicator is complete. After this API is called, communication tasks subsequently delivered by the local communicator are executed only after the task identified by srcHandleID in the communicator identified by srcGroupID is complete.
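The ordering rule above can be pictured with a toy host-side model. This is only a sketch in standard C++ and not the HCCL implementation: the Entry type and the Run function are hypothetical names introduced for illustration. Each communicator is assumed to behave like a FIFO queue, and a sync entry blocks its queue until the referenced task of the source communicator has completed.

```cpp
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only (hypothetical, not part of the HCCL API).
struct Entry {
    bool isSync;       // true: cross-communicator wait; false: communication task
    int group;         // task: owning communicator; sync: source communicator waited on
    int handle;        // task: its own handle; sync: handle of the task waited on
    std::string name;  // label, used for tasks only
};

// Processes one entry per queue per round and returns task names in completion order.
std::vector<std::string> Run(std::vector<std::deque<Entry>> queues) {
    std::set<std::pair<int, int>> done;  // (group, handle) of completed tasks
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        // Visit higher-numbered communicators first, so that any correct ordering
        // comes from the sync entry rather than from the visiting order.
        for (int g = static_cast<int>(queues.size()) - 1; g >= 0; --g) {
            std::deque<Entry>& q = queues[g];
            if (q.empty()) {
                continue;
            }
            const Entry e = q.front();
            if (e.isSync) {
                if (done.count(std::make_pair(e.group, e.handle)) == 0) {
                    continue;  // dependency not complete yet: this queue stays blocked
                }
            } else {
                done.emplace(e.group, e.handle);
                order.push_back(e.name);
            }
            q.pop_front();
            progressed = true;
        }
    }
    return order;
}
```

In this model, queuing a sync on (group 0, handle 0) ahead of an AllGather task in communicator 1 makes the AlltoAll task of communicator 0 complete first. Without the sync entry, the model completes the AllGather task first.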

Prototype

__aicore__ inline void InterHcclGroupSync(int8_t srcGroupID, HcclHandle srcHandleID);

Parameters

Parameter	Input/Output	Description
srcGroupID	Input	Communicator ID. It identifies the communicator that delivered the communication task being waited on.
srcHandleID	Input	Communication task ID. It is the HcclHandle identifier of the communication task being waited on.

Returns

None

Restrictions

Before calling this API, ensure that the InitV2 and SetCcTilingV2 APIs have been called.
When this API is called on the AIC or AIV core, the calling core must be the same as that of the corresponding Prepare API.
The total number of times that all Prepare and InterHcclGroupSync APIs are called in a communicator cannot exceed 63.
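The call limit in the last restriction can be checked while planning an operator. The following is a minimal host-side sketch in standard C++. CommCallBudget is a hypothetical helper and not part of the HCCL API. It assumes that every Prepare-type call (for example AlltoAll or AllGather) and every InterHcclGroupSync call on one communicator counts once toward the limit of 63.

```cpp
#include <cstdint>

// Hypothetical planning helper (not an HCCL API): one instance per communicator.
class CommCallBudget {
public:
    // Documented upper bound on Prepare + InterHcclGroupSync calls per communicator.
    static constexpr uint32_t kMaxCalls = 63U;

    // Reserves n calls. Returns false and reserves nothing if the limit would be exceeded.
    bool Reserve(uint32_t n = 1U) {
        if (n > kMaxCalls - used_) {
            return false;
        }
        used_ += n;
        return true;
    }

    // Number of calls that can still be issued on this communicator.
    uint32_t Remaining() const { return kMaxCalls - used_; }

private:
    uint32_t used_ = 0U;
};
```

A planner would call Reserve() once for each task or sync it intends to deliver on a communicator and treat a false result as a design error.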

Example

This example constructs a fused communication operator with one input, xGM, and two outputs, alltoallGM and allgatherGM. The operator uses two communicators. Communicator 0 performs AlltoAll communication on the input and writes the result to alltoallGM. Communicator 1 then uses that result as the input of AllGather communication and writes the result to allgatherGM.

extern "C" __global__ __aicore__ void alltoall_allgather_custom(GM_ADDR xGM, GM_ADDR alltoallGM, GM_ADDR allgatherGM, GM_ADDR workspaceGM, GM_ADDR tilingGM) {
    REGISTER_TILING_DEFAULT(AlltoAllAllGatherCustomTilingData); // AlltoAllAllGatherCustomTilingData is a structure defined in the operator header file.
    GET_TILING_DATA_WITH_STRUCT(AlltoAllAllGatherCustomTilingData, tilingData, tilingGM);
    GM_ADDR contextGM0 = AscendC::GetHcclContext<0>();
    GM_ADDR contextGM1 = AscendC::GetHcclContext<1>();

    Hccl hccl0;
    Hccl hccl1;
    HcclDataType dtype = HcclDataType::HCCL_DATA_TYPE_FP16;
    const uint64_t dataCount = 10U;
    const uint64_t strideCount = 0U;
    const uint64_t rankNum = 4U;
    if (AscendC::g_coreType == AIV) {  // Use only the AIV Core for communication.
        hccl0.InitV2(contextGM0, &tilingData);
        hccl1.InitV2(contextGM1, &tilingData);
        hccl0.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, alltoallTiling));
        hccl1.SetCcTilingV2(offsetof(AlltoAllAllGatherCustomTilingData, allgatherTiling));

        // Communicator 0 delivers one AlltoAll task.
        auto group0_handle = hccl0.AlltoAll(xGM, alltoallGM, dataCount, dtype, strideCount);

        // Communicator 1 delivers a cross-communicator dependency so that its subsequent AllGather task is executed only after the AlltoAll task in communicator 0 is complete.
        hccl1.InterHcclGroupSync(0, group0_handle);
        // Communicator 1 delivers an AllGather task. AllGather takes no reduce operation.
        auto group1_handle = hccl1.AllGather(alltoallGM, allgatherGM, dataCount, dtype, strideCount);
		
        hccl0.Commit(group0_handle);
        hccl1.Commit(group1_handle);
        hccl0.Wait(group0_handle);
        hccl1.Wait(group1_handle);

        // Synchronize all AIV cores so that core 0 does not run ahead of the others.
        // Calling Finalize() too early can leave other cores suspended in Wait().
        AscendC::SyncAll<true>();
        hccl0.Finalize();
        hccl1.Finalize();
    }
}
