HcclAlltoAllV

Description

Sends data to every rank in the communicator and receives data from every rank. The amount of data exchanged with each rank can be set individually.

Prototype

HcclResult HcclAlltoAllV(const void *sendBuf, const void *sendCounts, const void *sdispls, HcclDataType sendType, const void *recvBuf, const void *recvCounts, const void *rdispls, HcclDataType recvType, HcclComm comm, aclrtStream stream)

Parameters

Parameter

Input/Output

Description

sendBuf

Input

Address of the buffer that holds the source data to be sent.

sendCounts

Input

Number of elements to be sent, given as a uint64 array. sendCounts[i] = n indicates that the current rank sends n elements to rank i.

For example, if sendType is float32 and sendCounts[i] is n, the current rank sends n float32 elements to rank i.

sdispls

Input

Sending offset, given as a uint64 array. sdispls[i] = n indicates that the data sent from the current rank to rank i starts at offset n relative to sendBuf. The offset is measured in elements of sendType.

sendType

Input

Data type of the data to be sent, given as an HcclDataType value.

Atlas Training Series Product: The supported data types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, and float64.

recvBuf

Output

Address of the buffer that receives the collective communication result.

recvCounts

Input

Number of elements to be received, given as a uint64 array. recvCounts[i] = n indicates that the current rank receives n elements from rank i.

For example, if recvType is float32 and recvCounts[i] is n, the current rank receives n float32 elements from rank i.

rdispls

Input

Receiving offset, given as a uint64 array. rdispls[i] = n indicates that the data received by the current rank from rank i is stored starting at offset n relative to recvBuf. The offset is measured in elements of recvType.

recvType

Input

Data type of the data to be received, given as an HcclDataType value.

Atlas Training Series Product: The supported data types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, and float64.

comm

Input

Communicator where the operation is performed.

stream

Input

Stream of the rank.
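
The count and offset arrays described above can be illustrated with a small host-only sketch. It does not call HCCL. The helper names build_displs and model_exchange are hypothetical and exist only for this illustration. The sketch assumes that segments are packed back to back, which the API does not require, and that the count a receiver expects from a sender equals the count that sender sends to it, as in MPI_Alltoallv; this page does not state that rule explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill displs[] with the exclusive prefix sum of
 * counts[], so that the per-rank segments sit back to back in the buffer.
 * Counts and offsets are in elements, not bytes.
 * Returns the total number of elements across all ranks. */
static uint64_t build_displs(const uint64_t *counts, uint64_t *displs,
                             uint32_t rank_size)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < rank_size; ++i) {
        displs[i] = total;   /* segment for rank i starts here */
        total += counts[i];
    }
    return total;
}

/* Hypothetical host-side model of the data movement for one
 * (source rank, destination rank) pair with float32 elements:
 * the source's segment at sdispl lands in the destination's
 * receive buffer at rdispl. */
static void model_exchange(const float *src_send_buf, uint64_t send_count,
                           uint64_t sdispl, float *dst_recv_buf,
                           uint64_t recv_count, uint64_t rdispl)
{
    /* Assumed matching rule (MPI_Alltoallv convention). */
    assert(send_count == recv_count);
    memcpy(dst_recv_buf + rdispl, src_send_buf + sdispl,
           send_count * sizeof(float));
}
```

With counts {2, 0, 3}, build_displs yields offsets {0, 2, 2} and a total of 5 elements, so a rank that receives nothing still gets a valid offset. In a real call the arrays produced this way would be passed as sendCounts and sdispls (or recvCounts and rdispls).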

Returns

HcclResult: HCCL_SUCCESS on success; any other value indicates failure.

Constraints

  • The performance of the AlltoAllV operation depends on the size of the buffer used to store shared data between NPUs. When the communication data size exceeds the buffer size, performance deteriorates significantly. If your service exchanges large amounts of data through AlltoAllV, you are advised to increase the buffer size by setting the environment variable HCCL_BUFFSIZE.
  • For the Atlas Training Series Product, the AlltoAllV communicators must meet the following requirement:

    In a cluster network, a 1p or 2p communicator within a single server must lie within one cluster (devices 0–3 form one cluster and devices 4–7 form another). For 4p and 8p communicators, whether in a single server or across multiple servers, the ranks must be selected by cluster, and the clusters selected must be consistent across servers.

  • This API cannot be used in non-cluster scenarios.
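
The buffer-size constraint above can be checked roughly before choosing a value for HCCL_BUFFSIZE. The following host-only sketch sums the bytes that one rank sends and rounds the result up to whole MiB. The helper names are hypothetical. It assumes that HCCL_BUFFSIZE is an integer number of MB and that the total send volume of a rank is the quantity to compare; this page states neither, so check both against the environment variable reference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: bytes sent by the current rank, computed as the
 * sum of sendCounts[] multiplied by the element size of sendType. */
static uint64_t alltoallv_send_bytes(const uint64_t *send_counts,
                                     uint32_t rank_size, uint64_t elem_bytes)
{
    uint64_t elems = 0;
    for (uint32_t i = 0; i < rank_size; ++i) {
        elems += send_counts[i];
    }
    return elems * elem_bytes;
}

/* Smallest whole number of MiB that covers the given byte count. */
static uint64_t bytes_to_mib_ceil(uint64_t bytes)
{
    const uint64_t mib = 1024u * 1024u;
    return (bytes + mib - 1) / mib;
}

/* Read HCCL_BUFFSIZE from the environment.
 * Assumption: the value is a plain positive integer (interpreted here as MB).
 * Returns 0 if the variable is unset or not a plain decimal integer. */
static uint64_t read_buffsize_env(void)
{
    const char *s = getenv("HCCL_BUFFSIZE");
    if (s == NULL || s[0] < '0' || s[0] > '9') {
        return 0;
    }
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 10);
    return (*end == '\0') ? (uint64_t)v : 0;
}
```

For example, a rank that sends 2,621,440 float32 elements in total moves 10 MiB; if read_buffsize_env returns a smaller nonzero value, the constraint above suggests raising HCCL_BUFFSIZE.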

Applicability

Atlas Training Series Product