APIs

HCCL Python APIs are used to implement framework adaptation in graph mode. Currently, they are used only for distributed optimization of TensorFlow networks on the Ascend AI Processor. The distributed optimizers NPUDistributedOptimizer and npu_distributed_optimizer_wrapper provided by TF Adapter let users complete gradient aggregation automatically, without handling AllReduce themselves, to implement data parallel training. In addition, to give users more flexibility, HCCL provides a set of common APIs for rank management, gradient splitting, and collective communication primitives.
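
For illustration, a minimal sketch of enabling data-parallel training this way, assuming a TF1-style training script and the usual TF Adapter import path (npu_bridge.estimator.npu.npu_optimizer); verify the path against your installation:

```python
import tensorflow as tf
from npu_bridge.estimator.npu.npu_optimizer import npu_distributed_optimizer_wrapper

# Any standard TensorFlow optimizer can be wrapped; the hyperparameters
# here are arbitrary examples.
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

# The wrapper inserts AllReduce-based gradient aggregation automatically,
# so the training script never calls HCCL collectives directly.
optimizer = npu_distributed_optimizer_wrapper(optimizer)
```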

APIs

Table 1 lists the Python APIs provided by HCCL.

  • The rank management APIs are defined in the api.py file in ${install_path}/python/site-packages/hccl/manage.
  • The gradient splitting APIs are defined in the api.py file in ${install_path}/python/site-packages/hccl/split.
  • The collective communication APIs are defined in the hccl_ops.py file in ${TFPLUGIN_INSTALL_PATH}/npu_bridge/hccl.
Table 1 HCCL (Python) API list

Rank management

  • create_group: Creates a user-defined group for collective communication.
  • destroy_group: Destroys a user-defined group for collective communication.
  • get_rank_size: Obtains the number of ranks (that is, the number of devices) in a group.
  • get_local_rank_size: Obtains the number of ranks of a group that reside on the local server (the server where the calling process runs).
  • get_rank_id: Obtains the rank ID of a device in a group.
  • get_local_rank_id: Obtains the local rank ID of a device in a group.
  • get_world_rank_from_group_rank: Obtains the world rank ID of a process from its rank ID in a user-defined group.
  • get_group_rank_from_world_rank: Obtains the rank ID of a process in a group from its world rank ID.
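
For illustration, a minimal sketch of the rank management APIs, assuming an initialized HCCL environment (ranktable configured) and the import path hccl.manage.api from the file locations above; the group name and rank list are arbitrary examples, and the create_group signature shown here should be verified against api.py:

```python
from hccl.manage.api import (create_group, destroy_group, get_local_rank_id,
                             get_rank_id, get_rank_size)

world_size = get_rank_size()      # number of devices in hccl_world_group
world_rank = get_rank_id()        # this process's rank in hccl_world_group
local_rank = get_local_rank_id()  # this process's rank on its own server

# Build a user-defined sub-group from the first two world ranks,
# assuming the signature create_group(group, rank_num, rank_ids).
create_group("pair_group", 2, [0, 1])
# ... run collectives over "pair_group" ...
destroy_group("pair_group")
```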

Gradient splitting

  • set_split_strategy_by_idx: Sets the backward gradient splitting strategy of a collective communication group by gradient index ID, enabling AllReduce fusion to optimize collective communication performance.
  • set_split_strategy_by_size: Sets the backward gradient splitting strategy of a collective communication group by proportion of the gradient data, enabling AllReduce fusion to optimize collective communication performance.
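
For illustration, a minimal sketch of both strategies, assuming the import path hccl.split.api from the file locations above; the index and percentage values are arbitrary examples, and only one strategy would normally be set per group:

```python
from hccl.split.api import set_split_strategy_by_idx, set_split_strategy_by_size

# Fuse backward gradients into three AllReduce segments, cutting after
# the gradients with index 19, 100, and 159 (model-specific examples).
set_split_strategy_by_idx([19, 100, 159])

# Alternative: split into three segments holding roughly 60%, 20%, and
# 20% of the total gradient data.
set_split_strategy_by_size([60, 20, 20])
```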

Collective communication

  • allreduce: Performs a reduction operation (specified by the reduction parameter) on the input data of all ranks in a group and writes the result to the output buffer of every rank. This API maps to the AllReduce collective communication operator.
  • allgather: Gathers the inputs of all ranks in the communicator, ordered by rank ID, and sends the combined result to the outputs of all ranks.
  • broadcast: Broadcasts the data of the root rank in the communicator to all other ranks.
  • reduce_scatter: Performs a sum (or another reduction operation) on the inputs of all ranks, then scatters the result evenly to the ranks' output buffers by rank ID. Each rank receives a 1/rank_size share of the reduced data.
  • reduce: Performs a sum (or another reduction operation) on the data of all ranks and writes the result to the specified position on the root rank.
  • alltoallv: Sends data of customizable sizes to all ranks in the communicator and receives data from all ranks.
  • alltoallvc: Like alltoallv, sends data of customizable sizes to all ranks in the communicator and receives data from all ranks. Because alltoallvc passes the send and receive counts of all ranks through the send_count_matrix argument, it performs better than alltoallv.
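
For illustration, a minimal sketch that adds an AllReduce node to a TensorFlow graph, assuming the import path under ${TFPLUGIN_INSTALL_PATH}/npu_bridge/hccl listed above and a configured ranktable; it must run inside an NPU training session:

```python
import tensorflow as tf
from npu_bridge.hccl import hccl_ops

local_tensor = tf.constant([1.0, 2.0, 3.0])

# Sum local_tensor across all ranks of the default global group;
# after execution, every rank holds the same reduced result.
summed = hccl_ops.allreduce(local_tensor, "sum")
```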

Point-to-point communication

  • send: Sends data to a specified rank within a collective communication group.
  • receive: Receives data from a specified rank within a collective communication group.
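
For illustration, a hypothetical sketch that pairs a send on rank 0 with a matching receive on rank 1. The parameter names (sr_tag, dest_rank, src_rank) follow the attributes of the underlying HcomSend/HcomReceive operators and are an assumption; check them against hccl_ops.py before use:

```python
import tensorflow as tf
from hccl.manage.api import get_rank_id
from npu_bridge.hccl import hccl_ops

rank_id = get_rank_id()
payload = tf.constant([1.0, 2.0, 3.0])

# sr_tag identifies a send/receive pair; both sides must use the same tag.
if rank_id == 0:
    hccl_ops.send(payload, sr_tag=0, dest_rank=1)           # assumed signature
elif rank_id == 1:
    received = hccl_ops.receive(payload.shape, payload.dtype,
                                sr_tag=0, src_rank=0)       # assumed signature
```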

Concepts

group

Indicates the process groups that participate in collective communication. The process groups include:

  • hccl_world_group: the default global group, which includes all ranks that participate in collective communication. This group is created from the ranktable file.
  • User-defined group: a subset of hccl_world_group. Ranks listed in the ranktable file can be assigned to different groups through the create_group API, allowing collective communication operations to run in parallel across groups.

rank

A communication entity in a group. Each rank is assigned a unique ID ranging from 0 to n – 1, where n is the number of NPUs.

rank size

  • Rank size: indicates the number of ranks in a group.
  • Local rank size: indicates the number of ranks in a group on the server where the processes are located.

rank id

  • Rank ID: indicates the ID of a process in a group. The value ranges from 0 to (rank size – 1). For a user-defined group, rank IDs start from 0 within that group. For hccl_world_group, the rank ID is identical to the world rank ID.
  • World rank ID: indicates the rank ID of a process in hccl_world_group. The value ranges from 0 to (rank size – 1).
  • Local rank ID: indicates the rank ID of a process in a group among the group's processes on its local server. The value ranges from 0 to (local rank size – 1).
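
To make these rank ID notions concrete, a hypothetical sketch for an 8-device job in which world ranks 4 to 7 form a user-defined group (the group name is an arbitrary example; the API signatures should be verified against api.py):

```python
from hccl.manage.api import (create_group, get_group_rank_from_world_rank,
                             get_rank_id)

# 8 devices in hccl_world_group; put world ranks 4..7 into a sub-group.
create_group("rear_half", 4, [4, 5, 6, 7])

world_rank = get_rank_id()  # world rank ID, 0..7
if world_rank >= 4:
    # Within "rear_half", world rank 4 maps to group rank 0,
    # world rank 5 to group rank 1, and so on.
    group_rank = get_group_rank_from_world_rank("rear_half", world_rank)
```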