APIs
HCCL Python APIs are used to implement framework adaptation in graph mode. Currently, they are used only for distributed optimization of TensorFlow networks on the Ascend AI Processor. The distributed optimizers NPUDistributedOptimizer and npu_distributed_optimizer_wrapper provided by TF Adapter enable users to complete gradient aggregation automatically, without being aware of the underlying AllReduce operations, to implement data parallel training. In addition, to meet users' requirements for flexibility, HCCL provides a set of common APIs for rank management, gradient splitting, and collective communication primitives.
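The following sketch shows how a standard TensorFlow optimizer might be wrapped for data parallel training. The import path (npu_bridge.estimator.npu.npu_optimizer) and the optimizer choice are assumptions for illustration and may differ across TF Adapter versions.

```python
# Minimal sketch: wrapping a TensorFlow optimizer for data-parallel training.
# The import path below is assumed from the TF Adapter (npu_bridge) package
# layout and may differ between versions.
import tensorflow as tf
from npu_bridge.estimator.npu.npu_optimizer import npu_distributed_optimizer_wrapper

optimizer = tf.compat.v1.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
# After wrapping, gradient aggregation (AllReduce) across devices happens
# automatically inside minimize()/apply_gradients(); user code does not
# call any HCCL collective API directly.
optimizer = npu_distributed_optimizer_wrapper(optimizer)
```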
APIs
Table 1 lists the Python APIs provided by HCCL.
- The rank management APIs are defined in the api.py file in ${install_path}/python/site-packages/hccl/manage.
- The gradient splitting APIs are defined in the api.py file in ${install_path}/python/site-packages/hccl/split.
- The collective communication APIs are defined in the hccl_ops.py file in ${TFPLUGIN_INSTALL_PATH}/npu_bridge/hccl.
| API | Description |
|---|---|
| **Rank management** | |
| create_group | Creates a user-defined group for collective communication. |
| destroy_group | Destroys a user-defined group for collective communication. |
| get_rank_size | Obtains the number of ranks (that is, the number of devices) in a group. |
| get_local_rank_size | Obtains the number of local ranks on the server where the devices in the group are located. |
| get_rank_id | Obtains the rank ID of a device in a group. |
| get_local_rank_id | Obtains the local rank ID of a device in a group. |
| get_world_rank_from_group_rank | Obtains the world rank ID of a process based on its group rank ID. |
| get_group_rank_from_world_rank | Obtains the group rank ID of a process based on its world rank ID. |
| **Gradient splitting** | |
| set_split_strategy_by_idx | Sets the backward gradient splitting strategy of a collective communication group based on gradient index IDs, to implement AllReduce fusion and optimize collective communication performance. |
| set_split_strategy_by_size | Sets the backward gradient splitting strategy of a collective communication group based on the proportion of gradient data, to implement AllReduce fusion and optimize collective communication performance. |
| **Collective communication** | |
| allreduce | Performs a reduction operation on the input data of all ranks in a group and sends the result to the output buffer of every rank. The reduction type is specified by the reduction parameter. This API invokes the AllReduce collective communication operator. |
| allgather | Gathers the inputs of all ranks in the communicator in rank-ID order and sends the combined result to the output of every rank. |
| broadcast | Broadcasts the data of the root rank in the communicator to all other ranks. |
| reduce_scatter | Performs a sum (or other reduction) operation on the inputs of all ranks and then scatters the result evenly to the output buffers of the ranks by rank ID. Each rank receives a 1/rank-size portion of the reduced data. |
| reduce | Performs a sum (or other reduction) operation on the data of all ranks and sends the result to the specified location on the root rank. |
| alltoallv | Sends data (with a customizable data size) to every rank in the collective communicator and receives data from every rank. |
| alltoallvc | Sends data (with a customizable data size) to every rank in the collective communicator and receives data from every rank. Unlike alltoallv, alltoallvc passes the send (TX) and receive (RX) counts of all ranks through the send_count_matrix argument, and therefore outperforms alltoallv. |
| **Point-to-point communication** | |
| send | Sends data to a specified rank within a collective communication group. |
| receive | Receives data from a specified rank within a collective communication group. |
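As a sketch of how the APIs in Table 1 fit together in a training script: the import paths follow the module locations listed above, while the function signatures, default group behavior, and the index values passed to the splitting strategy are assumptions for illustration.

```python
import tensorflow as tf
from hccl.manage.api import get_rank_size, get_rank_id   # rank management
from hccl.split.api import set_split_strategy_by_idx     # gradient splitting
from npu_bridge.hccl import hccl_ops                     # collective communication

# Query the default (world) communication group.
rank_size = get_rank_size()   # number of devices participating in training
rank_id = get_rank_id()       # this device's rank ID, in [0, rank_size - 1]

# Fuse backward gradients into segments split after the gradients with
# index IDs 20 and 100 (illustrative values only).
set_split_strategy_by_idx([20, 100])

# Explicitly reduce a tensor across all ranks with the AllReduce operator;
# "sum" is the reduction type.
local_tensor = tf.constant([1.0, 2.0, 3.0])
summed = hccl_ops.allreduce(local_tensor, "sum")
```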
Concepts
| Concept | Description |
|---|---|
| group | The process group that participates in collective communication. Groups include the default world group (hccl_world_group), which contains all devices, and user-defined groups created with create_group. |
| rank | A communication entity in a group. Each rank is assigned a unique ID ranging from 0 to n – 1, where n is the number of NPUs in the group. |
| rank size | The number of ranks (that is, the number of devices) in a group. |
| rank id | The unique ID of a rank within a group, ranging from 0 to rank size – 1. A rank's ID in the world group is its world rank ID; its ID in a user-defined group is its group rank ID. |
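The relationship between these concepts can be sketched as follows; the group name and rank IDs are hypothetical, and the calls assume the rank management APIs accept a group name as an argument.

```python
from hccl.manage.api import (create_group, destroy_group,
                             get_rank_id, get_rank_size)

# Default world group: every device launched for the job belongs to it.
world_size = get_rank_size()   # rank size of the world group
world_rank = get_rank_id()     # world rank ID, in [0, world_size - 1]

# User-defined group made up of the first four world ranks (hypothetical).
create_group("demo_group", 4, [0, 1, 2, 3])
if world_rank in (0, 1, 2, 3):
    group_size = get_rank_size("demo_group")  # rank size of the new group: 4
    group_rank = get_rank_id("demo_group")    # group rank ID within demo_group
destroy_group("demo_group")
```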