APIs
Framework developers can use the C APIs provided by HCCL to add distributed capabilities to their frameworks in single-operator mode.
- The API definitions are located in the include/hccl/ directory under the CANN software installation directory.
- hccl.h: API definition file for communicator management and communication operators. The corresponding library file is libhccl.so.
- For details about data types, see the hccl_types.h file in the same directory.
| Category | API | Description |
|---|---|---|
| Communicator management | | Initializes the HCCL communicator based on the ranktable file. |
| Communicator management | | Initializes the HCCL communicator with specific configurations based on the ranktable file. |
| Communicator management | | Generates the rank identification information (HcclRootInfo) of the root rank. Call this API only on the root rank, before calling the HCCL initialization API HcclCommInitRootInfo or HcclCommInitRootInfoConfig. |
| Communicator management | | Initializes HCCL based on rootInfo and creates an HCCL communicator. |
| Communicator management | | Initializes HCCL based on rootInfo and creates an HCCL communicator with specific configurations. |
| Communicator management | | Initializes the configuration options of the communicator and sets the configurable parameters to their default values (200 for hcclBufferSize and 0 for hcclDeterministic). |
| Communicator management | | In the single-server communication scenario, uses one process to create a communicator for multiple devices (one thread per device). During communicator initialization, devices[0] acts as the root rank and automatically collects cluster information. |
| Communicator management | | Destroys a specified HCCL communicator. |
| Communicator management | | Obtains the rank size of the current communicator. |
| Communicator management | | Obtains the rank ID of a device in a collective communicator. |
| Communicator management | | Sets configuration options for collective communication. Currently, only support for deterministic computing can be configured. |
| Communicator management | | Obtains the configuration related to collective communication. |
| Communicator management | | Obtains the name of the communicator in which the current collective communication operation is performed. |
| Communicator management | | Checks whether the current software version supports the initialization configuration of a communicator. |
| Communicator management | | Splits an existing global communicator into sub-communicators with specific configurations. |
| Collective communication | | Sums the input data of all nodes in the communicator (or applies another reduction operation) and writes the result to the output buffer of every node. The reduction operation type is specified by the op parameter. |
| Collective communication | | Broadcasts the data of the root rank in the communicator to the other ranks. |
| Collective communication | | Orders the inputs of all ranks in the communicator by rank ID, concatenates them, and sends the result to the outputs of all ranks. |
| Collective communication | | Sums the inputs of all ranks (or applies another reduction operation), and then distributes the result evenly to the output buffers of the ranks according to their rank IDs. Each process receives a 1/ranksize portion of the data from the other processes for reduction. |
| Collective communication | | Sums the data of all ranks (or applies another reduction operation) and sends the result to the specified position on the root rank. |
| Collective communication | | Sends data (whose size can be customized) to all ranks in the communicator and receives data from all ranks. |
| Collective communication | | Sends same-sized data to all ranks in the communicator and receives same-sized data from all ranks. |
| Collective communication | | Blocks the streams of all ranks in the specified communicator. |
| Collective communication | | Scatters the data of the root rank to the other ranks. |
| Point-to-point communication | | Sends the data at the specified location on the current rank to the specified location on the destination rank. |
| Point-to-point communication | | Receives data from the source node into the specified location on the current node. |
| Point-to-point communication | | Completes sending and receiving tasks in batches on the current rank. The sending and receiving tasks of the current rank are asynchronous and do not block each other. |
| Exception handling | | Checks whether the "RDMA ERROR CQE" error exists in the communicator. This error occurs when the cluster information shows that the communication link of the device network port is unstable or the network is congested, in which case "error cqe" is printed in the device log. In the current version, this is the only error the API can check for. |
| Exception handling | | Parses error codes of the HcclResult type. |
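
The data-movement semantics of several collective operations described above can be illustrated with a small host-only model. The sketch below does not call HCCL: the `sim_*` functions and the flat rank-major buffer layout are assumptions made purely for illustration, with each "rank" represented as a slice of a single array. It only mirrors the descriptions of the all-reduce, broadcast, all-gather, and reduce-scatter operations, using sum as the reduction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Host-only model of collective semantics. These helpers are hypothetical
 * and are NOT part of HCCL: every "rank" is a slice of one flat array.
 */

/* All-reduce (sum): in and out each hold nranks * count floats, rank-major.
 * Every rank receives the element-wise sum over all ranks. */
void sim_all_reduce_sum(const float *in, float *out, size_t nranks, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float sum = 0.0f;
        for (size_t r = 0; r < nranks; ++r) sum += in[r * count + i];
        for (size_t r = 0; r < nranks; ++r) out[r * count + i] = sum;
    }
}

/* Broadcast: buf holds nranks * count floats; the root's slice is copied
 * over the slice of every other rank. */
void sim_broadcast(float *buf, size_t nranks, size_t count, size_t root)
{
    assert(root < nranks);
    for (size_t r = 0; r < nranks; ++r) {
        if (r != root) {
            memcpy(buf + r * count, buf + root * count, count * sizeof(float));
        }
    }
}

/* All-gather: each rank contributes count floats (in: nranks * count).
 * Each rank's output holds nranks * count floats ordered by rank ID
 * (out: nranks * nranks * count). */
void sim_all_gather(const float *in, float *out, size_t nranks, size_t count)
{
    for (size_t dst = 0; dst < nranks; ++dst) {
        for (size_t src = 0; src < nranks; ++src) {
            memcpy(out + dst * nranks * count + src * count,
                   in + src * count, count * sizeof(float));
        }
    }
}

/* Reduce-scatter (sum): each rank's input holds nranks * count floats
 * (in: nranks * nranks * count). Rank r's output (count floats) is the
 * sum over all ranks of chunk r, that is, a 1/nranks portion of the
 * reduced data. */
void sim_reduce_scatter_sum(const float *in, float *out, size_t nranks, size_t count)
{
    for (size_t r = 0; r < nranks; ++r) {
        for (size_t i = 0; i < count; ++i) {
            float sum = 0.0f;
            for (size_t src = 0; src < nranks; ++src) {
                sum += in[src * nranks * count + r * count + i];
            }
            out[r * count + i] = sum;
        }
    }
}
```

In this model, an all-reduce produces the same result as a reduce-scatter followed by an all-gather, which is one way to read the relationship between the three descriptions. Real HCCL calls additionally take a communicator and run asynchronously on a stream, neither of which is modeled here.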