Communicator Management
A communicator is the context in which collective communication operators are executed. It manages the communication objects (for example, NPUs) and the resources required for communication. Each communication object in a communicator is a rank, and each rank is assigned a unique ID ranging from 0 to n – 1, where n is the number of NPUs in the communicator.
- Multi-server collective communication:
  - If a complete rank table file describing the cluster is available, create a communicator by calling the HcclCommInitClusterInfo API, or create a communicator with specific configurations by calling the HcclCommInitClusterInfoConfig API.
  - If no complete rank table file is available, create a communicator from root rank information by using the HcclGetRootInfo API together with HcclCommInitRootInfo or HcclCommInitRootInfoConfig.
- Single-server collective communication: create multiple communicators in a batch on a single server by using the HcclCommInitAll API.
- You can split an existing communicator into sub-communicators with specific configurations by using the HcclCreateSubCommConfig API.
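The creation rules above can be summarized as a small decision function. This is a simplified sketch, not part of HCCL: the Deployment struct and ChooseInitApi are hypothetical names, the function only returns API names as strings, and it always suggests HcclCommInitAll for single-server jobs, even though the other creation methods can also be used on one server.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a job; the field names are illustrative only.
struct Deployment {
    bool multiServer;        // ranks span more than one server
    bool hasRankTable;       // a complete rank table file is available
    bool needsCustomConfig;  // the communicator needs specific configurations
};

// Hypothetical helper: returns the name of the HCCL API suggested by the rules above.
std::string ChooseInitApi(const Deployment& d) {
    if (!d.multiServer) {
        // Single server: create the communicators in a batch.
        return "HcclCommInitAll";
    }
    if (d.hasRankTable) {
        return d.needsCustomConfig ? "HcclCommInitClusterInfoConfig"
                                   : "HcclCommInitClusterInfo";
    }
    // No rank table: HcclGetRootInfo runs on the root rank first, then this API on every rank.
    return d.needsCustomConfig ? "HcclCommInitRootInfoConfig" : "HcclCommInitRootInfo";
}
```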
- All communication operators in multiple communicators must be delivered in serial mode on each device. Out-of-order delivery, multi-thread concurrent delivery, and thread reentry are not allowed.
- On the same device, the delivery threads of all communication operators in the same communicator must use the same context. For details about the context, see "Runtime Management" in Application Development Guide (C&C++).
- Graph-mode communication and single-operator communication cannot be performed together in the same communicator.
- Operators in the same communicator must be executed in serial mode.
- Multiple communicators need to be created in serial mode on the same NPU.
- For Atlas A3 training products/Atlas A3 inference products, if the network contains multiple supernodes during communicator initialization, configure the information about the AI servers in the same supernode together. For example, if there are two supernodes with IDs 0 and 1, configure the information about all AI servers in supernode 0 first and then the information about all AI servers in supernode 1. Interleaving the AI server information of supernodes 0 and 1 is not supported.
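The grouping rule for supernodes can be checked before the configuration is used. The sketch below is a hypothetical helper, not an HCCL API: it assumes you have already extracted the supernode ID of each AI server in configuration order, and it checks only that each supernode's servers form one contiguous group (it does not require the IDs to be ascending).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper. superPodIds[i] is the supernode ID of the i-th configured AI server.
// Returns true if the servers of each supernode appear as one contiguous group,
// e.g. {0, 0, 1, 1} is valid while {0, 1, 0, 1} (cross configuration) is not.
bool SupernodesAreGrouped(const std::vector<uint32_t>& superPodIds) {
    std::set<uint32_t> finished;  // supernodes whose group has already ended
    for (std::size_t i = 0; i < superPodIds.size(); ++i) {
        uint32_t id = superPodIds[i];
        if (finished.count(id) != 0) {
            return false;  // this supernode reappears after another supernode's servers
        }
        if (i + 1 < superPodIds.size() && superPodIds[i + 1] != id) {
            finished.insert(id);  // the group for `id` ends here
        }
    }
    return true;
}
```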
Creating a Communicator Based on the Rank Table
- Construct a rank table file. For details about how to configure the rank table file, see Cluster Information Configuration.
- Each device calls the HcclCommInitClusterInfo API to create a communicator, or calls the HcclCommInitClusterInfoConfig API to create a communicator with specific configurations.
```cpp
int devId = 0;
// Configure the path of the rank table file.
const char* rankTableFile = "/home/rank_table.json";
// Define the communicator handle.
HcclComm hcclComm;
// Initialize the HCCL communicator and check the result.
HcclResult ret = HcclCommInitClusterInfo(rankTableFile, devId, &hcclComm);
if (ret != HCCL_SUCCESS) {
    /* Handle the initialization failure */
}
/* Collective communication */
// Destroy the HCCL communicator.
HcclCommDestroy(hcclComm);
```
If the service runs in a single-device multi-process scenario on the
Creating a Communicator Based on Root Rank Information
- If each device corresponds to a service process, the implementation process is as follows:
- (Optional) Specify the communication IP address or NIC used by the host rank during HCCL initialization.
- Method 1: On each host rank, set an IP address using HCCL_IF_IP for its communication with the root rank. The IP address can be in IPv4 or IPv6 format. Only one IP address can be configured. A configuration example is as follows:
```shell
export HCCL_IF_IP=10.10.10.1
```
- Method 2: On each host rank, set a NIC name using HCCL_SOCKET_IFNAME and the communication protocol for the NIC using HCCL_SOCKET_FAMILY. HCCL will use the NIC name to obtain the host IP address for communication with the root rank. A configuration example is as follows:
```shell
# IP version used by the communication NIC during HCCL initialization.
# AF_INET indicates IPv4; AF_INET6 indicates IPv6.
export HCCL_SOCKET_FAMILY=AF_INET

# Supported formats of NIC names (select one of the four). To specify multiple NICs,
# separate them with commas (,); the first matched NIC is used as the communication NIC.
# Exact match of the NIC:
export HCCL_SOCKET_IFNAME==eth0,enp0    # Use the specified eth0 or enp0 NIC.
export HCCL_SOCKET_IFNAME=^=eth0,enp0   # Do not use the eth0 or enp0 NIC.
# Fuzzy match of the NIC:
export HCCL_SOCKET_IFNAME=eth,enp       # Use all NICs prefixed with eth or enp.
export HCCL_SOCKET_IFNAME=^eth,enp      # Do not use any NIC prefixed with eth or enp.
```
HCCL_IF_IP takes priority over HCCL_SOCKET_IFNAME. If neither HCCL_IF_IP nor HCCL_SOCKET_IFNAME is set, the system automatically selects a NIC in the following priority order:
NICs other than Docker or local NICs (in ascending alphabetical order of NIC name) > Docker NICs > local NICs
If the NIC used by the current rank cannot reach the NIC of the root rank, HCCL link establishment fails.
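The NIC name formats and the automatic fallback order can be modeled in a few lines. This is a simulation under stated assumptions, not HCCL's implementation: NicMatches, PickNic, and FallbackTier are hypothetical helpers; a Docker NIC is assumed to be any name starting with "docker" and the local NIC is assumed to be "lo"; "first matched NIC" is interpreted as the first match in the order the candidate NICs are listed; and HCCL_IF_IP, which would override everything here, is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits "a,b,c" into {"a", "b", "c"}, skipping empty items.
std::vector<std::string> SplitCommas(const std::string& s) {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

bool StartsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical model of an HCCL_SOCKET_IFNAME value: "=a,b" exact include,
// "^=a,b" exact exclude, "a,b" prefix include, "^a,b" prefix exclude.
bool NicMatches(const std::string& pattern, const std::string& nic) {
    bool exclude = StartsWith(pattern, "^");
    std::string rest = exclude ? pattern.substr(1) : pattern;
    bool exact = StartsWith(rest, "=");
    if (exact) rest = rest.substr(1);
    bool hit = false;
    for (const std::string& name : SplitCommas(rest)) {
        if (exact ? nic == name : StartsWith(nic, name)) {
            hit = true;
            break;
        }
    }
    return exclude ? !hit : hit;
}

// Assumed classification: 0 = regular NIC, 1 = Docker NIC, 2 = local NIC.
int FallbackTier(const std::string& nic) {
    if (nic == "lo") return 2;
    if (StartsWith(nic, "docker")) return 1;
    return 0;
}

// Returns the first candidate matching `pattern` if it is set; otherwise the best NIC
// by the automatic priority (regular NICs alphabetically > Docker NICs > local NICs).
// Returns "" if nothing qualifies.
std::string PickNic(const std::vector<std::string>& nics, const std::string& pattern) {
    if (!pattern.empty()) {
        for (const std::string& nic : nics) {
            if (NicMatches(pattern, nic)) return nic;
        }
        return "";
    }
    std::vector<std::string> sorted = nics;
    std::sort(sorted.begin(), sorted.end(), [](const std::string& a, const std::string& b) {
        int ta = FallbackTier(a);
        int tb = FallbackTier(b);
        return ta != tb ? ta < tb : a < b;
    });
    return sorted.empty() ? "" : sorted.front();
}
```

With the candidates {"eth1", "enp0", "eth0", "docker0", "lo"}, an unset pattern selects enp0 (alphabetically first regular NIC), while "^eth,enp" excludes every regular NIC and falls through to docker0.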
- Call the HcclGetRootInfo API on the root rank to generate root rank information (rootInfo), including the device IP address and device ID.
- Broadcast the root rank information to all ranks in the communicator, for example, by using MPI_Bcast or another out-of-band mechanism.
- On all ranks in the communicator, call HcclCommInitRootInfo (or HcclCommInitRootInfoConfig to create a communicator with specific configurations) to initialize the communicator based on the received rootInfo and the current rank ID.
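The three steps above (generate, broadcast, initialize) can be pictured with threads standing in for ranks. In this sketch, FakeRootInfo, FakeComm, and SimulateRootInfoInit are hypothetical; a std::shared_future plays the role of the broadcast, whereas real ranks usually run in separate processes or on separate servers and exchange rootInfo through an out-of-band channel before calling the HCCL APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the root rank information produced by HcclGetRootInfo.
struct FakeRootInfo {
    std::string payload;  // opaque to the other ranks
};

// Hypothetical stand-in for a communicator created from (rootInfo, rankId).
struct FakeComm {
    std::string rootPayload;
    uint32_t rankId;
};

// Every rank waits for the same rootInfo, then "initializes" with it and its own rank ID.
std::vector<FakeComm> SimulateRootInfoInit(uint32_t nRanks) {
    std::promise<FakeRootInfo> rootPromise;
    std::shared_future<FakeRootInfo> broadcast = rootPromise.get_future().share();
    std::vector<FakeComm> comms(nRanks);
    std::vector<std::thread> ranks;
    for (uint32_t rank = 0; rank < nRanks; ++rank) {
        ranks.emplace_back([rank, broadcast, &comms] {
            const FakeRootInfo& info = broadcast.get();  // blocks until the root publishes
            comms[rank] = FakeComm{info.payload, rank};  // stands in for HcclCommInitRootInfo
        });
    }
    // On the root rank: generate the root info (stands in for HcclGetRootInfo) and publish it.
    rootPromise.set_value(FakeRootInfo{"root@device0"});
    for (std::thread& t : ranks) t.join();
    return comms;
}
```

The important property is that all ranks use the identical rootInfo and differ only in the rank ID they pass.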
- If each AI server corresponds to a service process and each thread corresponds to a device, create a multi-device communicator using multiple threads as follows:
- (Optional) Specify the communication IP address or NIC used by the host rank during HCCL initialization. For details, see the corresponding step in the scenario where each device corresponds to a service process.
- In the main process, loop over the devices: for each device, specify it as the current device and call the HcclGetRootInfo API, obtaining one piece of rootInfo per device.
- Start one thread per device. The threads concurrently call HcclCommInitRootInfo or HcclCommInitRootInfoConfig, each with its own rootInfo, to initialize the communicators.
```shell
export HCCL_HOST_SOCKET_PORT_RANGE="auto"
export HCCL_NPU_SOCKET_PORT_RANGE="auto"
```
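The multi-thread scenario has a simple shape: a sequential loop in the main thread that collects one rootInfo per device, followed by concurrent per-device initialization. The sketch below mirrors that shape only; FakeGetRootInfo, FakeDeviceComm, and InitPerDeviceComms are hypothetical stand-ins, and real code would select each device (for example, with aclrtSetDevice) before calling HcclGetRootInfo, then call HcclCommInitRootInfo inside each thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for HcclGetRootInfo called with `deviceId` as the current device.
std::string FakeGetRootInfo(int32_t deviceId) {
    return "rootinfo-dev" + std::to_string(deviceId);
}

// Hypothetical stand-in for a per-device communicator.
struct FakeDeviceComm {
    int32_t deviceId;
    std::string rootInfo;
};

std::vector<FakeDeviceComm> InitPerDeviceComms(const std::vector<int32_t>& devices) {
    // Main thread: loop over the devices and collect one rootInfo per device.
    std::vector<std::string> rootInfos;
    for (int32_t dev : devices) {
        rootInfos.push_back(FakeGetRootInfo(dev));
    }
    // One thread per device initializes its communicator concurrently.
    std::vector<FakeDeviceComm> comms(devices.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        threads.emplace_back([i, &devices, &rootInfos, &comms] {
            comms[i] = FakeDeviceComm{devices[i], rootInfos[i]};  // stands in for HcclCommInitRootInfo
        });
    }
    for (std::thread& t : threads) t.join();
    return comms;
}
```

Collecting all rootInfo values before starting the threads keeps the HcclGetRootInfo calls sequential, while each thread touches only its own slot in `comms`.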
Creating Communicators in Batches on a Single Server
- Construct the device list of the communicator, for example, {0, 1, 2, 3, 4, 5, 6, 7}. The device IDs in the list are logical IDs, which can be queried by running the npu-smi info -m command. HCCL creates the communicators in the order given in the list.
- Call the HcclCommInitAll API in the process to create communicators.
```cpp
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>
#include "hccl/hccl.h"

// Per-thread context passed to each worker thread (illustrative definition).
struct ThreadContext {
    int32_t device;
    HcclComm comm;
};

constexpr uint32_t ndev = 8;
// Construct the logical ID list of the devices.
int32_t devices[ndev] = {0, 1, 2, 3, 4, 5, 6, 7};
// Define the communicator handles.
HcclComm comms[ndev];
// Initialize the HCCL communicators.
HcclCommInitAll(ndev, devices, comms);
// Start one thread per device to perform collective communication.
std::vector<std::unique_ptr<std::thread>> threads(ndev);
ThreadContext args[ndev];
for (uint32_t i = 0; i < ndev; i++) {
    args[i].device = devices[i];
    args[i].comm = comms[i];
    /* Create threads[i] to run collective communication with args[i] */
}
/* Wait for all threads to finish */
// Destroy the HCCL communicators.
for (uint32_t i = 0; i < ndev; i++) {
    HcclCommDestroy(comms[i]);
}
```
Note: When multiple threads call a collective communication API (for example, HcclAllReduce), the interval between the calls made by different threads must not exceed the link setup timeout of collective communication. To avoid link setup timeouts, you can adjust the timeout through the environment variable HCCL_CONNECT_TIMEOUT (120 s by default).
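The constraint in the note can be made concrete by measuring how far apart per-thread calls land and comparing the spread with the configured timeout. This is an illustrative sketch: ConnectTimeoutSeconds, MaxIssueGapSeconds, and WithinConnectTimeout are hypothetical helpers, the worker threads only record a timestamp where real code would issue its collective, and range validation of HCCL_CONNECT_TIMEOUT is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Reads HCCL_CONNECT_TIMEOUT in seconds, falling back to the documented default of 120 s.
int ConnectTimeoutSeconds() {
    const char* value = std::getenv("HCCL_CONNECT_TIMEOUT");
    if (value == nullptr) return 120;
    try {
        return std::stoi(value);
    } catch (...) {
        return 120;  // unparsable value: fall back to the default in this sketch
    }
}

// Launches one thread per device; each records when it would issue its collective call.
// Returns the gap, in seconds, between the earliest and the latest call.
double MaxIssueGapSeconds(int ndev) {
    using Clock = std::chrono::steady_clock;
    if (ndev <= 0) return 0.0;
    std::vector<Clock::time_point> issued(static_cast<std::size_t>(ndev));
    std::vector<std::thread> threads;
    for (int i = 0; i < ndev; ++i) {
        threads.emplace_back([i, &issued] {
            // A real worker would issue its collective (for example, HcclAllReduce) here.
            issued[static_cast<std::size_t>(i)] = Clock::now();
        });
    }
    for (std::thread& t : threads) t.join();
    auto range = std::minmax_element(issued.begin(), issued.end());
    return std::chrono::duration<double>(*range.second - *range.first).count();
}

// True if the spread of calls stays within the link setup timeout.
bool WithinConnectTimeout(double gapSeconds) {
    return gapSeconds <= ConnectTimeoutSeconds();
}
```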
Splitting an Existing Communicator into Sub-Communicators
HCCL provides the HcclCreateSubCommConfig API to split an existing communicator into sub-communicators with specific configurations. A sub-communicator created this way requires no socket link setup or rank information exchange, so it can be created quickly, for example, when recovering from service faults.
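```cpp
#include <cstring>  // strcpy

// Initialize the global communicator (rankTableFile and devId as in the rank table example).
HcclComm globalHcclComm;
HcclCommInitClusterInfo(rankTableFile, devId, &globalHcclComm);
// Configure the sub-communicator.
HcclCommConfig config;
HcclCommConfigInit(&config);
config.hcclBufferSize = 50;             // Buffer size, in MB.
strcpy(config.hcclCommName, "comm_1");  // Communicator name.
// Initialize the sub-communicator.
HcclComm hcclComm;
uint32_t rankIds[4] = {0, 1, 2, 3};     // Global rank IDs of the sub-communicator members.
// Arguments: parent communicator, number of ranks, rank list, sub-communicator ID (1),
// this rank's ID in the sub-communicator (devId here, assuming it equals this rank's
// position in rankIds), configuration, and the output handle.
HcclCreateSubCommConfig(&globalHcclComm, 4, rankIds, 1, devId, &config, &hcclComm);
```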
This API does not support nested splitting of communicators, that is, sub-communicators cannot be further split.
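When the members of a sub-communicator are not ranks 0 to n – 1 of the global communicator, the rank ID passed for the current rank must be computed rather than reused from devId. The hypothetical helper below assumes that a rank's ID in the sub-communicator is its position in the rankIds array, and returns -1 for ranks that are not members (such ranks would not take part in that split).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: position of `globalRank` in `rankIds`, or -1 if it is not a member.
int64_t SubCommRankId(const std::vector<uint32_t>& rankIds, uint32_t globalRank) {
    for (std::size_t i = 0; i < rankIds.size(); ++i) {
        if (rankIds[i] == globalRank) return static_cast<int64_t>(i);
    }
    return -1;
}
```

For rankIds = {4, 5, 6, 7}, global rank 6 would be rank 2 of the sub-communicator, and global rank 0 is not a member.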