HCCL_HOST_SOCKET_PORT_RANGE
Description
Configures the communication port used by HCCL on the host when the communicator is created based on root node information.
This environment variable can be set to a specific port, port range, or the string auto.
- If a specific port number or port range is specified, it is recommended that the number of configured ports be greater than or equal to the number of HCCL processes on a single NPU. Valid port numbers are in the range [1, 65535]. Ensure that the specified ports are not occupied by other processes. Ports in [1, 1023] are reserved for the system; avoid using them.
The port number and port range can be used together. Use commas (,) to separate them. However, the port numbers and port ranges separated by commas (,) cannot overlap. For details about configuration, see Example.
- If this environment variable is set to auto, the host communication port used by HCCL is dynamically allocated by the OS.
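The format rules above can be checked before a job is launched. The following is a minimal sketch, not part of HCCL: the function name count_hccl_ports is hypothetical, and the sketch assumes that a range "A-B" includes both endpoints. It rejects malformed items, ports outside [1, 65535], and overlapping items, warns about system-reserved ports, and prints the number of ports covered so that the caller can compare it with the number of HCCL processes on one NPU.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of HCCL: validates a port specification and
# prints how many ports it covers (or "auto" for dynamic allocation).
# Assumption: a range "A-B" includes both endpoints.
count_hccl_ports() {
    local spec="$1"
    if [ "$spec" = "auto" ]; then
        echo "auto"
        return 0
    fi
    # Reject an empty specification and leading or trailing commas.
    case "$spec" in
        ""|,*|*,) echo "empty item in specification" >&2; return 1 ;;
    esac

    local re_range='^([0-9]{1,5})-([0-9]{1,5})$'
    local re_port='^[0-9]{1,5}$'
    local -a items=() starts=() ends=()
    local item lo hi i total=0

    IFS=',' read -r -a items <<< "$spec"
    for item in "${items[@]}"; do
        if [[ "$item" =~ $re_range ]]; then
            # 10# forces base 10 so that leading zeros are not read as octal.
            lo=$((10#${BASH_REMATCH[1]}))
            hi=$((10#${BASH_REMATCH[2]}))
        elif [[ "$item" =~ $re_port ]]; then
            lo=$((10#$item))
            hi=$lo
        else
            echo "invalid item: '$item'" >&2
            return 1
        fi
        if (( lo < 1 || hi > 65535 || lo > hi )); then
            echo "outside [1, 65535] or reversed: '$item'" >&2
            return 1
        fi
        if (( lo <= 1023 )); then
            echo "warning: '$item' includes system-reserved ports [1, 1023]" >&2
        fi
        # Reject any overlap with an item seen earlier.
        for (( i = 0; i < ${#starts[@]}; i++ )); do
            if (( lo <= ends[i] && starts[i] <= hi )); then
                echo "overlapping item: '$item'" >&2
                return 1
            fi
        done
        starts+=("$lo")
        ends+=("$hi")
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

count_hccl_ports "60000,60050-60100,60150-60160"   # prints 63
```

The sketch checks syntax and overlap only. Whether a port is already occupied by another process still has to be verified on the host itself.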
Example
    # Method 1: Set this environment variable to a port range.
    export HCCL_HOST_SOCKET_PORT_RANGE="60000-60050"

    # Method 2: Use a specific port number and port ranges together, and separate them with commas (,).
    export HCCL_HOST_SOCKET_PORT_RANGE="60000,60050-60100,60150-60160"

    # Method 3: Specify port numbers, and separate them with commas (,).
    export HCCL_HOST_SOCKET_PORT_RANGE="56000,56005,56007,56008,56100,56105,56107,56108"

    # Method 4: The OS dynamically allocates port numbers.
    export HCCL_HOST_SOCKET_PORT_RANGE="auto"
Restrictions
- If multiple service processes share one NPU, you are advised to configure this environment variable. Otherwise, the service may fail to run due to port conflicts. Note that running multiple processes on one NPU also affects resource overhead and communication performance.
- This environment variable has a higher priority than HCCL_IF_BASE_PORT. If both are configured, the communication port used by HCCL on the host is determined by this environment variable.
- For the Atlas A2 training products/Atlas A2 inference products, if there are MC² operators (such as AllGatherMatmul, MatmulReduceScatter, and AlltoAllAllGatherBatchMatMul) on the network, this environment variable cannot be configured.
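The priority rule between the two variables can be illustrated with a small shell check. This is a sketch only: the function name hccl_host_port_source is hypothetical, and it assumes that an empty value is treated the same as an unset variable. It reports which variable would determine the host port. It does not describe what HCCL does when neither variable is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of HCCL: prints the name of the variable that
# determines the host port under the priority rule in this section.
# Assumption: an empty value is treated the same as an unset variable.
hccl_host_port_source() {
    if [ -n "${HCCL_HOST_SOCKET_PORT_RANGE:-}" ]; then
        echo "HCCL_HOST_SOCKET_PORT_RANGE"
    elif [ -n "${HCCL_IF_BASE_PORT:-}" ]; then
        echo "HCCL_IF_BASE_PORT"
    else
        echo "none"
    fi
}
```

Calling hccl_host_port_source in a launch script before starting the job makes it visible which setting applies when both variables are exported in the same environment.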
Applicability