HCCL_IF_BASE_PORT

Description

Specifies the starting port number on the host NIC that is used when a communicator is created based on root node information. Once this variable is set, the system by default uses 32 consecutive ports, starting from the specified port, to collect cluster information.

The value of this environment variable must be an integer from 1024 to 65520. Ensure that the ports in the resulting range are not already in use.
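The range check above can be sketched as a small shell function. This is only an illustration: validate_hccl_base_port is a hypothetical helper name, not part of HCCL, and the sketch checks the numeric range only, not whether the ports are in use.

```shell
# Hypothetical helper (not part of HCCL): checks a candidate base port
# against the documented range 1024-65520.
validate_hccl_base_port() {
    port="$1"
    # Reject empty values, non-digit characters, and strings longer than
    # five characters (which cannot be a valid port number).
    case "$port" in
        ''|*[!0-9]*|??????*)
            echo "invalid: expected an integer from 1024 to 65520" >&2
            return 1
            ;;
    esac
    if [ "$port" -lt 1024 ] || [ "$port" -gt 65520 ]; then
        echo "invalid: $port is outside 1024-65520" >&2
        return 1
    fi
    echo "ok"
}

# Check the current setting, falling back to the documented default of 60000.
validate_hccl_base_port "${HCCL_IF_BASE_PORT:-60000}"
```

Running the check before launching a job is one way to catch a mistyped value early.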

Example

export HCCL_IF_BASE_PORT=50000

Restrictions

In distributed training scenarios, HCCL uses a range of ports on the host server to collect cluster information, so the operating system must reserve these ports.
  • If you do not set the HCCL_IF_BASE_PORT environment variable, HCCL uses ports 60000 to 60031 by default. Run the following command to reserve this range in the OS:
    sysctl -w net.ipv4.ip_local_reserved_ports=60000-60031
  • If you set the HCCL_IF_BASE_PORT environment variable, for example to 50000, HCCL uses ports 50000 to 50031. Run the following command to reserve this range in the OS:
    sysctl -w net.ipv4.ip_local_reserved_ports=50000-50031
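
The two cases above follow the same arithmetic: the last port is the base port plus 31. A minimal sketch that derives the range from the current setting is shown below; hccl_port_range and HCCL_PORT_COUNT are hypothetical names introduced for illustration, and the script only prints the sysctl command instead of running it.

```shell
# Number of ports HCCL uses, as stated in the description (assumed fixed here).
HCCL_PORT_COUNT=32

# Hypothetical helper (not part of HCCL): prints "first-last" for a base port.
hccl_port_range() {
    base="$1"
    last=$((base + HCCL_PORT_COUNT - 1))
    echo "${base}-${last}"
}

# Fall back to the documented default of 60000 when the variable is unset.
range=$(hccl_port_range "${HCCL_IF_BASE_PORT:-60000}")

# Print the command rather than executing it: sysctl -w requires root.
echo "sysctl -w net.ipv4.ip_local_reserved_ports=${range}"
```

Note that writing net.ipv4.ip_local_reserved_ports replaces the current value, so any ports that are already reserved need to be included in the same comma-separated list, and a value set with sysctl -w does not persist across reboots.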

Applicability

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products (only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported)

Atlas training products

Atlas inference products (only the Atlas 300I Duo inference card is supported)