HCCL_SOCKET_IFNAME

Description

Specifies the host NIC that HCCL uses during initialization. HCCL obtains the host IP address from the specified NIC and uses that address to communicate with the root node when creating a communicator.

You can use one of the following rules for configuration:
  • eth: Use all NICs prefixed with eth.

    If multiple NIC prefixes are specified, separate them with commas (,).

    For example, export HCCL_SOCKET_IFNAME=eth,enp indicates that all NICs prefixed with eth or enp are used.

  • ^eth: Do not use NICs prefixed with eth.

    If multiple NIC prefixes are specified, separate them with commas (,).

    For example, export HCCL_SOCKET_IFNAME=^eth,enp indicates that no NIC prefixed with eth or enp is used.

  • =eth0: Use the specified eth0 NIC.

    If multiple NICs are specified, separate them with commas (,).

    For example, export HCCL_SOCKET_IFNAME==eth0,enp0 indicates that the eth0 and enp0 NICs are used.

  • ^=eth0: Do not use the specified eth0 NIC.

    If multiple NICs are specified, separate them with commas (,).

    For example, export HCCL_SOCKET_IFNAME=^=eth0,enp0 indicates that the eth0 and enp0 NICs are not used.

  • The value of HCCL_SOCKET_IFNAME can match multiple NICs. The first matched NIC is used as the communication NIC.
  • The priority of HCCL_IF_IP is higher than that of HCCL_SOCKET_IFNAME.
  • If neither HCCL_IF_IP nor HCCL_SOCKET_IFNAME is specified, NICs are selected in the following priority order:

    NICs other than Docker or local NICs (in ascending alphabetical order of NIC names) > Docker NICs > local NICs

    Note that if neither HCCL_IF_IP nor HCCL_SOCKET_IFNAME is configured, the system automatically selects a NIC based on this priority. If the NIC selected on the current node cannot reach the NIC selected on the root node, HCCL link establishment fails.
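
The four rule forms above can be sketched as a small shell helper for checking a value before a job is launched. This is only an illustration of the rules as described in this section, not the HCCL implementation: hccl_ifname_matches is a hypothetical name, and the sketch assumes that a leading ^ or = applies to every comma-separated entry. It does not model the order in which HCCL picks the first matched NIC.

```shell
# Sketch only: mirrors the rule forms described above, not HCCL's own code.
# hccl_ifname_matches is a hypothetical helper name.
# Usage: hccl_ifname_matches "<HCCL_SOCKET_IFNAME value>" "<NIC name>"
# Returns 0 if the NIC would be eligible under the rule, non-zero otherwise.
hccl_ifname_matches() (
    spec=$1
    nic=$2
    negate=0
    exact=0
    hit=0

    # A leading ^ turns the rule into an exclusion.
    case $spec in
        '^'*) negate=1; spec=${spec#?} ;;
    esac
    # A leading = switches from prefix matching to exact-name matching.
    case $spec in
        '='*) exact=1; spec=${spec#?} ;;
    esac

    # Split the remaining value on commas, with globbing disabled.
    # The function body is a subshell, so these settings do not leak out.
    set -f
    IFS=,
    for entry in $spec; do
        if [ -z "$entry" ]; then
            continue
        fi
        if [ "$exact" -eq 1 ]; then
            if [ "$nic" = "$entry" ]; then hit=1; fi
        else
            case $nic in
                "$entry"*) hit=1 ;;
            esac
        fi
    done

    # Eligible when an include rule hit, or an exclude rule did not.
    [ "$hit" -ne "$negate" ]
)
```

For example, hccl_ifname_matches "^eth,enp" "bond0" succeeds because bond0 has neither excluded prefix, while hccl_ifname_matches "=eth0,enp0" "eth01" fails because exact matching requires the full NIC name.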

Example

# Use the eth0 or endvnic NIC (exact name match).
export HCCL_SOCKET_IFNAME==eth0,endvnic
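
Because HCCL derives the host IP address from the NIC name, it can help to confirm that each named NIC exists and has an address before exporting the variable. The following is a minimal sketch under stated assumptions: nic_has_ipv4 is a hypothetical helper, the iproute2 ip tool is assumed to be installed on the host, and only IPv4 addresses are checked.

```shell
# Sketch only: nic_has_ipv4 is a hypothetical helper, not part of HCCL.
# Assumes the iproute2 `ip` tool is available; checks IPv4 addresses only.
nic_has_ipv4() {
    # `ip -o -4 addr show dev NAME` prints one line per IPv4 address.
    # The pipeline fails if the NIC is missing or has no IPv4 address.
    ip -o -4 addr show dev "$1" 2>/dev/null | grep -q 'inet '
}

# Report the state of each NIC named in the example above.
for nic in eth0 endvnic; do
    if nic_has_ipv4 "$nic"; then
        echo "$nic: has an IPv4 address"
    else
        echo "$nic: missing or has no IPv4 address" >&2
    fi
done
```

If none of the listed NICs passes the check on a node, the exported value is unlikely to yield a usable host IP address on that node.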

Restrictions

None

Applicability

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products (only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported)

Atlas training products

Atlas inference products (only the Atlas 300I Duo inference card is supported)