Configuring the NIC IP Address of a Device

During distributed training, use the HCCN Tool provided with the Ascend software to configure the NIC IP address of each device. Devices use these IP addresses to communicate with one another and synchronize network model parameters. This section covers only the HCCN Tool commands for configuring the network. For other HCCN Tool functions (for example, checking the link status of a network port), see the Ascend 910 HCCN Tool API Reference.

Atlas 800 training server and Atlas 900 AI cluster

To check whether the server is running in SMP or AMP mode, log in to the BMC command line and run the ipmcget -d npuworkmode command.

  • SMP (symmetric multi-processor) mode
    Log in to each AI Server as the root user and configure the NIC IP address of each device. The configuration requirements are as follows:
    • On each AI Server, the two NICs in each of the pairs 0 and 4, 1 and 5, 2 and 6, and 3 and 7 must be in the same network segment. NICs 0, 1, 2, and 3 must each be in a different network segment, as must NICs 4, 5, 6, and 7.
    • In a cluster, devices in the same position on different AI Servers must be in the same network segment. For example, NIC 0 of AI Server 1 and NIC 0 of AI Server 2 must be in the same network segment, and likewise for NIC 1 of AI Server 1 and NIC 1 of AI Server 2. The following commands are an example; change the IP addresses as required.
    hccn_tool -i 0 -ip -s address 192.168.100.101 netmask 255.255.255.0
    hccn_tool -i 1 -ip -s address 192.168.101.101 netmask 255.255.255.0
    hccn_tool -i 2 -ip -s address 192.168.102.101 netmask 255.255.255.0
    hccn_tool -i 3 -ip -s address 192.168.103.101 netmask 255.255.255.0
    hccn_tool -i 4 -ip -s address 192.168.100.100 netmask 255.255.255.0
    hccn_tool -i 5 -ip -s address 192.168.101.100 netmask 255.255.255.0
    hccn_tool -i 6 -ip -s address 192.168.102.100 netmask 255.255.255.0
    hccn_tool -i 7 -ip -s address 192.168.103.100 netmask 255.255.255.0
  • AMP (asymmetric multi-processor) mode

    In AMP mode, you do not need to configure the NIC IP address of the device.
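
The SMP addressing rule above (NIC n and NIC n+4 share a segment, and each pair uses its own segment) can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the function name smp_commands and the variables SUBNET_PREFIX, SUBNET_BASE, and NETMASK are hypothetical, and the address values are copied from the example commands. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the hccn_tool commands for SMP mode.
# All values below are illustrative and come from the example in this section.
SUBNET_PREFIX="192.168"
SUBNET_BASE=100            # the four pairs use 192.168.100.x to 192.168.103.x
NETMASK="255.255.255.0"

smp_commands() {
    nic=0
    while [ "$nic" -le 7 ]; do
        # NIC n and NIC n+4 share a segment, so the segment index is n modulo 4.
        segment=$((SUBNET_BASE + nic % 4))
        # As in the example, NICs 0-3 use host .101 and NICs 4-7 use host .100.
        if [ "$nic" -lt 4 ]; then host=101; else host=100; fi
        echo "hccn_tool -i $nic -ip -s address $SUBNET_PREFIX.$segment.$host netmask $NETMASK"
        nic=$((nic + 1))
    done
}

smp_commands
```

After reviewing the printed commands, you could run them as the root user. For another AI Server in the same cluster, keep the segments unchanged and choose different host addresses.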

Atlas 300T training card

Each server can be configured with one or two Atlas 300T training cards. Each card corresponds to one Device OS and requires one IP address. The IP addresses of different cards must be in the same network segment.
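
To check in advance that two planned addresses fall in the same network segment, a helper along the following lines may be useful. It is a minimal sketch and not part of the HCCN Tool: the function names ip_to_int and same_segment are hypothetical, and the script assumes well-formed dotted-quad IPv4 input.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1              # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_segment IP_A IP_B NETMASK
# Exit status 0 if both addresses have the same network part under NETMASK.
same_segment() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $((a & m)) -eq $((b & m)) ]
}

# Example with the addresses used for the two training cards in this section.
if same_segment 192.168.0.2 192.168.0.3 255.255.255.0; then
    echo "same segment"
else
    echo "different segments"
fi
```

The same check applies to the paired NICs in SMP mode, for example comparing the planned addresses of NIC 0 and NIC 4.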

Log in to the AI Server as the root user and configure the NIC IP address of each device as follows:

  1. Run the npu-smi info command to view the IDs of the devices to be configured. In Figure 1, for example, the NPU IDs are 1 and 4. Use the actual NPU IDs from your own query result.
    Figure 1 Checking the device ID
  2. Run the following commands to configure the NIC IP address of each device. The IP addresses in this example are for reference only.
    hccn_tool -i 1 -ip -s address 192.168.0.2 netmask 255.255.255.0
    hccn_tool -i 4 -ip -s address 192.168.0.3 netmask 255.255.255.0
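
When the NPU IDs differ from those in the example, the two steps above can be sketched as a loop that assigns consecutive addresses in one segment. This is an illustrative dry run only: card_commands, NPU_IDS, BASE_ADDR, FIRST_HOST, and NETMASK are hypothetical names, and the IDs must be replaced with the values reported by npu-smi info on your server.

```shell
#!/bin/sh
# Dry-run sketch: print one hccn_tool command per NPU ID.
# The values below are placeholders taken from the example in this section.
NPU_IDS="1 4"              # replace with the NPU IDs shown by npu-smi info
BASE_ADDR="192.168.0"      # all cards stay in this one segment
FIRST_HOST=2               # host part assigned to the first card
NETMASK="255.255.255.0"

card_commands() {
    host=$FIRST_HOST
    for id in $NPU_IDS; do
        echo "hccn_tool -i $id -ip -s address $BASE_ADDR.$host netmask $NETMASK"
        host=$((host + 1))   # next card gets the next host address
    done
}

card_commands
```

With the placeholder values, the script prints the same two commands as the example in step 2.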

Before you run the npu-smi info command, ensure that the npu-smi tool has been installed on the server.