Configuring the NIC IP Address of a Device

During distributed training, you need to configure the NIC IP address of a device for communication between multiple devices to synchronize network model parameters. This section describes how to use the HCCN Tool or the configuration script (ascend-deployer/tools/DeviceIP-conf.sh) provided by the ascend-deployer tool to configure the NIC IP address of a device.

This section provides only the network configuration commands of HCCN Tool. If you need to use other functions of HCCN Tool (for example, checking the link status of the network port), see the Ascend 910 6.0.0 HCCN Tool Interface Reference (AI Accelerator Card).

Using HCCN Tool in Atlas 800 Training Server and Atlas 900 AI Cluster Scenarios

To check whether SMP or AMP mode is in use, log in to the BMC command line and run the ipmcget -d npuworkmode command.

  • SMP (symmetric multi-processor) mode
    Log in to the AI Servers as the root user and configure the NIC IP address of each device. The configuration requirements are as follows:
    • On each AI Server, NICs 0 and 4, 1 and 5, 2 and 6, and 3 and 7 must each be paired in the same network segment. NICs 0, 1, 2, and 3 must be in different network segments, and so must NICs 4, 5, 6, and 7.
    • In the cluster scenario, devices in the same position on different AI Servers must be in the same network segment. For example, NIC 0 of AI Server 1 and NIC 0 of AI Server 2 must be in the same network segment, as must NIC 1 of AI Server 1 and NIC 1 of AI Server 2. Change the IP addresses in the following example as required.
    hccn_tool -i 0 -ip -s address 192.168.100.101 netmask 255.255.255.0
    hccn_tool -i 1 -ip -s address 192.168.101.101 netmask 255.255.255.0
    hccn_tool -i 2 -ip -s address 192.168.102.101 netmask 255.255.255.0
    hccn_tool -i 3 -ip -s address 192.168.103.101 netmask 255.255.255.0
    hccn_tool -i 4 -ip -s address 192.168.100.100 netmask 255.255.255.0
    hccn_tool -i 5 -ip -s address 192.168.101.100 netmask 255.255.255.0
    hccn_tool -i 6 -ip -s address 192.168.102.100 netmask 255.255.255.0
    hccn_tool -i 7 -ip -s address 192.168.103.100 netmask 255.255.255.0
  • AMP (asymmetric multi-processor) mode

    In AMP mode, you do not need to configure the NIC IP address of the device.
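For SMP mode, the eight commands listed above follow a regular pattern: device i uses the third octet 100 + (i mod 4), and devices 0 to 3 use a different host address from devices 4 to 7. The following is a minimal sketch of a generator for that pattern. The function name gen_smp_cmds and its parameters are hypothetical and are not part of HCCN Tool. The script only prints the commands so that they can be reviewed before being run as the root user.

```shell
#!/bin/bash
# Print (do not run) the hccn_tool commands for the eight devices of one
# AI Server in SMP mode. The addressing pattern is inferred from the example
# above; gen_smp_cmds is a hypothetical helper, not part of HCCN Tool.
gen_smp_cmds() {
    local prefix="$1"     # first two octets, for example 192.168
    local base_seg="$2"   # third octet used by devices 0 and 4, for example 100
    local host_lo="$3"    # host octet for devices 0 to 3
    local host_hi="$4"    # host octet for devices 4 to 7
    local netmask="255.255.255.0"
    local i seg host
    for i in 0 1 2 3 4 5 6 7; do
        # Devices i and i+4 share a segment; devices 0 to 3 use distinct segments.
        seg=$(( base_seg + i % 4 ))
        if [ "$i" -lt 4 ]; then
            host="$host_lo"
        else
            host="$host_hi"
        fi
        echo "hccn_tool -i $i -ip -s address ${prefix}.${seg}.${host} netmask ${netmask}"
    done
}

# Reproduces the example addresses used in this section.
gen_smp_cmds 192.168 100 101 100
```

In a cluster, calling the function with different host octets for each AI Server keeps devices in the same position in the same network segment.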

Using HCCN Tool in Atlas 300T Training Card Scenarios

Each server can hold one or two Atlas 300T training cards. Each card corresponds to one Device OS and requires one IP address. The IP addresses of different cards must be in the same network segment.

Log in to the AI Servers as the root user and configure the NIC IP address of each device. The configuration operations are as follows:

  1. Run the npu-smi info command to view the IDs of the devices to be configured. In the example in Figure 1, the NPU IDs are 1 and 4. Use the actual NPU IDs from the query result.
    Figure 1 Checking the device ID
  2. Run the following commands to configure the NIC IP addresses of the device. The IP addresses used in the following example are for reference only.
    hccn_tool -i 1 -ip -s address 192.168.0.2 netmask 255.255.255.0
    hccn_tool -i 4 -ip -s address 192.168.0.3 netmask 255.255.255.0

Ensure that the npu-smi tool has been installed on the server.
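When the NPU IDs reported by npu-smi info differ from the example, the commands can be generated from the ID list. The following sketch assumes that consecutive host addresses in a single /24 segment are acceptable. The function name gen_card_cmds and its arguments are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/bash
# Print (do not run) one hccn_tool command per NPU ID, assigning consecutive
# host addresses in one network segment. gen_card_cmds is a hypothetical
# helper; pass the NPU IDs obtained from the npu-smi info output.
gen_card_cmds() {
    local prefix="$1"       # first three octets, for example 192.168.0
    local host="$2"         # host octet for the first NPU ID, for example 2
    shift 2
    local id
    for id in "$@"; do
        echo "hccn_tool -i ${id} -ip -s address ${prefix}.${host} netmask 255.255.255.0"
        host=$(( host + 1 ))
    done
}

# NPU IDs 1 and 4, as in the example above.
gen_card_cmds 192.168.0 2 1 4
```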

Deploying Device IP Addresses in Batches

You can modify the IP address of an NPU board in the ascend-deployer/tools/DeviceIP-conf.sh script and use the batch deployment capability of Ansible to implement batch configuration. The following content is for reference only.

  • Device IP indicates the IP address of the NPU board to be modified.
  • The batch operation cannot be performed on devices of different types. That is, the device type, number of PCIe NPUs, number of configured IP addresses, and working mode must be the same for all target devices.
  • Each server has two NPU boards, and each NPU board has four NPUs. In SMP mode, the four NPUs on each NPU board need to be configured with four IP addresses in different network segments.
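The different-segment requirement for SMP mode can be checked before deployment. The following is a minimal sketch, assuming entries are written as IP/netmask or IP/netmask/gateway with plain decimal octets. The helper names ip_to_int, network_of, and all_segments_differ are hypothetical and are not part of ascend-deployer.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    # Unquoted on purpose: split the address into its four octets.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network number (address AND netmask) as an integer.
network_of() {
    echo $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeed only if every entry lies in a different network segment.
# Each argument is "IP/netmask" or "IP/netmask/gateway".
all_segments_differ() {
    local entry ip rest mask net seen=" "
    for entry in "$@"; do
        ip="${entry%%/*}"       # text before the first slash
        rest="${entry#*/}"      # text after the first slash
        mask="${rest%%/*}"      # netmask, with any gateway removed
        net=$(network_of "$ip" "$mask")
        case "$seen" in
            *" $net "*) return 1 ;;   # this segment was already used
        esac
        seen="${seen}${net} "
    done
    return 0
}

if all_segments_differ \
    172.168.1.100/255.255.255.0/172.168.1.1 \
    172.168.2.100/255.255.255.0/172.168.2.1 \
    172.168.3.100/255.255.255.0/172.168.3.1 \
    172.168.4.100/255.255.255.0/172.168.4.1; then
    echo "all entries are in different network segments"
else
    echo "at least two entries share a network segment"
fi
```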

Log in to the target server as the root user and perform the following operations:

  1. Prepare the OS IP address file and device IP address file.
    1. OS IP address file
      • Format 1 (recommended): an IP address segment in the form IPx~IPy, ending with a line break. For example:
        10.80.100.101~10.80.100.104
      • Format 2: an IP address list. The OS IP addresses are listed one per line, each ending with a line break. For example:
        10.80.100.101
        10.80.100.102
        10.80.100.103
        10.80.100.104
    2. Device IP address file
      • Format 1 (recommended): IP address segments in the form IPx~IPy/netmask/gateway. In SMP mode, the four NPUs on each NPU board need to be configured with four IP addresses in different network segments. Each IP address segment ends with a line break. For example:
        172.168.1.100~172.168.1.107/255.255.255.0/172.168.1.1
        172.168.2.100~172.168.2.107/255.255.255.0/172.168.2.1
        172.168.3.100~172.168.3.107/255.255.255.0/172.168.3.1
        172.168.4.100~172.168.4.107/255.255.255.0/172.168.4.1
      • Format 2: an IP address list in the form IP/netmask/gateway. The device IP addresses are listed one per line, each ending with a line break. For example:
        172.168.1.100/255.255.255.0/172.168.1.1
        172.168.2.100/255.255.255.0/172.168.2.1
        172.168.3.100/255.255.255.0/172.168.3.1
        172.168.4.100/255.255.255.0/172.168.4.1
  2. Use dos2unix to convert both the OS IP address file and the device IP address file to the UNIX format. For example, for the OS IP address file:
    dos2unix OS_IP.txt
  3. Upload the OS IP address file, device IP address file, and device IP address configuration script to the specified directories (/root/uploadosip, /root/uploaddeviceip, and /root/uploaddeviceip respectively) on the target host.
  4. Run the following command to perform batch configuration in the /root/uploaddeviceip directory:
    bash DeviceIP-conf.sh [Device type] [Number of PCIe NPUs] [Configuration of PCIe NPU IP addresses] [Working mode] [OS IP address file] [Device IP address file]
    • Device type: Run npu-smi info to query the number of NPUs. Set the value to 1 if there are eight NPUs, or to 2 if there are four NPUs.
    • Number of PCIe NPUs: Use the actual number of PCIe NPUs.
    • Configuration of PCIe NPU IP addresses: Use the actual number of IP addresses of PCIe NPUs.
    • Working mode: Set it to the actual mode, which can be symmetric multi-processor (SMP) mode or asymmetric multi-processor (AMP) mode.
    • OS IP address file and Device IP address file: Set them to the actual paths of the files uploaded in step 3.

    The following command is a reference for the Atlas 800 server (model 9000) with eight non-PCIe NPUs in SMP mode:

    bash DeviceIP-conf.sh 1 0 0 SMP /root/uploadosip/OS_IP /root/uploaddeviceip/Device_IP
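To preview which hosts a Format 1 entry in the OS IP address file covers, the segment can be expanded into the equivalent Format 2 list. The following is a minimal sketch, assuming both addresses in a segment share the first three octets, as in the examples in step 1. The function name expand_ip_range is hypothetical, and this sketch does not describe how DeviceIP-conf.sh itself parses the files.

```shell
#!/bin/bash
# Expand "IPx~IPy" into one address per line. A line without "~" is printed
# unchanged. Assumption: IPx and IPy share the first three octets.
# expand_ip_range is a hypothetical helper, not part of ascend-deployer.
expand_ip_range() {
    local start end prefix lo hi h
    # cut returns the whole line for any field when the delimiter is absent,
    # so a single address yields start == end.
    start=$(printf '%s' "$1" | cut -d'~' -f1)
    end=$(printf '%s' "$1" | cut -d'~' -f2)
    prefix="${start%.*}"    # first three octets of the start address
    lo="${start##*.}"       # last octet of the start address
    hi="${end##*.}"         # last octet of the end address
    if [ "${end%.*}" != "$prefix" ] || [ "$lo" -gt "$hi" ]; then
        echo "unsupported range: $1" >&2
        return 1
    fi
    h="$lo"
    while [ "$h" -le "$hi" ]; do
        echo "${prefix}.${h}"
        h=$(( h + 1 ))
    done
}

# Prints the four addresses from 10.80.100.101 to 10.80.100.104, one per line.
expand_ip_range "10.80.100.101~10.80.100.104"
```

The same function can be applied to the address part of a device IP address entry after the /netmask/gateway suffix has been removed.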