P2P Bandwidth Test on a SuperPoD

Function

Test the inter-node HCCS transmission rate and total time consumption between SuperPoDs.

Test Procedure

Currently, the bandwidth can be tested in MPI mode (recommended) or by configuring a shared directory for a SuperPoD.

  • A shared directory is required for a SuperPoD to pass shared addresses and process IDs. The following uses device A and device B as an example to describe how to conduct a P2P bandwidth test on a SuperPoD.
    1. Prepare a SuperPoD environment for mounting the shared directory and ensure that the node to be tested can access the shared directory.
    2. Install CANN and MindCluster ToolBox, and configure environment variables.
    3. Start Ascend DMI on device A, specify the SuperPoD bandwidth test, and specify the IP address of device B.
    4. Start Ascend DMI on device B, specify the SuperPoD bandwidth test, and specify the IP address of device A.
    5. Print the test result.
  • To test the SuperPoD P2P bandwidth in MPI mode, first install and configure MPI by referring to "Installing and Configuring MPI" in CANN HCCL Performance Tester Instructions, and configure the environment variables by referring to "Compilation" in the same document. Only Open MPI 4.1.5 is supported. Do not use parameters, configurations, or environment variables that are not mentioned in the description and examples here. For details about other Open MPI parameters, environment variables, and configuration files, see the Open MPI documentation.
  • In shared directory mode, start Ascend DMI on device A and device B within 10 seconds of each other. The SuperPoD bandwidth test cannot proceed if the interval between the two starts exceeds 10 seconds.
  • Ensure that the entered IP address is valid.
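
Steps 3 and 4 of the shared directory procedure can be sketched as follows. This is a minimal illustration, not part of the tool: the helper name build_pair_commands, the placeholder addresses 192.0.2.10 and 192.0.2.11, and the path /mnt/shared are assumptions. The flags themselves are the ones used in the examples in this section.

```python
import ipaddress
import shlex


def build_pair_commands(source_ip, peer_ip, shared_dir):
    """Return (source_cmd, peer_cmd) argument lists for shared directory mode."""
    # Reject malformed addresses early; ip_address raises ValueError on bad input.
    for ip in (source_ip, peer_ip):
        ipaddress.ip_address(ip)

    def cmd(role, local_ip, remote_ip):
        # role 0 = source node, role 1 = peer node (values of --sp).
        return ["ascend-dmi", "--bw", "-t", "p2p", "--sp", str(role),
                "--ip", remote_ip, "--spp", shared_dir, "--hip", local_ip]

    return cmd(0, source_ip, peer_ip), cmd(1, peer_ip, source_ip)


if __name__ == "__main__":
    # Placeholder values (assumptions); replace with real node addresses and path.
    src, peer = build_pair_commands("192.0.2.10", "192.0.2.11", "/mnt/shared")
    print(shlex.join(src))   # run on device A
    print(shlex.join(peer))  # run on device B within 10 seconds
```

Generating both command lines from one pair of addresses keeps --ip and --hip mirrored between the two nodes, which is easy to get wrong when typing them by hand.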

Parameters

You can run either of the following commands to list the parameters of the bandwidth test command:

ascend-dmi --bw -h

ascend-dmi --bw --help

Table 1 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 1 Parameters

Parameter: [-bw, --bw, --bandwidth]
  Description: Measures the processor bandwidth. -bw is supported, but --bw or --bandwidth is recommended.
  Restrictions: -
  Mandatory: Yes

Parameter: [-t, --type]
  Description: Specifies the type of data flows.
  Restrictions: Currently, only the P2P mode is supported.
  Mandatory: Yes

Parameter: [-sp, --sp, --super-pod]
  Description: Specifies a SuperPoD test.
  Restrictions: The value can be 0, 1, or 2. 0 indicates that the unidirectional bandwidth of the source SuperPoD is preferentially tested. 1 indicates that the peer SuperPoD (destination SuperPoD) is tested. 2 indicates that the SuperPoD is tested in MPI mode.
    • Values 0 and 1 must be used as a pair (one on each node) and must be specified together with [-ip, --ip, --peer-ip] and [-hip, --hip, --host-ip].
    • Value 2 must be specified together with [-dd, --dd, --device-dst] and [-ds, --ds, --device-src].
  Mandatory: Yes

Parameter: [-ip, --ip, --peer-ip]
  Description: Specifies the IP address of the peer node during the SuperPoD test.
  Restrictions: This parameter must be used together with [-hip, --hip, --host-ip] and [-sp, --sp, --super-pod].
    • The value must be a valid IP address.
  Mandatory: Yes

Parameter: [-hip, --hip, --host-ip]
  Description: Specifies the local host IP address.
  Restrictions: This parameter must be used together with [-ip, --ip, --peer-ip] and [-sp, --sp, --super-pod].
    • The value must be a valid IP address.
  Mandatory: Yes

Parameter: [-spp, --spp, --super-pod-path]
  Description: Specifies the path of the shared directory that both nodes can access.
  Restrictions: The specified directory must meet security requirements (no permissions for the group or other users), and both ends must have read and write permissions on it. The path cannot contain the wildcard (*).
  Mandatory: Yes

Parameter: [-ds, --ds, --device-src]
  Description: Specifies the ID of the source device for a P2P test. This parameter must be specified together with [-dd, --dd, --device-dst].
  Restrictions: If [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] are specified, [-d, --device] cannot be specified.
  Mandatory: No

Parameter: [-dd, --dd, --device-dst]
  Description: Specifies the ID of the destination device for a P2P test. This parameter must be specified together with [-ds, --ds, --device-src].
  Restrictions: If [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] are specified, [-d, --device] cannot be specified.
  Mandatory: No

Parameter: [-m, --mode]
  Description: Specifies the bandwidth test mode, which can be card-level or device-level.
  Restrictions: If this parameter is not specified, the device-level bandwidth test is performed by default.
    • device: logical ID of the Ascend NPU.
    • card: card ID of the Ascend NPU, which is used to test the bandwidth of the entire NPU.
    This parameter is supported only by the Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD, and A200T A3 Box8 SuperPoD Server.
  Mandatory: No

Parameter: [-d, --device]
  Description: Specifies the device ID of the node to be tested. The default value is 0.
  Restrictions:
    • If --mode is not specified or is set to device, this parameter specifies the logical ID of the Ascend NPU.
    • If --mode is set to card, this parameter specifies the card ID of the Ascend NPU.
    • If none of [-d, --device], [-ds, --ds, --device-src], and [-dd, --dd, --device-dst] is specified, device 0 or card 0 is tested.
    If [-d, --device] is specified, [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] cannot be specified.
  Mandatory: No

Parameter: [-s, --size]
  Description: Specifies the size of the data to be transmitted and the display mode of the test result.
  Restrictions:
    • The size ranges from 1 byte to 4 GB.
    • The default value is 536870912 bytes.
    • -s must be followed by a number. Otherwise, the command format is invalid.
  Mandatory: No

Parameter: [-et, --et, --execute-times]
  Description: Specifies the number of iterations, that is, the number of memory copies.
  Restrictions:
    • The value range is [1, 100000000]. If this parameter is not specified, the default value 40 is used.
    • The -et value must be the same on the two nodes where the SuperPoD P2P bandwidth test is performed.
  Mandatory: No
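
The combination rules in Table 1 can be checked before a command is launched. The following is a minimal sketch under stated assumptions: the function name check_bw_args is hypothetical, the 4 GB upper bound is read as 4 GiB (the default of 536870912 bytes is a power of two), and --spp is not checked because the MPI example in this section does not pass it. The tool's own validation may differ.

```python
MAX_SIZE = 4 * 1024 ** 3          # "4 GB" from Table 1, assumed to mean 4 GiB
MAX_EXECUTE_TIMES = 100_000_000   # upper bound of --et from Table 1


def check_bw_args(sp, ip=None, hip=None, ds=None, dd=None, d=None,
                  size=536870912, execute_times=40):
    """Return a list of rule violations (empty if the combination looks valid)."""
    errors = []
    if sp not in (0, 1, 2):
        errors.append("--sp must be 0, 1, or 2")
    # Shared directory mode: both addresses are required.
    if sp in (0, 1) and (ip is None or hip is None):
        errors.append("--sp 0/1 requires --ip and --hip")
    # MPI mode: source and destination devices are required.
    if sp == 2 and (ds is None or dd is None):
        errors.append("--sp 2 requires --ds and --dd")
    if (ds is None) != (dd is None):
        errors.append("--ds and --dd must be specified together")
    if d is not None and (ds is not None or dd is not None):
        errors.append("-d cannot be combined with --ds/--dd")
    if not 1 <= size <= MAX_SIZE:
        errors.append("--size must be between 1 byte and 4 GB")
    if not 1 <= execute_times <= MAX_EXECUTE_TIMES:
        errors.append("--et must be in [1, 100000000]")
    return errors
```

For example, check_bw_args(2) reports that --ds and --dd are missing, while check_bw_args(2, ds=1, dd=0) returns an empty list.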

Note:

The ascend_check directory is generated in the directory specified by --spp. It contains two temporary flag files and a subdirectory named in the format --hip_-d_--ip_-d_-m, that is, the values of those parameters joined by underscores. The temporary files procInfo and procInfoBi are generated in that subdirectory.
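
Because the tool writes into the --spp directory, it can help to check the directory against the restrictions in Table 1 before starting the test. The sketch below is an assumption about how such a pre-check might look on a POSIX system; shared_dir_problems is a hypothetical helper, and the exact checks performed by Ascend DMI are not described in this section.

```python
import os
import stat
import tempfile


def shared_dir_problems(path):
    """Return a list of problems found with a candidate --spp directory."""
    problems = []
    if "*" in path:
        problems.append("path contains the wildcard (*)")
    if not os.path.isdir(path):
        problems.append("path is not an existing directory")
        return problems
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit violates the security requirement.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("group or other users have permissions (mode %o)" % mode)
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append("current user lacks read/write permission")
    return problems


if __name__ == "__main__":
    # Demonstration on a throwaway directory restricted to the owner.
    with tempfile.TemporaryDirectory() as demo_dir:
        os.chmod(demo_dir, 0o700)
        print(shared_dir_problems(demo_dir))
```

Run the check on both nodes, since both ends need read and write permissions on the shared directory.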

Example

Example 1: Test the SuperPoD P2P bandwidth in Open MPI mode

Run the following command to test the SuperPoD P2P bandwidth. Replace xx.xx.xx.x2 with the source node IP address, xx.xx.xx.x3 with the peer node IP address, and mask with the subnet prefix length (22 in the sample output below):

mpirun -mca btl_tcp_if_include xx.xx.xx.x2/mask -mca btl_tcp_port_min_v4 32768 -mca btl_tcp_port_min_v6 32768 -host xx.xx.xx.x2:1,xx.xx.xx.x3:1 -n 2 ascend-dmi --bw -t p2p --sp 2 --dd 0 --ds 1 -m card -q

  • Parameter description:
      -mca btl_tcp_if_include specifies the network interface or subnet (IP address/prefix length, in CIDR notation) that MPI uses for TCP communication.
      -mca btl_tcp_port_min_v4 specifies the minimum TCP port number that MPI uses for IPv4.
      -mca btl_tcp_port_min_v6 specifies the minimum TCP port number that MPI uses for IPv6.

  • No authentication mechanism is available for the ports used by MPI. To prevent malicious connection attacks, perform security hardening by referring to Avoiding MPI All-Zero Listening.
[****@node-97-52 xxx]# mpirun -mca btl_tcp_if_include xx.xx.xx.x2/22 -mca btl_tcp_port_min_v4 32768 -mca btl_tcp_port_min_v6 32768 -host xx.xx.xx.x2:1,xx.xx.xx.x3:1 -n 2 ascend-dmi --bw -t p2p --sp 2 --dd 0 --ds 1 -m card -q
Unidirectional Peer to Peer Test
Pod: rank 1 device id: 0 to Pod: rank 0 device id: 1
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          328.939824       3264.25
----------------------------------------------------------------

Bidirectional Peer to Peer Test
Pod: rank 1 device id: 0 and Pod: rank 0 device id: 1
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          580.671140       3698.28
----------------------------------------------------------------

Unidirectional Peer to Peer Test
Pod: rank 0 device id: 1 to Pod: rank 1 device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          328.874329       3264.90
----------------------------------------------------------------

Bidirectional Peer to Peer Test
Pod: rank 0 device id: 1 and Pod: rank 1 device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          583.744108       3678.81
----------------------------------------------------------------
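
The Example 1 command line can also be assembled programmatically, which avoids mismatches between the -host list and the interface setting. This is a sketch only: build_mpirun_command and the 192.0.2.x addresses are hypothetical, and the value after the slash is treated as a CIDR prefix length, an assumption based on Open MPI's documented btl_tcp_if_include syntax. Only options that appear in Example 1 are emitted.

```python
import ipaddress
import shlex


def build_mpirun_command(source_ip, peer_ip, prefix_len, ds, dd,
                         mode="card", min_port=32768):
    """Return the mpirun argument list for an MPI-mode SuperPoD P2P test."""
    # Validate the address/prefix pair and the peer address (ValueError if bad).
    ipaddress.ip_interface(f"{source_ip}/{prefix_len}")
    ipaddress.ip_address(peer_ip)
    return [
        "mpirun",
        "-mca", "btl_tcp_if_include", f"{source_ip}/{prefix_len}",
        "-mca", "btl_tcp_port_min_v4", str(min_port),
        "-mca", "btl_tcp_port_min_v6", str(min_port),
        "-host", f"{source_ip}:1,{peer_ip}:1",   # one process per node
        "-n", "2",
        "ascend-dmi", "--bw", "-t", "p2p", "--sp", "2",
        "--dd", str(dd), "--ds", str(ds), "-m", mode, "-q",
    ]


if __name__ == "__main__":
    # Placeholder addresses (assumptions); replace with the real node addresses.
    print(shlex.join(build_mpirun_command("192.0.2.10", "192.0.2.11", 24, ds=1, dd=0)))
```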

Example 2: Test the SuperPoD P2P bandwidth without specifying the device ID (shared directory mode)

The commands are as follows. Replace xx.xx.xx.xx with the IP address of the node where the unidirectional bandwidth is preferentially tested, and replace yy.yy.yy.yy with the IP address of the peer node.

Run the following command on the node (source node) where the unidirectional bandwidth is preferentially tested:

ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy --spp /xxx/xxx/xxx --hip xx.xx.xx.xx

Run the following command on the peer node (target node):

ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx --spp /xxx/xxx/xxx --hip yy.yy.yy.yy
Figure 1 SuperPoD P2P bandwidth test example (target node)
[root@*****~]#  ascend-dmi --bw -t p2p -sp 1 -ip xx.xx.xx.xx -q --spp /xxx/xxx/xxx --hip yy.yy.yy.yy
Unidirectional Peer to Peer Test
Pod: yy.yy.yy.yy device id: 0 to Pod: xx.xx.xx.xx device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          164.336497       3266.90
----------------------------------------------------------------

Bidirectional Peer to Peer Test
Pod: yy.yy.yy.yy device id: 0 and Pod: xx.xx.xx.xx device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          290.613631       3694.74
----------------------------------------------------------------
Figure 2 SuperPoD P2P bandwidth test example (source node)
[root@*****~]#  ascend-dmi --bw -t p2p -sp 0 -ip yy.yy.yy.yy -q --spp /xxx/xxx/xxx --hip xx.xx.xx.xx
Unidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 to Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          164.336497       3266.90
----------------------------------------------------------------

Bidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 and Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          290.613631       3694.74
----------------------------------------------------------------
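
When results from both nodes need to be compared or logged, the text output shown in Figures 1 and 2 can be parsed into records. The following sketch assumes the output layout is exactly as shown above; parse_bw_output and the dictionary keys are hypothetical names, and SAMPLE is copied from Figure 2.

```python
import re

# A result row: size, execute times, bandwidth, elapsed time.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

SAMPLE = """\
Unidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 to Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          164.336497       3266.90
----------------------------------------------------------------

Bidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 and Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
  Size(Bytes)  Execute Times  Bandwidth(GB/s)  Elapsed Time(us)
----------------------------------------------------------------
   536870912         40          290.613631       3694.74
----------------------------------------------------------------
"""


def parse_bw_output(text):
    """Extract one record per result row, tagged with the preceding test title."""
    results = []
    title = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("Peer to Peer Test"):
            title = stripped
            continue
        match = ROW.match(line)
        if match and title:
            results.append({
                "test": title,
                "size_bytes": int(match.group(1)),
                "execute_times": int(match.group(2)),
                "bandwidth_gb_s": float(match.group(3)),
                "elapsed_us": float(match.group(4)),
            })
    return results


if __name__ == "__main__":
    for record in parse_bw_output(SAMPLE):
        print(record)
```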

Example 3: Test the bandwidth between device 1 on each of two nodes (shared directory mode)

Run the following command on the node where the unidirectional bandwidth is preferentially tested:

ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy -d 1 --spp /xxx/xxx/xxx --hip xx.xx.xx.xx

Run the following command on the peer node:

ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx -d 1 --spp /xxx/xxx/xxx --hip yy.yy.yy.yy

Example 4: Test the bandwidth between card 1 on each of two nodes (shared directory mode)

Run the following command on the node (source node) where the unidirectional bandwidth is preferentially tested:

ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy -d 1 --spp /xxx/xxx/xxx --hip xx.xx.xx.xx -m card

Run the following command on the peer node (target node):

ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx -d 1 --spp /xxx/xxx/xxx --hip yy.yy.yy.yy -m card

Table 2 describes the fields shown in the preceding SuperPoD P2P bandwidth test output (target node).
Table 2 Parameter description

Parameter: Unidirectional Peer to Peer Test
  Description: Unidirectional P2P test.

Parameter: Bidirectional Peer to Peer Test
  Description: Bidirectional P2P test.

Parameter: Pod: yy.yy.yy.yy device/card: 0 to Pod: xx.xx.xx.xx device/card: 0
  Description: device or card is displayed based on the value of mode.
    The first Pod is the node where the unidirectional bandwidth is preferentially tested. yy.yy.yy.yy is the IP address of that node, and the first device is the ID of the device on that node.
    The second Pod is the peer node. xx.xx.xx.xx is the IP address of the peer node, and the second device is the device ID on the peer node.

Parameter: Size(Bytes)
  Description: Size of the data to be transmitted, in bytes.

Parameter: Execute Times
  Description: Number of iterations.

Parameter: Bandwidth(GB/s)
  Description: Bandwidth of the processor, in GB/s.

Parameter: Elapsed Time(us)
  Description: Execution duration, in microseconds.
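
The sample outputs in this section are consistent with a simple relationship between the columns: bandwidth equals the transmitted bytes divided by the elapsed time, with 1 GB taken as 10^9 bytes, twice the bytes counted for a bidirectional test, and the card-level figures in Example 1 about twice the device-level ones. This relationship is inferred from the sample numbers, not stated by the tool, so treat the sketch below (including the hypothetical scale parameter) as a sanity check rather than a definition.

```python
def bandwidth_gb_s(size_bytes, elapsed_us, bidirectional=False, scale=1):
    """Estimate Bandwidth(GB/s) from Size(Bytes) and Elapsed Time(us).

    scale=2 reproduces the card-level samples in Example 1 (an inference
    from the sample output, not a documented rule).
    """
    total_bytes = size_bytes * (2 if bidirectional else 1) * scale
    seconds = elapsed_us * 1e-6
    return total_bytes / seconds / 1e9


if __name__ == "__main__":
    # Device-level samples from Example 2.
    print(bandwidth_gb_s(536870912, 3266.90))                      # reported: 164.336497
    print(bandwidth_gb_s(536870912, 3694.74, bidirectional=True))  # reported: 290.613631
    # Card-level sample from Example 1.
    print(bandwidth_gb_s(536870912, 3264.25, scale=2))             # reported: 328.939824
```

Small differences in the last digits are expected because the elapsed time in the output is rounded to two decimal places.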