P2P Bandwidth Test on a SuperPoD
Function
Test the inter-node HCCS transmission rate and total time consumption between SuperPoDs.
Test Procedure
Currently, the bandwidth can be tested in MPI mode (recommended) or by configuring a shared directory for a SuperPoD.
- In shared directory mode, the nodes in a SuperPoD pass shared addresses and process IDs through a shared directory. The following procedure uses device A and device B as an example to describe how to conduct a P2P bandwidth test on a SuperPoD in this mode.
- Prepare a SuperPoD environment for mounting the shared directory and ensure that the node to be tested can access the shared directory.
- Install CANN and MindCluster ToolBox, and configure environment variables.
- Start Ascend DMI on device A, specify the SuperPoD bandwidth test, and specify the IP address of device B.
- Start Ascend DMI on device B, specify the SuperPoD bandwidth test, and specify the IP address of device A.
- Print the test result.
- If you want to test the SuperPoD P2P bandwidth using MPI, install and configure MPI in the environment in advance by referring to "Installing and Configuring MPI" in CANN HCCL Performance Tester Instructions and configure the environment variables by referring to "Compilation" in CANN HCCL Performance Tester Instructions. Note that only Open MPI 4.1.5 is supported. Do not use parameters, configurations, and environment variables that are not mentioned in the description and examples. For details about other parameters, environment variables, and configuration files of Open MPI, see Open MPI documentation.
- In the preceding procedure, Ascend DMI must be started on device A and on device B within 10 seconds of each other.
- Ensure that the IP addresses you enter are valid.
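The shared directory procedure above can be sketched as a small helper that validates both IP addresses and assembles the command for each device. This is an illustrative sketch, not part of Ascend DMI: the function name `build_shared_dir_commands`, its arguments, and the sample addresses and path are hypothetical, and only options that appear in this section are used. The commands are returned as argument lists and are not executed here.

```python
import ipaddress


def build_shared_dir_commands(ip_a, ip_b, shared_dir):
    """Return the ascend-dmi argument lists for device A and device B.

    Hypothetical helper: device A is treated as the source side (--sp 0)
    and device B as the peer side (--sp 1).
    """
    # ipaddress.ip_address raises ValueError for a malformed address.
    ip_a = str(ipaddress.ip_address(ip_a))
    ip_b = str(ipaddress.ip_address(ip_b))

    def command(role, host_ip, peer_ip):
        return ["ascend-dmi", "--bw", "-t", "p2p", "--sp", str(role),
                "--ip", peer_ip, "--spp", shared_dir, "--hip", host_ip]

    return command(0, ip_a, ip_b), command(1, ip_b, ip_a)


# Placeholder addresses and path, for illustration only.
cmd_a, cmd_b = build_shared_dir_commands("192.0.2.10", "192.0.2.11", "/mnt/shared")
print(" ".join(cmd_a))
print(" ".join(cmd_b))
```

Each list could then be passed to a launcher such as `subprocess.run` on its own node, keeping the two starts within the 10-second window noted above.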
Parameters
You can run either of the following commands to list the parameters of the bandwidth test command:
ascend-dmi --bw -h
ascend-dmi --bw --help
Table 1 lists only test-specific parameters. For details about other common parameters, see Common Parameters.
| Parameter | Description | Restrictions | Mandatory or Not |
|---|---|---|---|
| [-bw, --bw, --bandwidth] | Measures the processor bandwidth. -bw is supported, but --bw or --bandwidth is recommended. | - | Yes |
| [-t, --type] | Specifies the type of data flows. | Currently, only the P2P mode is supported. | Yes |
| [-sp, --sp, --super-pod] | Specifies a SuperPoD test. The value can be 0, 1, or 2. 0 indicates that the unidirectional bandwidth of the source SuperPoD is preferentially tested. 1 indicates that the peer SuperPoD (destination SuperPoD) is tested. 2 indicates that the SuperPoD is tested in MPI mode. | | Yes |
| [-ip, --ip, --peer-ip] | Specifies the IP address of the peer node during the SuperPoD test. This parameter must be used together with [-hip, --hip, --host-ip] and [-sp, --sp, --super-pod]. | | Yes |
| [-hip, --hip, --host-ip] | Specifies the local host IP address. This parameter must be used together with [-ip, --ip, --peer-ip] and [-sp, --sp, --super-pod]. | | Yes |
| [-spp, --spp, --super-pod-path] | Specifies the path of the shared directory that can be accessed between nodes. | The specified directory path must meet security requirements (no permissions allowed for other users or groups), and both ends must have the file read and write permissions. The path cannot contain the wildcard (*). | Yes |
| [-ds, --ds, --device-src] | Specifies the ID of the source device for a P2P test. This parameter must be specified together with [-dd, --dd, --device-dst]. | After [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] are specified, [-d, --device] cannot be specified. | No |
| [-dd, --dd, --device-dst] | Specifies the ID of the destination device for a P2P test. This parameter must be specified together with [-ds, --ds, --device-src]. | After [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] are specified, [-d, --device] cannot be specified. | No |
| [-m, --mode] | Specifies a bandwidth test mode, which can be a card- or device-level bandwidth test. If this parameter is not specified, the device-level bandwidth test is performed by default. | This parameter is supported only by the Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD, and A200T A3 Box8 SuperPoD Server. | No |
| [-d, --device] | Specifies the device ID of the node to be tested. The default value is 0. | If [-d, --device] is specified, [-ds, --ds, --device-src] and [-dd, --dd, --device-dst] cannot be specified. | No |
| [-s, --size] | Specifies the size of the data to be transmitted and the display mode of the test result. | | No |
| [-et, --et, --execute-times] | Specifies the number of iterations, that is, number of copy times in the memory. | | No |

Note: The ascend_check directory is generated in the directory specified by the --spp parameter. In this directory, two temporary flag files and a directory named --hip_-d_--ip_-d_-m are generated. The procInfo and procInfoBi temporary files are generated in --hip_-d_--ip_-d_-m.
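The combination rules in Table 1 can be checked before a test is launched. The sketch below is one way to do that, under stated assumptions: `check_bw_options` and `shared_dir_is_private` are hypothetical helpers, the dictionary keys are simply the long option names without dashes, and "no permissions allowed for other users or groups" is interpreted here as no group or other permission bits being set on the directory. It does not call Ascend DMI.

```python
import os
import stat
import tempfile


def check_bw_options(opts):
    """Return a list of problems found in a dict of --bw options.

    Keys are long option names without dashes (a convention chosen for
    this sketch), for example {"super-pod": 0, "device": 1}.
    """
    problems = []
    if opts.get("super-pod") not in (0, 1, 2):
        problems.append("--super-pod must be 0, 1, or 2")
    has_src = "device-src" in opts
    has_dst = "device-dst" in opts
    if has_src != has_dst:
        problems.append("--device-src and --device-dst must be specified together")
    if "device" in opts and (has_src or has_dst):
        problems.append("--device cannot be combined with --device-src/--device-dst")
    if ("peer-ip" in opts) != ("host-ip" in opts):
        problems.append("--peer-ip and --host-ip must be specified together")
    path = opts.get("super-pod-path")
    if path is not None and "*" in path:
        problems.append("--super-pod-path cannot contain the wildcard (*)")
    return problems


def shared_dir_is_private(path):
    """True if path is a directory with no group or other permission bits."""
    mode = os.stat(path).st_mode
    return stat.S_ISDIR(mode) and (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Mixing -d with -ds violates two of the rules above.
print(check_bw_options({"super-pod": 0, "device": 1, "device-src": 0}))

# mkdtemp creates a directory that only the owner can access.
demo_dir = tempfile.mkdtemp()
print(shared_dir_is_private(demo_dir))
os.rmdir(demo_dir)
```

The permission check covers only the local mode bits; whether both nodes can actually read and write the shared directory still has to be confirmed on each node.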
Example
Example 1: Test the SuperPoD P2P bandwidth in Open MPI mode
Run the following command to test the SuperPoD P2P bandwidth. Replace xx.xx.xx.x2 with the source node IP address, xx.xx.xx.x3 with the peer node IP address, and nn with the subnet prefix length (22 in the sample output below).
mpirun -mca btl_tcp_if_include xx.xx.xx.x2/nn -mca btl_tcp_port_min_v4 32768 -mca btl_tcp_port_min_v6 32768 -host xx.xx.xx.x2:1,xx.xx.xx.x3:1 -n 2 ascend-dmi --bw -t p2p --sp 2 --dd 0 --ds 1 -m card -q
- Parameter description:
-mca btl_tcp_if_include specifies the network that MPI uses for TCP communication, in CIDR notation (IP address/prefix length).
-mca btl_tcp_port_min_v4 specifies the minimum TCP port number that MPI uses for IPv4 connections.
-mca btl_tcp_port_min_v6 specifies the minimum TCP port number that MPI uses for IPv6 connections.
- No authentication mechanism is available for the ports used by MPI. To prevent malicious connection attacks, perform security hardening by referring to Avoiding MPI All-Zero Listening.
[****@node-97-52 xxx]# mpirun -mca btl_tcp_if_include xx.xx.xx.x2/22 -mca btl_tcp_port_min_v4 32768 -mca btl_tcp_port_min_v6 32768 -host xx.xx.xx.x2:1,xx.xx.xx.x3:1 -n 2 ascend-dmi --bw -t p2p --sp 2 --dd 0 --ds 1 -m card -q
Unidirectional Peer to Peer Test
Pod: rank 1 device id: 0 to Pod: rank 0 device id: 1
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               328.939824         3264.25
----------------------------------------------------------------
Bidirectional Peer to Peer Test
Pod: rank 1 device id: 0 and Pod: rank 0 device id: 1
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               580.671140         3698.28
----------------------------------------------------------------
Unidirectional Peer to Peer Test
Pod: rank 0 device id: 1 to Pod: rank 1 device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               328.874329         3264.90
----------------------------------------------------------------
Bidirectional Peer to Peer Test
Pod: rank 0 device id: 1 and Pod: rank 1 device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               583.744108         3678.81
----------------------------------------------------------------
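When the MPI test is scripted, the long command in Example 1 can be assembled from a few values. The following is a minimal sketch that only reproduces the options shown in Example 1. The function `build_mpi_command` and its parameter names are hypothetical, `if_suffix` stands for the value that follows the slash in btl_tcp_if_include (22 in the sample above), and the addresses are placeholders. The command is built as an argument list and is not executed.

```python
def build_mpi_command(source_ip, peer_ip, if_suffix, src_dev, dst_dev,
                      mode=None, min_port=32768):
    """Assemble the mpirun argument list used in Example 1 (hypothetical helper)."""
    cmd = [
        "mpirun",
        "-mca", "btl_tcp_if_include", f"{source_ip}/{if_suffix}",
        "-mca", "btl_tcp_port_min_v4", str(min_port),
        "-mca", "btl_tcp_port_min_v6", str(min_port),
        # One process slot per node, two processes in total.
        "-host", f"{source_ip}:1,{peer_ip}:1",
        "-n", "2",
        "ascend-dmi", "--bw", "-t", "p2p", "--sp", "2",
        "--dd", str(dst_dev), "--ds", str(src_dev),
    ]
    if mode is not None:
        cmd += ["-m", mode]
    cmd.append("-q")  # copied from Example 1
    return cmd


# Placeholder addresses, for illustration only.
mpi_cmd = build_mpi_command("10.0.0.2", "10.0.0.3", 22, src_dev=1, dst_dev=0, mode="card")
print(" ".join(mpi_cmd))
```

Keeping the arguments in a list avoids shell quoting issues if the command is later passed to a process launcher.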
Example 2: Test the SuperPoD P2P bandwidth without specifying the device ID (shared directory mode)
The commands are as follows. Replace xx.xx.xx.xx with the IP address of the node where the unidirectional bandwidth is preferentially tested, and replace yy.yy.yy.yy with the IP address of the peer node.
Run the following command on the node (source node) where the unidirectional bandwidth is preferentially tested:
ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy --spp /xxx/xxx/xxx --hip xx.xx.xx.xx
Run the following command on the peer node (target node):
ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx --spp /xxx/xxx/xxx --hip yy.yy.yy.yy
[root@*****~]# ascend-dmi --bw -t p2p -sp 1 -ip xx.xx.xx.xx -q --spp /xxx/xxx/xxx --hip yy.yy.yy.yy
Unidirectional Peer to Peer Test
Pod: yy.yy.yy.yy device id: 0 to Pod: xx.xx.xx.xx device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               164.336497         3266.90
----------------------------------------------------------------
Bidirectional Peer to Peer Test
Pod: yy.yy.yy.yy device id: 0 and Pod: xx.xx.xx.xx device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               290.613631         3694.74
----------------------------------------------------------------
[root@*****~]# ascend-dmi --bw -t p2p -sp 0 -ip yy.yy.yy.yy -q --spp /xxx/xxx/xxx --hip xx.xx.xx.xx
Unidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 to Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               164.336497         3266.90
----------------------------------------------------------------
Bidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 and Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               290.613631         3694.74
----------------------------------------------------------------
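For automated runs, the printed result can be turned into structured records. The sketch below assumes the output layout shown in the samples above (a test title, a "Pod: ..." line, and one data row between dashed rules). The names `BLOCK_RE` and `parse_bw_output` and the record keys are hypothetical, and the pattern may need adjusting if the tool prints a different layout.

```python
import re

# One result block: title, "Pod: ..." line, header between dashed rules, data row.
BLOCK_RE = re.compile(
    r"(?P<kind>Unidirectional|Bidirectional) Peer to Peer Test\s+"
    r"(?P<pods>Pod: .+?)\s+-{10,}.*?-{10,}\s+"
    r"(?P<size>\d+)\s+(?P<times>\d+)\s+(?P<bw>\d+\.\d+)\s+(?P<us>\d+\.\d+)",
    re.S,
)


def parse_bw_output(text):
    """Return one dict per result block found in ascend-dmi --bw output."""
    rows = []
    for m in BLOCK_RE.finditer(text):
        rows.append({
            "kind": m.group("kind"),
            "pods": " ".join(m.group("pods").split()),
            "size_bytes": int(m.group("size")),
            "execute_times": int(m.group("times")),
            "bandwidth_gb_s": float(m.group("bw")),
            "elapsed_us": float(m.group("us")),
        })
    return rows


# Sample taken from the Example 2 output above.
sample = """Unidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 to Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               164.336497         3266.90
----------------------------------------------------------------
Bidirectional Peer to Peer Test
Pod: xx.xx.xx.xx device id: 0 and Pod: yy.yy.yy.yy device id: 0
----------------------------------------------------------------
Size(Bytes)    Execute Times    Bandwidth(GB/s)    Elapsed Time(us)
----------------------------------------------------------------
536870912      40               290.613631         3694.74
----------------------------------------------------------------
"""

for row in parse_bw_output(sample):
    print(row["kind"], row["bandwidth_gb_s"], row["elapsed_us"])
```

The pattern matches on whitespace rather than on line breaks, so it also handles output that has been captured on a single line.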
Example 3: Test the bandwidth between device 1 on each of two nodes (shared directory mode)
Run the following command on the node where the unidirectional bandwidth is preferentially tested:
ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy -d 1 --spp /xxx/xxx/xxx --hip xx.xx.xx.xx
Run the following command on the peer node:
ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx -d 1 --spp /xxx/xxx/xxx --hip yy.yy.yy.yy
Example 4: Test the bandwidth between card 1 on each of two nodes (shared directory mode)
Run the following command on the node (source node) where the unidirectional bandwidth is preferentially tested:
ascend-dmi --bw -t p2p --sp 0 --ip yy.yy.yy.yy -d 1 --spp /xxx/xxx/xxx --hip xx.xx.xx.xx -m card
Run the following command on the peer node (target node):
ascend-dmi --bw -t p2p --sp 1 --ip xx.xx.xx.xx -d 1 --spp /xxx/xxx/xxx --hip yy.yy.yy.yy -m card
| Parameter | Description |
|---|---|
| Unidirectional Peer to Peer Test | Unidirectional P2P |
| Bidirectional Peer to Peer Test | Bidirectional P2P |
| Pod: yy.yy.yy.yy device/card: 0 to Pod: xx.xx.xx.xx device/card: 0 | device or card is displayed based on the value of mode. The first pod indicates the node where unidirectional bandwidth is preferentially tested. yy.yy.yy.yy indicates the IP address of the node where unidirectional bandwidth is preferentially tested. The first device indicates the ID of the device where unidirectional bandwidth is preferentially tested. The second pod is the peer node. xx.xx.xx.xx indicates the IP address of the peer node. The second device indicates the device ID of the peer node. |
| Size(Bytes) | Size of the data to be transmitted, in bytes. |
| Execute Times | Number of iterations. |
| Bandwidth(GB/s) | Bandwidth of the processor, in GB/s. |
| Elapsed Time(us) | Execution duration, in microseconds. |
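As a sanity check on the output fields, the device-level sample figures in Example 2 are consistent with the bandwidth being the data size divided by the elapsed time, expressed in decimal GB/s, and doubled for the bidirectional test. This relation is inferred from the sample output only and is not a documented formula, so the sketch below is a cross-check rather than a definition. The function name `implied_bandwidth_gb_s` is hypothetical.

```python
def implied_bandwidth_gb_s(size_bytes, elapsed_us, directions=1):
    """Bandwidth implied by moving size_bytes per direction in elapsed_us.

    Assumption inferred from the sample output: 1 GB = 1e9 bytes and the
    elapsed time refers to a single transfer of size_bytes.
    """
    seconds = elapsed_us / 1e6
    return directions * size_bytes / seconds / 1e9


# Figures from the Example 2 sample output.
uni = implied_bandwidth_gb_s(536870912, 3266.90)
bi = implied_bandwidth_gb_s(536870912, 3694.74, directions=2)
print(f"unidirectional: {uni:.3f} GB/s (reported 164.336497)")
print(f"bidirectional:  {bi:.3f} GB/s (reported 290.613631)")
```

The card-level figures in Example 1 are roughly twice what this relation gives for the same size and elapsed time; the sketch does not attempt to model that difference.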