Bandwidth Test

Function

Measure the bus bandwidth and memory bandwidth.

Parameters

You can run either of the following commands to check the parameters of the bandwidth test command:

ascend-dmi --bw -h

ascend-dmi --bw --help

Table 1 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 1 Parameter description

Parameter

Description

Restriction

Mandatory

[-bw, --bw, --bandwidth]

Measures the bandwidth of the device or entire NPU.

-

Yes

[-t, --type]

Specifies the data flows to be tested.

A bandwidth test can measure the following data flow directions. If this parameter is not specified, the bandwidth and total time consumption in the H2D, D2H, and D2D directions are returned by default.
  • H2D: measures the bandwidth and total time consumption for data transmission from the host memory to the device memory through the PCIe bus. (When the bandwidth test is conducted on the Atlas A3 training product and Atlas A3 inference product, the total bandwidth and time consumption of the HCCS plane are tested.)
  • D2H: measures the bandwidth and total time consumption for data transmission from the device memory to the host memory through the PCIe bus. (When the bandwidth test is conducted on the Atlas A3 training product and Atlas A3 inference product, the total bandwidth and time consumption of the HCCS plane are tested.)
  • D2D: measures the bandwidth and total time consumption for data transmission from the device's DDR/on-chip memory to the device's chip register (mainly used to measure the device memory bandwidth).
  • P2P: measures the transmission rate and total time consumption from the source device to the destination device.
    NOTE:

    When P2P is used and no device is specified (neither -ds nor -dd is set), -s, -et, and -fmt do not take effect. The fixed-length mode and the corresponding default values are used instead. For example, in the command ascend-dmi --bw -t p2p -fmt json, -fmt does not take effect and the default value normal is used.

  • For the Atlas 200I/500 A2 inference products, only the D2D mode is supported, and this parameter is not supported.
  • The P2P mode is supported only by the Atlas training products, Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, Atlas A3 inference products, and Atlas 300I Duo inference card.
  • For the Atlas 300I Duo inference card in P2P mode, -ds and -dd support only the device ID of the primary chip or the device IDs of the primary and secondary chips of the same card.
  • When the P2P test is performed between two 8-NPU groups on the Atlas 200T A2 Box16/Atlas 200I A2 Box16 heterogeneous subrack, only the P2P results of two equivalent positions can be output. For example, device 0 corresponds to device 8, and the P2P test result of device 0 to device 8 is output.

No

[-m, --mode]

Specifies a bandwidth test mode, which can be a card- or device-level bandwidth test.

If this parameter is not specified, the device-level bandwidth test is performed by default.

  • device: logic ID of the Ascend NPU.
  • card: card ID of the Ascend NPU, which is used to test the bandwidth of the entire NPU.

This parameter is available only when type is set to p2p for Atlas A3 training and inference products.

No

[-s, --size]

Specifies the size of the data to be transmitted and the display mode of the test result.

  • The value range of the transmitted data is as follows:
    • Atlas A3 training product and Atlas A3 inference product: 1 byte to 4 GB in D2H, H2D, and P2P modes
    • Other products: 1 byte to 512 MB.
  • If -s is specified, it must be followed by a number; otherwise, the command format is incorrect.
    • H2D, D2H, or D2D mode, or P2P mode with -ds and -dd specified
      • -s specified: fixed-length mode
      • -s not specified: step mode, 2 bytes to 32 MB
    • -ds and -dd not specified in P2P mode. In this scenario, -s does not take effect. The fixed-length mode and default value are used. The default values are described as follows:
      • For the Atlas A2 training product, Atlas A3 training product, Atlas A3 inference product, and Atlas A2 inference product, the default size of the data to be transmitted from device 0 or 8 to other devices is 512 MB. In other cases, the default size is 256 MB.
      • For other products, the default size of data to be transmitted is 128 MB.
  • For the Atlas A2 training product, Atlas A3 training product, Atlas A3 inference product, and Atlas A2 inference product, if -t is set to d2d, the size of the data to be transmitted is determined by the AI Core. As a result, -s is not supported.
  • For the Atlas 200I/500 A2 inference products, the size of the data to be transmitted is fixed at 0.97 GB (determined by the tensor), and this parameter is not supported.

No

[-et, --et, --execute-times]

Specifies the number of iterations, that is, the number of memory copy operations.

The value range is [1, 1000]. If this parameter is not specified, the default is 5 copies in step mode and 40 copies in fixed-length mode.

This parameter is not supported by the Atlas 200I/500 A2 inference products, Atlas A2 training product, Atlas A3 training product, Atlas A3 inference product, and Atlas A2 inference product in D2D mode. In that case, the number of copies is 1 by default.

No

[-d, --device]

Specifies the ID of the device on which the bandwidth test is performed. The device ID is the logic ID of the Ascend AI Processor.

If the device ID is set:

  • By default, the bandwidth information of the corresponding device ID is returned, and the ID field is displayed as the corresponding device ID.

If the device ID is not set:

  • For the Atlas A3 training product and Atlas A3 inference product, the full device bandwidth information is returned by default in D2H and H2D modes.
  • For other products, the bandwidth information of device 0 is returned by default.
  • To ensure that the Atlas 300I Duo inference card can deliver the optimal bandwidth test result, you are advised to test the bandwidth of device 0.
  • This parameter is not supported in P2P mode.

No

[-ds, --ds, --device-src]

Specifies the ID of the source device for a P2P test. This parameter must be used together with [-dd, --dd, --device-dst], and the two parameters must be set to different values. If neither parameter is specified, all Ascend NPUs are tested.

This parameter is not supported by the Atlas 200/300/500 inference products, Atlas 300I Pro inference card, Atlas 300V video analysis card, Atlas 300V Pro video analysis card, Atlas 200I SoC A1 core board, and Atlas 200I/500 A2 inference products.

No

[-dd, --dd, --device-dst]

Specifies the ID of the destination device for a P2P test. This parameter must be used together with [-ds, --ds, --device-src], and the two parameters must be set to different values. If neither parameter is specified, all Ascend NPUs are tested.

This parameter is not supported by the Atlas 200/300/500 inference products, Atlas 300I Pro inference card, Atlas 300V video analysis card, Atlas 300V Pro video analysis card, Atlas 200I SoC A1 core board, and Atlas 200I/500 A2 inference products.

No

Note:

  • In this document, the input or output device ID is the logic ID.
  • You can run the npu-smi info -m command and check the value of Chip Logic ID in the command output to obtain the logic ID. The NPU ID is the physical chip ID.
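The constraints in Table 1 can be checked before the tool is invoked. The following is a minimal sketch, not part of ascend-dmi: the helper name build_bw_command and its arguments are hypothetical, and it only assembles an argument list from flags that appear in this section (-t, -d, --ds, --dd, -s, --et). It enforces three documented rules: --ds and --dd must be used together with different values, -d is not supported in P2P mode, and the iteration count must lie in [1, 1000].

```python
from typing import List, Optional


def build_bw_command(
    test_type: Optional[str] = None,
    device: Optional[int] = None,
    src: Optional[int] = None,
    dst: Optional[int] = None,
    size_bytes: Optional[int] = None,
    execute_times: Optional[int] = None,
) -> List[str]:
    """Assemble an `ascend-dmi --bw` argument list (hypothetical helper)."""
    args = ["ascend-dmi", "--bw"]

    if test_type is not None:
        if test_type not in ("h2d", "d2h", "d2d", "p2p"):
            raise ValueError(f"unknown test type: {test_type}")
        args += ["-t", test_type]

    # --ds and --dd must be given together and must differ.
    if (src is None) != (dst is None):
        raise ValueError("--ds and --dd must be used together")
    if src is not None:
        if src == dst:
            raise ValueError("--ds and --dd must be different")
        args += ["--ds", str(src), "--dd", str(dst)]

    # -d is not supported in P2P mode.
    if device is not None:
        if test_type == "p2p":
            raise ValueError("-d is not supported in P2P mode")
        args += ["-d", str(device)]

    if size_bytes is not None:
        if size_bytes < 1:
            raise ValueError("size must be at least 1 byte")
        args += ["-s", str(size_bytes)]

    # The documented range for the iteration count is [1, 1000].
    if execute_times is not None:
        if not 1 <= execute_times <= 1000:
            raise ValueError("execute times must be in [1, 1000]")
        args += ["--et", str(execute_times)]

    return args


print(" ".join(build_bw_command("h2d", device=0, size_bytes=134217728, execute_times=100)))
```

The sketch does not model product-specific restrictions (for example, -s being unsupported in D2D mode on some products); those would need an extra product argument.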

Example

1. Measure the bandwidth and total time consumption without specifying parameters on the Atlas 800I A2 inference server. (If no parameter is specified, information about device 0 in the H2D, D2H, and D2D directions is returned and displayed in step mode.)

ascend-dmi --bw -q

  • D2D mode

  • D2H mode

  • H2D mode

2. The following example uses the Atlas 800I A2 inference server and runs the test in fixed-length mode for 100 iterations on device 0 with a data size of 128 MB (134217728 bytes).
  • H2D mode

    ascend-dmi --bw -t h2d -d 0 --et 100 -s 134217728 -q

  • D2H mode

    ascend-dmi --bw -t d2h -d 0 --et 100 -s 134217728 -q
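The value passed to -s in these commands is a byte count: 134217728 equals 128 × 1024 × 1024. The sketch below shows the conversion and a range check. The helper names are hypothetical, and the upper limits assume that "512 MB" and "4 GB" in Table 1 use binary units, as the 128 MB example does.

```python
MIB = 1024 * 1024

# Assumed binary interpretation of the documented upper limits.
MAX_SIZE_DEFAULT = 512 * MIB      # other products: 1 byte to 512 MB
MAX_SIZE_A3 = 4 * 1024 * MIB      # Atlas A3 products: 1 byte to 4 GB


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MB (binary) to the byte count expected by -s."""
    return mib * MIB


def check_size(size_bytes: int, max_bytes: int = MAX_SIZE_DEFAULT) -> int:
    """Return size_bytes if it lies in [1, max_bytes], else raise ValueError."""
    if not 1 <= size_bytes <= max_bytes:
        raise ValueError(f"size {size_bytes} outside [1, {max_bytes}]")
    return size_bytes


print(check_size(mib_to_bytes(128)))  # 134217728
```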

3. Measure the bandwidth and total time consumption for data transmitted within the same device on the Atlas 800I A2 inference server.

ascend-dmi --bw -t d2d -d 0 -q

4. Measure the transmission rate and total time consumption from the source device to the destination device.
  • The following uses the Atlas 800I A2 inference server as an example, where data is transmitted to destination device 0 from source device 1 in P2P mode for 100 iterations with a data size of 128 MB.

    ascend-dmi --bw -t p2p --dd 0 --ds 1 --et 100 -s 134217728 -q

  • Perform a P2P test without specifying the source and destination devices on the Atlas 800I A2 inference server.

    ascend-dmi --bw -t p2p -q
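When neither -ds nor -dd is given, -s is ignored and the tool uses a default data size that depends on the product and the source device, as described for the -s parameter. The following sketch encodes that rule as a lookup. The function name and the product-family labels are hypothetical and chosen only for illustration.

```python
# Hypothetical labels for the product families named in Table 1.
A2_A3_FAMILIES = {"a2_training", "a2_inference", "a3_training", "a3_inference"}


def default_p2p_size_mb(product_family: str, src_device: int) -> int:
    """Default P2P data size in MB when -ds and -dd are not specified."""
    if product_family in A2_A3_FAMILIES:
        # Transfers from device 0 or 8 use 512 MB; all others use 256 MB.
        return 512 if src_device in (0, 8) else 256
    # Other products use a single default.
    return 128


print(default_p2p_size_mb("a2_inference", 0))
print(default_p2p_size_mb("a2_inference", 3))
```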

5. Perform a P2P test on the Atlas 900 A3 SuperPoD without specifying the source and destination cards.

ascend-dmi --bw -t p2p -q --mode card

If the following information is displayed, the tool is running properly.

[root@****]ascend-dmi --bw -t p2p -m card -q
Unidirectional Peer to Peer Test Bandwidth Matrix(GB/s)
   C\C       0        1        2        3        4        5        6        7
   0         ***      328.96   328.98   329.02   329.08   329.04   329.17   328.99
   1         328.61   ***      328.58   328.56   328.48   328.55   328.53   328.57
   2         328.57   328.49   ***      328.76   328.54   328.54   328.49   328.54
   3         328.52   328.46   328.55   ***      328.72   328.50   328.54   328.54
   4         329.02   329.05   328.99   329.03   ***      329.05   329.04   329.00
   5         328.70   328.58   328.51   328.57   328.59   ***      328.56   328.56
   6         328.56   328.53   328.63   328.58   328.61   328.57   ***      328.55
   7         328.95   328.49   328.56   328.61   328.55   328.56   328.54   ***   

Bidirectional Peer to Peer Test Bandwidth Matrix(GB/s)
   C\C       0        1        2        3        4        5        6        7
   0         ***      540.51   540.39   540.50   541.80   541.90   541.05   540.34
   1         540.90   ***      540.90   541.05   541.48   540.53   559.08   540.56
   2         540.95   541.40   ***      540.61   540.45   540.76   540.80   541.78
   3         540.97   540.87   541.61   ***      541.41   540.35   540.90   540.98
   4         541.30   541.04   540.82   542.88   ***      540.40   541.13   540.68
   5         540.68   541.14   541.86   540.80   540.44   ***      540.80   540.36
   6         540.54   540.91   540.98   541.03   540.63   541.20   ***      541.27
   7         540.51   542.78   540.91   541.69   540.22   540.95   541.02   ***   
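
For automated checks, a matrix like the ones above can be read into a dictionary keyed by (source, destination). The following is a minimal sketch that assumes the plain-text layout shown in this example (a C\C header row, one row per source, and *** on the diagonal). The function name parse_bw_matrix is hypothetical, and the sample is the top-left 3 × 3 corner of the unidirectional matrix.

```python
from typing import Dict, Tuple


def parse_bw_matrix(text: str) -> Dict[Tuple[int, int], float]:
    """Parse a bandwidth matrix into {(src, dst): GB/s}, skipping '***' cells."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    # The first row is the header: the 'C\C' label followed by destination IDs.
    dst_ids = [int(col) for col in rows[0][1:]]
    result: Dict[Tuple[int, int], float] = {}
    for row in rows[1:]:
        src = int(row[0])
        for dst, cell in zip(dst_ids, row[1:]):
            if cell != "***":
                result[(src, dst)] = float(cell)
    return result


sample = r"""
   C\C       0        1        2
   0         ***      328.96   328.98
   1         328.61   ***      328.58
   2         328.57   328.49   ***
"""

matrix = parse_bw_matrix(sample)
slowest = min(matrix, key=matrix.get)
print(slowest, matrix[slowest])  # (2, 1) 328.49
```

Finding the lowest entry this way can help spot a link that performs below the rest of the matrix.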

The following table describes the parameters in the preceding examples.

Table 2 Parameter description

Parameter

Description

Host to Device Test

Data flow direction. The value can be:
  • Host to Device Test
  • Device to Host Test
  • Device to Device Test
  • Unidirectional Peer to Peer Test
  • Bidirectional Peer to Peer Test

Device X : Ascend XXX

Device X indicates the ID of the current device, and Ascend XXX indicates the processor type.

The value 0 indicates the source device, and the value 1 indicates the destination device.

ID

0 indicates the bandwidth of device 0 in D2D, D2H, and H2D modes.

0→1 indicates unidirectional P2P bandwidth from device 0 to device 1.

0→1 indicates bidirectional P2P bandwidth between device 0 and device 1.

Size(Bytes)

Size of data to be transmitted, in bytes.

Execute Times

Number of iterations

Bandwidth(GB/s)

Chip bandwidth

Elapsed Time(us)

Execution duration

FAQs