P2P Stress Test

Function

Tests whether a hardware fault has occurred on the HCCS communication link between the specified source device and destination device, and outputs the test result. Use this function when training accuracy is abnormal and a hardware fault on the HCCS communication link is the suspected cause.

Table 1 Diagnostic items

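| Item | Time Required | Whether NPU Training or Inference Is Affected | Application Scenario |
|------|---------------|-----------------------------------------------|----------------------|
| P2P stress test | 1–5 minutes | Yes | An exception occurred during data copy between devices. |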

Parameters

Table 2 lists only test-specific parameters. For details about other common parameters, see Common Parameters.

Table 2 Parameters

| Parameter | Description | Mandatory |
|-----------|-------------|-----------|
| [-i, --items] | Specifies the diagnosis check item. bandwidth: local bandwidth, including the Host to Device, Device to Host, Device to Device, and Peer to Peer directions. | Yes |
| [-t, --type] | Specifies the type of data flows. This parameter takes effect only when items is set to bandwidth and -s is passed, indicating that the P2P stress test is performed. Currently, only the P2P mode is supported. P2P: measures the transmission rate and total time consumption from the source device to the destination device. | Yes |

Example

ascend-dmi -dg -i bandwidth --type p2p -s -q

  • Default mode
    [***@***]# ascend-dmi -dg -i bandwidth --type p2p -s -q
    Summary:
        Arch: aarch64
        Mode: ******
        Time: 20250529-19:55:23
     
    Hardware:
        bandwidth:
            PASS
    
  • If an unsupported device is used to perform a P2P stress test, the following information is displayed:
    [***@***]# ascend-dmi -dg -i bandwidth --type p2p -s -q
    Summary:
        Arch: aarch64
        Mode: ******
        Time: 20250529-19:51:57
     
    Hardware:
        bandwidth:
            SKIP
            *** The current device does not support the p2p stress test.
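
The two sample reports above share a simple layout, so the result can be read by a script. The following is a minimal sketch, not part of the tool: it assumes the report always contains a `bandwidth:` line followed by the result keyword and optional `***` note lines, as in the samples. The names `parse_bandwidth_result` and `run_p2p_stress_test` are hypothetical helpers introduced here for illustration.

```python
import subprocess

# Command exactly as shown in the example above.
P2P_STRESS_CMD = ["ascend-dmi", "-dg", "-i", "bandwidth", "--type", "p2p", "-s", "-q"]

# Result keywords documented for this test.
KNOWN_RESULTS = {"PASS", "SKIP", "EMERGENCY_WARN", "FAIL"}


def parse_bandwidth_result(output):
    """Return (result, notes) for the 'bandwidth:' item of a report.

    Assumption: the layout matches the sample outputs, i.e. a 'bandwidth:'
    line, the result keyword on the next non-empty line, then optional
    note lines that start with '***'.
    """
    lines = [line.strip() for line in output.splitlines()]
    try:
        start = lines.index("bandwidth:")
    except ValueError:
        return None, []  # no bandwidth item in the text
    body = [line for line in lines[start + 1:] if line]
    if not body or body[0] not in KNOWN_RESULTS:
        return None, []  # unexpected layout or unknown keyword
    notes = []
    for line in body[1:]:
        if not line.startswith("***"):
            break
        notes.append(line.lstrip("* ").strip())
    return body[0], notes


def run_p2p_stress_test():
    """Run the command and parse its report (requires ascend-dmi on the host)."""
    proc = subprocess.run(P2P_STRESS_CMD, capture_output=True, text=True)
    return parse_bandwidth_result(proc.stdout)
```

On a host without `ascend-dmi`, `parse_bandwidth_result` can still be exercised on saved report text. A result of `None` means the text did not match the assumed layout and should be inspected manually.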
    

Fault Check Items

Table 3 Fault check items

| Command Output | Description |
|----------------|-------------|
| PASS | The stress test is passed, and the result is normal. |
| SKIP | The product or scenario does not support the P2P stress test. |
| EMERGENCY_WARN | Emergency warning. The stress test fails. Contact Huawei engineers to replace the hardware. |
| FAIL | The P2P stress test fails. Contact Huawei technical support. |
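
For automated health checks, the result keywords in Table 3 can be mapped to follow-up actions. The sketch below is illustrative only: the action strings paraphrase Table 3, while the helper names `describe` and `needs_attention` and the policy of flagging unrecognized keywords are assumptions, not behavior of the tool.

```python
# Follow-up action for each result keyword, paraphrased from Table 3.
RESULT_ACTIONS = {
    "PASS": "No action needed: the stress test passed.",
    "SKIP": "Not applicable: the product or scenario does not support the P2P stress test.",
    "EMERGENCY_WARN": "Stress test failed: contact Huawei engineers to replace the hardware.",
    "FAIL": "Stress test failed: contact Huawei technical support.",
}


def describe(result):
    """Return the follow-up action for a result keyword."""
    return RESULT_ACTIONS.get(
        result,
        f"Unrecognized result {result!r}; inspect the raw command output.",
    )


def needs_attention(result):
    """True unless the result is PASS or SKIP.

    Assumption: keywords not listed in Table 3 are treated as needing attention.
    """
    return result not in ("PASS", "SKIP")


if __name__ == "__main__":
    for keyword in ("PASS", "SKIP", "EMERGENCY_WARN", "FAIL"):
        flag = "ATTENTION" if needs_attention(keyword) else "ok"
        print(f"{keyword:15} {flag:10} {describe(keyword)}")
```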