SuperPoD Bandwidth Test Failed in NPU Disconnection Scenarios
Symptom
An error is reported when the card-level SuperPoD bandwidth test is run on the two ends with the following commands:
ascend-dmi -bw -t p2p --sp 1 --ip xx.xx.xx.xx --spp /home/superpod/share --hip xx.xx.xx.xx --mode card -q -d 5
ascend-dmi -bw -t p2p --sp 0 --ip xx.xx.xx.xx --spp /home/superpod/share --hip xx.xx.xx.xx --mode card -q -d 5
The following figure shows the error information in the log.

Run the npu-smi info command to query the environment status. As shown in the following figure, NPU 0 in the environment is disconnected.
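To spot a missing NPU without reading the table by eye, a helper along these lines could compare the NPU IDs that npu-smi reports with the number of NPUs the node is expected to have. This is a minimal sketch under stated assumptions: it reads `npu-smi info -m` output, it assumes the first column of each data row is the NPU ID, it assumes a disconnected NPU does not appear in that table, and `missing_npus` is a hypothetical helper name, not part of npu-smi.

```shell
#!/usr/bin/env bash
# Sketch: report NPU IDs that do not appear in `npu-smi info -m` output.
# ASSUMPTIONS (not from the npu-smi documentation):
#   - the first column of each data row is the NPU ID;
#   - a disconnected NPU is absent from the table;
#   - the caller passes the expected number of NPUs on the node.

# missing_npus EXPECTED_COUNT  (hypothetical helper; reads the table on stdin)
missing_npus() {
  local expected="$1"
  awk -v n="$expected" '
    # Record every numeric NPU ID seen in the first column;
    # the header row ("NPU ID ...") is skipped because "NPU" is not numeric.
    $1 ~ /^[0-9]+$/ { seen[$1] = 1 }
    END {
      # Print each expected ID (0 .. n-1) that never appeared.
      for (i = 0; i < n; i++)
        if (!(i in seen)) print i
    }
  '
}
```

For example, `npu-smi info -m | missing_npus 8` on a node expected to hold eight NPUs would print the IDs of any NPUs absent from the table, one per line.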

Possible Causes
The tool uses the logical ID (device ID) to identify the card under test in the card-level bandwidth test. When an NPU is disconnected, the device IDs on the two ends no longer match, so the peer's file cannot be found.
Solution
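Run the npu-smi info -m command on each end to query the mapping between NPU IDs, chip IDs, and chip logic IDs. When testing the card-level bandwidth, specify on each end the -d value that maps to the same NPU, so that both ends test the same card. The -d values on the two ends may therefore differ.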
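The lookup step can be sketched as a small shell helper that reads the `npu-smi info -m` table and prints the Chip Logic ID for a given NPU ID and chip ID. This sketch reads the solution as "find, on each end, the chip logic ID of the same physical NPU"; the column order (NPU ID, Chip ID, Chip Logic ID, then chip name) is an assumption about the output layout, and `logic_id_for_npu` is a hypothetical helper name. Confirm the mapping against the real output before passing the result to `-d`.

```shell
#!/usr/bin/env bash
# Sketch: look up the Chip Logic ID for an NPU in `npu-smi info -m` output.
# ASSUMPTION: each data row starts with three whitespace-separated columns,
# "NPU ID  Chip ID  Chip Logic ID", followed by the chip name. Header rows
# and rows without numeric IDs are skipped. Verify for your npu-smi version.

# logic_id_for_npu NPU_ID [CHIP_ID]  (hypothetical helper; reads the table on stdin)
# Prints the third column of the matching row; exits non-zero if no row matches.
logic_id_for_npu() {
  local npu="$1" chip="${2:-0}"
  awk -v npu="$npu" -v chip="$chip" '
    # Match only rows whose first two fields are numeric IDs.
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $1 == npu && $2 == chip {
      print $3
      found = 1
      exit
    }
    # Report failure to the caller when no matching row was seen.
    END { exit (found ? 0 : 1) }
  '
}
```

Usage on each end might look like `npu-smi info -m | logic_id_for_npu 6`; the printed value is a candidate for that end's `-d` option. Rows such as MCU entries may show `-` instead of a logical ID, and the helper prints that value unchanged rather than guessing.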


Example:
ascend-dmi -bw -t p2p --sp 1 --ip xx.xx.xx.xx --spp /home/superpod/share --hip xx.xx.xx.xx --mode card -q -d 6
ascend-dmi -bw -t p2p --sp 0 --ip xx.xx.xx.xx --spp /home/superpod/share --hip xx.xx.xx.xx --mode card -q -d 5