[H2D/D2H/P2P Bandwidth] Bandwidth Degradation Due to Lane Downgrade

Symptom

The measured bandwidth does not meet the requirement: it is only 3/4, 1/2, or 1/4 of the expected value.

Possible Causes

The link is running with fewer active lanes than it should (lane downgrade).
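If bandwidth scales roughly linearly with the number of active lanes, the fractions in the symptom map directly to lane counts. On a hypothetical x16 link, 1/2 would mean 8 active lanes and 1/4 would mean 4. Linear scaling is an assumption: protocol overhead and other bottlenecks can skew the ratio. The following Python sketch snaps a measured-to-expected ratio to the nearest of these fractions. The function names and sample figures are illustrative only.

```python
from fractions import Fraction

# Fractions named in the symptom description; 1/1 means no degradation.
KNOWN_RATIOS = [Fraction(1, 1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)]


def nearest_lane_ratio(measured: float, expected: float) -> Fraction:
    """Return the known fraction closest to measured/expected.

    Assumes (hypothetically) that bandwidth scales linearly with active lanes.
    Both values must use the same unit, e.g. GB/s.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratio = measured / expected
    return min(KNOWN_RATIOS, key=lambda r: abs(float(r) - ratio))


def active_lanes(total_lanes: int, ratio: Fraction) -> int:
    """Estimate how many lanes remain active for a given ratio."""
    return int(total_lanes * ratio)


if __name__ == "__main__":
    # Hypothetical figures: 48 expected, 23.5 measured, on an x16 link.
    r = nearest_lane_ratio(23.5, 48.0)
    print(r, active_lanes(16, r))  # prints: 1/2 8
```

A ratio that does not land near any of these fractions suggests a cause other than lane downgrade.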

Check Method

1. Method for checking lane downgrade in HCCS

Method 1: Run the npu-smi info -t health -i ${NPU chip ID} -c ${NPU device ID} command to check whether the 0x819b8605 alarm exists.

Method 2: Run the npu-smi info -t hccs -i ${NPU chip ID} -c ${NPU device ID} command to check whether the value of hccs lane mode is 4.
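The two HCCS checks above can be scripted. The following Python sketch wraps the same npu-smi commands and inspects their text output. The output layout is an assumption. The regular expression expects a line such as "HCCS Lane Mode : 4", and the alarm check is a plain case-insensitive search for 0x819b8605. Verify both against the actual output of your npu-smi version before relying on them.

```python
import re
import subprocess
from typing import Optional

HCCS_ALARM = "0x819b8605"  # HCCS lane-downgrade alarm code from Method 1

# Assumed output format: a line such as "HCCS Lane Mode : 4".
# The label and separator may differ between npu-smi versions.
LANE_MODE_RE = re.compile(r"hccs\s+lane\s+mode\s*[:=]?\s*(\d+)", re.IGNORECASE)


def run_npu_smi(query: str, i_value: str, c_value: str) -> str:
    """Run 'npu-smi info -t <query> -i <i_value> -c <c_value>' and return stdout.

    Pass the same IDs you would use for -i and -c in the commands above.
    """
    cmd = ["npu-smi", "info", "-t", query, "-i", i_value, "-c", c_value]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def has_hccs_alarm(health_output: str) -> bool:
    """Method 1: True if the health output mentions alarm 0x819b8605."""
    return HCCS_ALARM in health_output.lower()


def hccs_lane_mode(hccs_output: str) -> Optional[int]:
    """Method 2: extract the hccs lane mode value, or None if not found."""
    match = LANE_MODE_RE.search(hccs_output)
    return int(match.group(1)) if match else None


def lane_mode_flagged(hccs_output: str) -> bool:
    """Method 2: True if hccs lane mode is 4, the value the check looks for."""
    return hccs_lane_mode(hccs_output) == 4
```

For example, has_hccs_alarm(run_npu_smi("health", "0", "0")) runs Method 1. The IDs "0" and "0" are placeholders for your own device.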

2. Method for checking lane downgrade in PCIe

Log in to the iBMC, collect the iBMC logs, and check the dump_info\AppDump\card_manage\card_info file to determine whether the PCIe link has been downgraded.
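As an optional host-side cross-check, which is not part of the procedure above, Linux exposes the negotiated and maximum PCIe link widths of each device in sysfs as current_link_width and max_link_width. The following minimal Python sketch assumes a Linux host and that you know the device's PCI address. The function names are illustrative.

```python
from pathlib import Path
from typing import Tuple


def read_link_widths(device_dir: str) -> Tuple[int, int]:
    """Read (current, maximum) link width from a sysfs PCI device directory,
    e.g. /sys/bus/pci/devices/<BDF> on a Linux host."""
    base = Path(device_dir)
    current = int((base / "current_link_width").read_text().strip())
    maximum = int((base / "max_link_width").read_text().strip())
    return current, maximum


def width_fraction(current: int, maximum: int) -> float:
    """Fraction of the maximum width in use, e.g. x8 of x16 -> 0.5."""
    if maximum <= 0:
        raise ValueError("maximum width must be positive")
    return current / maximum
```

For example, read_link_widths("/sys/bus/pci/devices/0000:81:00.0") would return (8, 16) for a device trained at x8 on an x16-capable link. The address is hypothetical. A current width below the maximum can also be caused by a narrower slot or riser, so compare the result against the width the slot is expected to provide.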

3. Method for checking lane downgrade in HCCS using a bus device

Log in to a bus device and run the display alarm active or display alarm history command to check whether the 0xF10509 alarm exists.
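When the output of both bus-device commands has been captured as text, the search for alarm 0xF10509 can be automated. This Python sketch assumes that the alarm ID appears verbatim in the output, possibly with extra leading zeros (for example 0x00F10509). That tolerance is an assumption about the output format, not documented behavior.

```python
import re
from typing import Dict, List

# Matches 0xF10509 in any letter case; also tolerates zero padding
# such as 0x00F10509 (an assumption about the output format).
ALARM_RE = re.compile(r"\b0x0*f10509\b", re.IGNORECASE)


def commands_reporting_alarm(outputs: Dict[str, str]) -> List[str]:
    """Given {command: captured output}, return the commands whose output
    contains alarm 0xF10509, in the order they were supplied."""
    return [cmd for cmd, text in outputs.items() if ALARM_RE.search(text)]
```

A hit in display alarm active indicates a current alarm. A hit only in display alarm history indicates that the downgrade occurred at some point and may have since cleared.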

Solution

Contact hardware maintenance engineers.