[P2P Bandwidth] P2P Bandwidth Degradation Between the First and Last 8-NPU Groups

Symptom

When the P2P bandwidth test is run on an Atlas 200T A2 Box16 or Atlas 200I A2 Box16 heterogeneous subrack, the test command fails with an error message, or the measured bandwidth is lower than expected.

Possible Causes

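PCIe Access Control Services (ACS) are enabled in the current environment, as shown by the ACSCtl register. ACS can redirect peer-to-peer requests upstream instead of letting them be routed directly between devices, which may reduce the measured P2P bandwidth.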

Check Method

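Run the lspci -s ${NPU BDF number} -vvv command as the root user and check the ACSCtl line in the output. ACS is disabled only if every ACSCtl field ends with - (for example, SrcValid-). A field ending with + (for example, ReqRedir+) means that the corresponding ACS function is enabled.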
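To check many devices at once, the ACSCtl lines can be filtered automatically. The following is a minimal sketch, assuming a POSIX shell with awk; the function name acs_enabled_devices is hypothetical. It reads lspci -vvv output on standard input and prints the BDF of each device whose ACSCtl line contains at least one + flag.

```shell
# Hypothetical helper: print the BDF of every PCI device whose ACSCtl line
# has at least one flag set to "+" (that is, at least one ACS function enabled).
# Input: lspci -vvv output on stdin. Run lspci as root; otherwise the
# capability details, including ACSCtl, are not displayed.
acs_enabled_devices() {
    awk '
        # Device header lines start with a hex digit, e.g. "3a:00.0 PCI bridge: ..."
        /^[0-9a-f]/ { bdf = $1 }
        # ACS control line, e.g. "ACSCtl: SrcValid+ TransBlk- ReqRedir+ ..."
        /ACSCtl:/ && /[+]/ { print bdf }
    '
}
```

For example, lspci -vvv | acs_enabled_devices lists every device that still has an ACS function enabled, and lspci -s ${NPU BDF number} -vvv | acs_enabled_devices checks a single device. Empty output means that no ACSCtl field is set to +.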

Solution

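  1. Run the following commands as the root user to disable ACS. They find every device that reports an ACSCtl register and clear its ACS Control register (offset 06h of the ACS extended capability). The change takes effect immediately but is not persistent, so repeat this step after each OS restart.
    for pdev in $(lspci -vvv | grep -E "^[a-f]|^[0-9]|ACSCtl" | grep ACSCtl -B1 | grep -E "^[a-f]|^[0-9]" | awk '{print $1}')
    do
        setpci -s "$pdev" ECAP_ACS+06.w=0000
    done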
  2. If the preceding commands do not take effect, check whether the kernel boot parameters (displayed by cat /proc/cmdline) contain intel_iommu=on and iommu=pt.

  3. If they do, remove these parameters from the boot configuration and restart the OS.
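Steps 2 and 3 start with inspecting the running kernel command line, which Linux exposes in /proc/cmdline. The helper below is a minimal sketch (the name check_iommu_params is hypothetical) that prints whichever of intel_iommu=on and iommu=pt it finds, one per line, and exits with a non-zero status if neither is present. How the parameters are removed depends on the OS; on many GRUB-based distributions they are set in GRUB_CMDLINE_LINUX in /etc/default/grub, and the GRUB configuration must be regenerated before the restart.

```shell
# Hypothetical helper: report which of the IOMMU boot parameters named in
# step 2 appear in a kernel command line passed as the first argument.
# Prints each match on its own line; grep's exit status (1 = none found)
# becomes the function's exit status.
check_iommu_params() {
    # Split the command line into one parameter per line, then match whole
    # parameters only (so iommu=pt1 does not count as iommu=pt).
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep -E '^(intel_iommu=on|iommu=pt)$'
}
```

Usage: check_iommu_params "$(cat /proc/cmdline)". If it prints either parameter, remove that parameter from the boot configuration as described in step 3.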