[P2P Bandwidth] Low Cross-Ring PIX Bandwidth Due to ACS Triggered by IOMMU

Symptom

On the Atlas 200T A2 Box16/Atlas 200I A2 Box16 heterogeneous subrack, the bandwidth of the PIX link (through a PCIe switch: NPU1 -> PCIe -> PCIe switch -> PCIe -> NPU2) is lower than expected, and the unidirectional bandwidth is almost the same as the bidirectional bandwidth.

Possible Causes

  • Cause 1: PCIe ACS is enabled, which lowers the bandwidth of the PIX link.

    PCIe Access Control Services (ACS) uses control bits to determine whether a TLP is routed normally, blocked, or redirected. ACS can be applied to RCs, switches, and multi-function devices. A single-function device that supports SR-IOV is treated by ACS as a multi-function device, and the corresponding ACS functions are activated for it. Once ACS is enabled on a PCIe switch, direct P2P transmission is disabled and the switch is forced to send access requests for all addresses to the RC, which prevents P2P access risks.

  • Cause 2: IOMMU is enabled and PCIe ACS is forcibly enabled. As a result, the bandwidth performance of the PIX link is low.

    If the bandwidth of the PIX link remains low after ACSCtl is disabled, the problem may be related to IOMMU. According to the kernel code, the detect_intel_iommu function calls pci_request_acs to enable ACS. That is, ACS is forcibly enabled whenever IOMMU is enabled.

Check Method (Cause 1)

Run the lspci -s ${NPU's PCIe BDF number} -vvv | grep ACSCtl command to check whether ACSCtl is disabled. If ACSCtl is disabled, every flag on the ACSCtl line ends with a minus sign (-). A flag that ends with a plus sign (+) is still enabled.
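The check above can be sketched as a small filter. This is only an illustration: the helper name acs_enabled_flags is hypothetical, and it assumes the usual lspci layout in which the ACSCtl line lists flags such as SrcValid+ or ReqRedir- separated by spaces.

```shell
# Hypothetical helper (not part of any tool): reads `lspci -vvv` text on stdin
# and prints every ACSCtl flag that is still enabled (suffix "+"), one per line.
# It prints nothing when all flags end with a minus sign.
acs_enabled_flags() {
    grep 'ACSCtl:' | tr -s ' \t' '\n' | grep -E '^[A-Za-z]+\+$' || true
}

# Usage on a real host (BDF is a placeholder for the NPU's PCIe address):
#   lspci -s "$BDF" -vvv | acs_enabled_flags
```

Empty output means that ACSCtl is disabled on the device that was queried. Any printed flag means that the Cause 1 solution still needs to be applied.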

Solution (Cause 1)

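If any flag shows a plus sign (+) instead of a minus sign (-), disable ACSCtl by executing the following script.
# Collect the BDF of every PCIe device whose lspci output contains an ACSCtl line.
for pdev in $(lspci -vvv | grep -E "^[a-f]|^[0-9]|ACSCtl" | grep ACSCtl -B1 | grep -E "^[a-f]|^[0-9]" | awk '{print $1}')
do
    # Clear the ACS Control register (offset 06h of the ACS extended capability).
    setpci -s "$pdev" ECAP_ACS+06.w=0000
done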

Check Method (Cause 2)

Run the lspci -vvv command. If IOMMU group information is displayed, IOMMU is enabled.
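As a complement to the lspci check, the IOMMU groups can also be counted in sysfs. The sketch below assumes that the kernel exposes one subdirectory per group under /sys/kernel/iommu_groups. The helper name iommu_group_count is hypothetical.

```shell
# Hypothetical helper: counts IOMMU groups under a sysfs-style directory.
# Defaults to /sys/kernel/iommu_groups; prints 0 if the directory is missing.
iommu_group_count() {
    dir="${1:-/sys/kernel/iommu_groups}"
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Usage on a real host:
#   iommu_group_count
```

A nonzero count suggests that IOMMU is enabled. A count of 0 is consistent with IOMMU being disabled.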

Solution (Cause 2)

  1. In the /etc/default/grub file, delete intel_iommu=on and iommu=pt from the GRUB_CMDLINE_LINUX configuration item and add intel_iommu=off. Kernel parameters must be written without spaces around the equal sign.

  2. Run the update-grub command, and restart the OS.

  3. (Optional) In the BIOS, set Intel(R) VT for Directed I/O (VT-d) to Disable.

  4. Restart the OS, and check whether IOMMU is disabled.

    Run the dmesg | grep -i dmar command to check that IOMMU is disabled.

    Run the lspci -vvv command to check that no IOMMU group information exists.

  5. Test the PIX link bandwidth again.
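The edit in step 1 can be sketched as a pure text transformation on the value of GRUB_CMDLINE_LINUX. The helper name rewrite_iommu_args is hypothetical. It assumes that the arguments are separated by spaces and that no argument contains a quoted space. It does not modify /etc/default/grub, so the result still has to be written back manually before update-grub is run.

```shell
# Hypothetical helper: takes the current kernel command line as $1, drops any
# intel_iommu=... and iommu=... arguments, and appends intel_iommu=off.
# Other arguments (for example amd_iommu=...) are left untouched.
rewrite_iommu_args() {
    printf '%s\n' "$1" | tr -s ' ' '\n' \
        | grep -v -E '^$|^(intel_iommu|iommu)=' \
        | { cat; echo 'intel_iommu=off'; } \
        | paste -sd ' ' -
}

# Usage:
#   rewrite_iommu_args 'quiet intel_iommu=on iommu=pt splash'
```

Printing the new value instead of editing the file in place lets the result be reviewed before the boot configuration is changed.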