Eye Diagram Test
Ascend DMI supports eye diagram tests on communication ports to query their current signal quality.
This function returns detailed signal quality data. To check whether the signal quality of the current port is normal, run signal quality diagnosis. For details, see Eye Diagram Diagnosis.
Function
Query the signal quality of the PCIe, HCCS, and RoCE communication ports on the NPU and of the HCCS communication ports on the CPU.
Parameters
You can run either of the following commands to view the available parameters of the signal quality query command:
ascend-dmi --sq -h
ascend-dmi --sq --help
Table 1 lists only the test-specific parameters. For details about other common parameters, see Common Parameters.

| Parameter | Description | Mandatory |
|---|---|---|
| [-sq, --sq, --signal-quality] | Queries the signal quality of the PCIe, HCCS, and RoCE communication ports on the NPU and of the HCCS communication ports on the CPU. | Yes |
| [-d, --device] | Specifies the device ID of the NPU or CPU to be queried. To specify multiple processors, separate their IDs with commas (,). If this parameter is not specified, all NPUs or CPUs on the device are queried by default. | No |
| [-t, --type] | Specifies the type of the communication port. Currently, PCIe, HCCS, and RoCE are supported. To specify multiple types, separate them with commas (,), for example, hccs,pcie,roce. | No |
| [-m, --module] | Specifies whether to query the CPU or NPU eye diagram. If this parameter is not specified, the NPU eye diagram is queried by default. This parameter is supported only by the ... | No |
| [-r, --result] | Specifies the output path for the full eye diagram results, for example, /test. The path must meet security requirements and cannot contain wildcards (*). | No |
Example
The command output on an inference server is similar to that on a training server. The following examples use screenshots taken on a training server.
- PCIe, HCCS, and RoCE signal quality of device 0 and device 1
ascend-dmi --sq -t hccs,pcie,roce -d 0,1
If information similar to that shown in Figure 1 is displayed, the tool is running properly.
- Output in JSON format
ascend-dmi --signal-quality -t roce -d 0 --fmt json
If information similar to that shown in Figure 2 is displayed, the tool is running properly.

The following tables describe the parameters displayed in Figure 1.
Table 2 Parameters of HCCS signal quality detection

| Parameter | Description |
|---|---|
| type | Type of the communication port |
| device | Logical ID of an NPU |
| M* (macro port) | Macro port number. For example, M0 and M1 indicate macro ports 0 and 1, respectively. |
| L* (LANE) | Lane number in an HCCS link. For example, L0 and L1 indicate lane 0 and lane 1, respectively. |
| S (SNR) | Signal-to-noise ratio (SNR) of a lane |
| H (HEH) | Half-eye height (HEH) of a lane |
| B/T/L/R | Values of the bottom, top, left, and right positions of the quad-eye diagram |
Description:
- In the HCCS signal quality command output, if SNR ≥ 400000 and HEH ≥ 350, the lane signal quality is normal.
- If the SNR and HEH values are not within the preceding ranges, the HCCS signal quality is abnormal. In this case, check whether the macro connector is loose and whether the link is dirty.
- If the values of SNR and HEH are 0, no HCCS link is established between the specified devices.
- For the NPU HCCS signal quality of the Atlas 300I Duo inference card, and for the CPU HCCS signal quality of the Atlas 900 A3 SuperPoD, Atlas 800I A3 SuperPoD Server, and Atlas 9000 A3 SuperPoD, the command output displays only type, device, M* (macro port), L* (LANE), and B/T/L/R. In this case, the normal range of B/T/L/R is B(bottom) ≤ -30, T(top) ≥ 30, L(left) ≤ -5, and R(right) ≥ 5.
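As a minimal sketch, the HCCS SNR/HEH criteria above can be expressed as a lane classifier. It assumes the SNR and HEH values have already been read from the ascend-dmi output (it does not run the tool), and the function name `classify_hccs_lane` is hypothetical.

```python
# Sketch only: applies the documented HCCS thresholds to values that were
# already extracted from ascend-dmi output. classify_hccs_lane is a
# hypothetical helper, not part of Ascend DMI.
HCCS_MIN_SNR = 400000  # SNR must be >= this value
HCCS_MIN_HEH = 350     # HEH must be >= this value


def classify_hccs_lane(snr: int, heh: int) -> str:
    """Return 'no link', 'normal', or 'abnormal' for one HCCS lane."""
    if snr == 0 and heh == 0:
        # Both values 0: no HCCS link is established for this lane.
        return "no link"
    if snr >= HCCS_MIN_SNR and heh >= HCCS_MIN_HEH:
        return "normal"
    # Out of range: check for a loose macro connector or a dirty link.
    return "abnormal"


if __name__ == "__main__":
    for snr, heh in [(0, 0), (450000, 380), (300000, 380)]:
        print(snr, heh, classify_hccs_lane(snr, heh))
```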
Table 3 Parameters of PCIe signal quality detection

| Parameter | Description |
|---|---|
| type | Type of the communication port |
| device | Logical ID of an NPU |
| M* (macro port) | Macro port number. For example, M9 and M10 indicate macro ports 9 and 10, respectively. |
| L* (LANE) | Lane number in a PCIe link. For example, L0 and L1 indicate lane 0 and lane 1, respectively. |
| B/T/L/R | Values of the bottom, top, left, and right positions of the quad-eye diagram |
Description:
- In the PCIe signal quality command output, the lane signal quality is normal only if all of the following conditions are met: B(bottom) ≤ -17, T(top) ≥ 17, L(left) ≤ -3, and R(right) ≥ 3.
- If the values of the quad-eye diagram are not within the preceding range, the PCIe signal quality is abnormal. In this case, check whether the macro connector is loose and whether the link is dirty.
- For the Atlas 300I Duo inference card, the normal range of B/T/L/R is B(bottom) ≤ -30, T(top) ≥ 30, L(left) ≤ -5, and R(right) ≥ 5.
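The PCIe quad-eye rule can be expressed as a small check in which B and L are upper bounds and T and R are lower bounds. This is an illustration under stated assumptions: the B/T/L/R values are taken as already extracted from the output, and `QuadEyeBounds`, `PCIE_DEFAULT`, `PCIE_ATLAS_300I_DUO`, and `pcie_lane_normal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadEyeBounds:
    """Normal range of a quad-eye diagram (hypothetical helper type)."""
    bottom: float  # B must be <= this value
    top: float     # T must be >= this value
    left: float    # L must be <= this value
    right: float   # R must be >= this value


# Bounds taken from the PCIe description; the profile names are illustrative.
PCIE_DEFAULT = QuadEyeBounds(bottom=-17, top=17, left=-3, right=3)
PCIE_ATLAS_300I_DUO = QuadEyeBounds(bottom=-30, top=30, left=-5, right=5)


def pcie_lane_normal(bottom: float, top: float, left: float, right: float,
                     bounds: QuadEyeBounds = PCIE_DEFAULT) -> bool:
    """Return True only if all four quad-eye values are within range."""
    return (bottom <= bounds.bottom and top >= bounds.top
            and left <= bounds.left and right >= bounds.right)


if __name__ == "__main__":
    print(pcie_lane_normal(-20, 20, -4, 4))
    print(pcie_lane_normal(-20, 20, -4, 4, PCIE_ATLAS_300I_DUO))
```

For example, a lane with B=-20, T=20, L=-4, and R=4 passes the default check but fails the Atlas 300I Duo bounds, because -20 is not ≤ -30.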
Table 4 Parameters of RoCE signal quality detection

| Parameter | Description |
|---|---|
| type | Type of the communication port |
| device | Logical ID of an NPU |
| M* (macro port) | Macro port number. For example, M0 indicates macro port 0. |
| L* (LANE) | Lane number in a RoCE link. For example, L0 and L1 indicate lane 0 and lane 1, respectively. |
| S (SNR) | Signal-to-noise ratio (SNR) of a lane |
| H (HEH) | Half-eye height (HEH) of a lane |
Description:
- In the RoCE signal quality output:
  - For 100G optical modules, the lane signal quality is normal if SNR ≥ 260000 and HEH ≥ 350.
  - For 200G optical modules, the lane signal quality is normal if SNR ≥ 400000 and HEH ≥ 350.
- If the SNR and HEH values are not within the preceding ranges, the RoCE signal quality is abnormal. In this case, check whether the macro connector is loose and whether the link is dirty.
- If the values of SNR and HEH are 0, no RoCE link is established between the specified devices.
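A minimal sketch for checking RoCE text output, assuming one device row per line in the `0 M0: L0: S:0 H:0 L1: ...` layout of the sample output for zero values in this section, and assuming the optical module speed (100G or 200G) is known separately, since the per-lane fields do not identify it. `parse_roce_row` and `classify_roce_lane` are hypothetical helpers, and the exact spacing of real tool output may differ.

```python
import re
from typing import List, Tuple

# Normal ranges from the RoCE description, keyed by optical module speed.
ROCE_MIN_SNR = {"100G": 260000, "200G": 400000}
ROCE_MIN_HEH = 350

# Assumed token layout: "M<port>:" followed by "L<lane>: S:<snr> H:<heh>" groups.
_MACRO = re.compile(r"M(\d+):")
_LANE = re.compile(r"L(\d+):\s*S:(\d+)\s*H:(\d+)")

Lane = Tuple[int, int, int, int]  # (macro port, lane, SNR, HEH)


def parse_roce_row(row: str) -> Tuple[int, List[Lane]]:
    """Parse one device row into (device ID, list of lanes)."""
    device_str, rest = row.strip().split(maxsplit=1)
    # re.split with one capture group yields [prefix, port, text, port, text, ...].
    parts = _MACRO.split(rest)
    lanes: List[Lane] = []
    for port, text in zip(parts[1::2], parts[2::2]):
        for lane, snr, heh in _LANE.findall(text):
            lanes.append((int(port), int(lane), int(snr), int(heh)))
    return int(device_str), lanes


def classify_roce_lane(snr: int, heh: int, module: str) -> str:
    """Return 'no link', 'normal', or 'abnormal' for a 100G or 200G module lane."""
    if snr == 0 and heh == 0:
        return "no link"
    if snr >= ROCE_MIN_SNR[module] and heh >= ROCE_MIN_HEH:
        return "normal"
    return "abnormal"


if __name__ == "__main__":
    device, lanes = parse_roce_row(
        "0 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0")
    for port, lane, snr, heh in lanes:
        print(device, f"M{port}", f"L{lane}", classify_roce_lane(snr, heh, "100G"))
```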
Example of the command output when the SNR and HEH values are 0:
[root@*****~]# ascend-dmi --sq -t roce
type: roce
Prompt message: M*: macro port, L*: lane, S: SNR, H: HEH
100G Optical Normal range: S(SNR) >= 260000 and H(HEH) >= 350
200G Optical Normal range: S(SNR) >= 400000 and H(HEH) >= 350
----------------------------------------------------------------------------------------------
device signal-to-noise ratio
----------------------------------------------------------------------------------------------
0 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
1 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
2 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
3 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
4 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
5 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
6 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------
7 M0: L0: S:0 H:0 L1: S:0 H:0 L2: S:0 H:0 L3: S:0 H:0
----------------------------------------------------------------------------------------------

- PCIe signal quality of device 0, with the full eye diagram result exported
ascend-dmi -sq -t pcie -d 0 -r /home/
If information similar to that shown in Figure 3 is displayed, the tool is running properly.
The full eye diagram result is saved as a CSV file. You are advised to plot the data in the file as a scatter chart and observe the distribution of the sampling points. Figure 4 shows a normal full eye diagram, and Figure 5 shows an abnormal one.
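One way to draw the scatter chart without third-party packages is a text plot. This sketch assumes the CSV has a header row and two numeric columns holding the sampling-point coordinates; the column names passed to the hypothetical `load_points` helper are placeholders, because the actual CSV layout is not described here. With matplotlib installed, `matplotlib.pyplot.scatter(xs, ys)` produces a graphical chart from the same points.

```python
import csv
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def load_points(path: str, x_col: str, y_col: str) -> List[Point]:
    """Read (x, y) sampling points from two named CSV columns.

    x_col and y_col are assumptions: check the header of the real CSV file.
    """
    with open(path, newline="") as f:
        return [(float(row[x_col]), float(row[y_col])) for row in csv.DictReader(f)]


def ascii_scatter(points: Sequence[Point], width: int = 60, height: int = 20) -> str:
    """Render points as a text scatter chart; '*' marks cells with samples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_max = min(xs), max(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid division by zero
    y_span = (y_max - min(ys)) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = int((x - x_min) / x_span * (width - 1))
        row = int((y_max - y) / y_span * (height - 1))  # top row = largest y
        grid[row][col] = "*"
    return "\n".join("".join(r) for r in grid)


if __name__ == "__main__":
    import sys
    # Usage: python plot_eye.py <result.csv> <x column> <y column>
    print(ascii_scatter(load_points(sys.argv[1], sys.argv[2], sys.argv[3])))
```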