Customized Stream Test

Function

The customized stream test breaks the one-click stream test into separate steps, so you can control the TX and RX directions independently and choose the lanes to test.

Test Item: Customized stream test
Supported Stream Test Mode: Stream test in a CDR loopback, in a fiber optic circulator, or through direct NPU connection
Instructions: The customized stream test breaks the one-click stream test into separate steps, so you can control the TX and RX directions independently and choose the lanes to test.

Test Item: One-click stream test
Supported Stream Test Mode: Stream test in a CDR loopback or in a fiber optic circulator
Instructions: When the one-click stream test command is executed, Ascend DMI automatically sends and receives streams on all lanes of the specified device. After a period of time, the streams are closed and the result is queried.

Test Principle

The customized stream test supports the following modes:

  • External NPU loopback
  • Direct NPU connection: After the SerDes port of NPU A initiates traffic generation in the TX direction, data flows reach the SerDes port of NPU B through the link to be tested. In RX direction, NPU B compares the data based on a matching pattern and collects bit errors on the received data to check the signal quality of the link between the two NPUs.
    Figure 1 Stream test in direct NPU connection mode
    Observe the following when enabling the TX and RX directions:
    • External loopback: Enable the TX direction before the RX direction.
    • Direct NPU connection: When NPU A sends streams and NPU B receives streams, enable the TXA direction and then the RXB direction. When NPU B sends streams and NPU A receives streams, enable the TXB direction and then the RXA direction. Otherwise, the number of bit errors will reach the upper limit.
    • The operations performed on different lanes of the same device during the stream test must be the same. Otherwise, re-adaptation between the NPU and the CDR will affect the test result. For example, if other lanes are enabled or disabled during the stream test on lane 0, the number of bit errors on lane 0 may reach the upper limit.

Preparations

  • The stream test will interrupt training or inference services. Before the test, ensure that no service is running.
  • If an external fiber optic circulator is used or two NPUs are directly connected, no additional configuration is required before the stream test. If a CDR loopback is used, ensure that the optical module can work properly before configuring the CDR loopback.

Parameters

You can run either of the following commands to list the parameters of the stream test command:

ascend-dmi --prbs-check -h

ascend-dmi --prbs-check --help

Table 1 lists only the test-specific parameters. For details about other common parameters, see Common Parameters.

Before the stream test, run the command with --clear to clear the historical information stored in the register of the current device.

Table 1 Parameters

Parameter: [-pc, --pc, --prbs-check]
Description: Performs a PRBS stream test.
Mandatory: Yes

Parameter: [--prbs-mode]
Description: Enables or disables the stream test.
  • EN (Enable): The stream test is enabled.
  • DS (Disable): The stream test is disabled.
  • The value is case-sensitive.
  • If --prbs-mode is set to EN or DS, the configuration takes effect in both the TX and RX directions, regardless of whether --generator-lanes or --checker-lanes is specified.
  • If --prbs-mode is set to EN, --generator-lanes and --checker-lanes can be specified.
  • If --prbs-mode is set to DS, the stream test stops, and --generator-lanes and --checker-lanes cannot be specified.
  • This parameter cannot be specified together with --show or --clear.
Mandatory: Yes

Parameter: [--generator-lanes]
Description: Specifies the lanes of the TX end.
  • You can specify one or more lanes at a time. Use commas (,) to separate multiple lanes. Multiple lanes must be consecutive, for example, 0,1,2. Non-consecutive sequences, such as 0,1,3, are not supported.
  • If this parameter is not specified, all lanes are tested by default.
  • This parameter cannot be specified together with --show or --clear.
  • Valid lane values are 0, 1, 2, and 3.
Mandatory: No

Parameter: [--checker-lanes]
Description: Specifies the lanes of the RX end.
  • You can specify one or more lanes at a time. Use commas (,) to separate multiple lanes. Multiple lanes must be consecutive, for example, 0,1,2. Non-consecutive sequences, such as 0,1,3, are not supported.
  • If this parameter is not specified, all lanes are tested by default.
  • This parameter cannot be specified together with --show or --clear.
  • Valid lane values are 0, 1, 2, and 3.
Mandatory: No

Parameter: [-show, --show, --show-diagnostic-info]
Description: Displays the stream test result.
  • This parameter cannot be specified together with --clear, --prbs-mode, --generator-lanes, or --checker-lanes.
  • After the information is displayed, the test result is cleared.
Mandatory: No

Parameter: [-clear, --clear, --clear-diagnostic-info]
Description: Clears the stream test result.
  • This parameter cannot be specified together with --show, --prbs-mode, --generator-lanes, or --checker-lanes.
  • Parameters other than those listed above can be specified together with it.
Mandatory: No
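The lane and option rules in Table 1 can be checked before a command is issued. The following Python sketch is illustrative only and is not part of Ascend DMI: the names parse_lanes and check_options are hypothetical, and the sketch assumes that "consecutive" means ascending by one, as in the documented example 0,1,2.

```python
VALID_LANES = ("0", "1", "2", "3")


def parse_lanes(value):
    """Parse a --generator-lanes / --checker-lanes value such as "0,1,2".

    Returns the lanes as a list of ints, or raises ValueError if the value
    breaks a rule from Table 1.
    """
    parts = value.split(",")
    if any(p not in VALID_LANES for p in parts):
        raise ValueError(f"each lane must be 0, 1, 2, or 3: {value!r}")
    lanes = [int(p) for p in parts]
    # Assumption: "consecutive" means each lane is the previous lane plus one.
    for prev, cur in zip(lanes, lanes[1:]):
        if cur != prev + 1:
            raise ValueError(f"lanes must be consecutive: {value!r}")
    return lanes


def check_options(prbs_mode=None, generator_lanes=None, checker_lanes=None,
                  show=False, clear=False):
    """Return a list of rule violations; an empty list means none was found.

    Only the combination rules stated in Table 1 are covered.
    """
    problems = []
    lanes_given = generator_lanes is not None or checker_lanes is not None
    if show and clear:
        problems.append("--show and --clear cannot be specified together")
    if (show or clear) and (prbs_mode is not None or lanes_given):
        problems.append("--show/--clear cannot be combined with --prbs-mode, "
                        "--generator-lanes, or --checker-lanes")
    if prbs_mode is not None and prbs_mode not in ("EN", "DS"):
        problems.append("--prbs-mode must be EN or DS (case-sensitive)")
    if prbs_mode == "DS" and lanes_given:
        problems.append("lanes cannot be specified when --prbs-mode is DS")
    return problems


print(parse_lanes("0,1,2"))
print(check_options(prbs_mode="DS", checker_lanes="0"))
```

Returning a list of violations instead of raising on the first one lets a wrapper script report every problem in a single pass.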

Example

Figure 2 describes how to perform a customized stream test.

Figure 2 Stream test procedure
  • Using default values
    ascend-dmi -pc --clear -q
    ascend-dmi -pc --prbs-mode EN -q

    Description: Clears the historical results and then starts a stream test on all devices. Both the TX end and the RX end use four lanes with the PRBS31 code type.

    Figure 3 Example of using default values for the stream test
  • Stream test on device 8, with two lanes (0 and 1) at the TX end and four lanes at the RX end
    [***@***]# ascend-dmi -pc --clear --device 8 -q
    Operation succeeded.
    [***@***]# ascend-dmi -pc --prbs-mode EN -q --device 8 --generator-lanes 0,1
    Operation succeeded.
    [***@***]# ascend-dmi --pc --show -d 8 -q
    Device 8:
    -----------------------------------------------------------------------------------------------
    Lane      Check Enable    Pattern    Error-Bits     Bit-Error Rate(BER)    ALOS      Period(ms)
    ----------------------------------------------------------------------------------------------------
    0         1               PRBS31     206            0.0000000032%          0         120193
    1         1               PRBS31     385            0.0000000060%          0         120187
    2         1               PRBS31     67092480       0.0010508065%          0         120186
    3         1               PRBS31     67092480       0.0010507844%          0         120189
    -----------------------------------------------------------------------------------------------
  • Disabling the stream test

    ascend-dmi -pc --prbs-mode DS -d 8,9 -q

    This command disables the stream test in the TX and RX directions of the four lanes on devices 8 and 9.

  • Clearing the stream test result

    ascend-dmi -pc --clear-diagnostic-info -d 8,9 -q

    This command clears the bit error data recorded on devices 8 and 9.
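The commands in the examples above can be assembled into one run. The Python sketch below only builds the argument lists and does not execute them. It uses only options that appear in this section. The function name build_commands is hypothetical, and the order shown (clear, enable, show, disable) is an assumption drawn from the examples and the follow-up procedure, since the procedure figure is not reproduced here. How long to wait between enabling the test and reading the result is left to the caller.

```python
import shlex


def build_commands(devices=None, generator_lanes=None, checker_lanes=None):
    """Return ascend-dmi argument lists for one customized stream test run.

    devices: optional list of device IDs, passed as a comma-separated -d value.
    generator_lanes / checker_lanes: optional lane strings such as "0,1".
    """
    common = ["-q"]
    if devices:
        common += ["-d", ",".join(str(d) for d in devices)]

    enable = ["ascend-dmi", "-pc", "--prbs-mode", "EN"]
    if generator_lanes:
        enable += ["--generator-lanes", generator_lanes]
    if checker_lanes:
        enable += ["--checker-lanes", checker_lanes]

    return [
        ["ascend-dmi", "-pc", "--clear"] + common,             # clear old results
        enable + common,                                       # start the stream test
        ["ascend-dmi", "-pc", "--show"] + common,              # read the result
        ["ascend-dmi", "-pc", "--prbs-mode", "DS"] + common,   # stop the stream test
    ]


# Print the commands for device 8 with TX lanes 0 and 1, as in the second example.
for argv in build_commands(devices=[8], generator_lanes="0,1"):
    print(shlex.join(argv))
```

On a host where ascend-dmi is installed, each list could be passed to subprocess.run after the printed commands have been reviewed.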

Table 2 describes the parameters in the command outputs.

Table 2 Parameters in the command outputs

Parameter: Lane
Description: Lane ID of the corresponding RoCE link.

Parameter: Check Enable
Description: Check status of the RX end. 0: disabled. 1: enabled.

Parameter: Pattern
Description: Check code pattern in the RX direction.

Parameter: Error-Bits
Description: Number of bit errors. The upper limit is 67092480.

Parameter: Bit-Error Rate(BER)
Description: BER = Number of bit errors / Total number of transmitted bits × 100%. If the BER is less than 0.001%, the signal quality is normal.

Parameter: ALOS
Description: The value must be 0 for a normal stream test. If the value is 1, the signal amplitude is too low. If no stream test is conducted, ignore this value.

Parameter: Period(ms)
Description: Time, in milliseconds, since the last operation of controlling streams or reading the check result.
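The criteria in Table 2 can be applied to the --show output automatically. The following Python sketch parses the sample output from the example above and flags lanes that miss the documented limits. It is an illustration, not a tool shipped with Ascend DMI: the names parse_rows and lane_ok are hypothetical, and the parser assumes that every data row has seven whitespace-separated fields, as in the sample.

```python
SAMPLE = """\
Device 8:
-----------------------------------------------------------------------------------------------
Lane      Check Enable    Pattern    Error-Bits     Bit-Error Rate(BER)    ALOS      Period(ms)
----------------------------------------------------------------------------------------------------
0         1               PRBS31     206            0.0000000032%          0         120193
1         1               PRBS31     385            0.0000000060%          0         120187
2         1               PRBS31     67092480       0.0010508065%          0         120186
3         1               PRBS31     67092480       0.0010507844%          0         120189
-----------------------------------------------------------------------------------------------
"""

ERROR_BITS_LIMIT = 67092480   # upper limit of the Error-Bits counter (Table 2)
BER_LIMIT_PERCENT = 0.001     # the BER must be below this value (Table 2)


def parse_rows(text):
    """Extract the per-lane rows from --show output as a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a lane number and have exactly seven fields;
        # the header, separators, and "Device N:" lines do not.
        if len(fields) == 7 and fields[0].isdigit():
            rows.append({
                "lane": int(fields[0]),
                "check_enable": int(fields[1]),
                "pattern": fields[2],
                "error_bits": int(fields[3]),
                "ber_percent": float(fields[4].rstrip("%")),
                "alos": int(fields[5]),
                "period_ms": int(fields[6]),
            })
    return rows


def lane_ok(row):
    """Apply the Table 2 criteria: ALOS is 0, the BER is below 0.001%,
    and the error counter has not reached its upper limit."""
    return (row["alos"] == 0
            and row["ber_percent"] < BER_LIMIT_PERCENT
            and row["error_bits"] < ERROR_BITS_LIMIT)


for row in parse_rows(SAMPLE):
    print(row["lane"], "ok" if lane_ok(row) else "check link or test setup")
```

In the sample, lanes 2 and 3 are flagged because their error counters sit at the upper limit of 67092480 and their BER is above 0.001%.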

The number of bit errors may reach 67092480 in the following situations:

  • The stream test is performed without using --clear to clear the register.
  • The code types specified in the TX and RX directions are inconsistent.
  • The RX direction is enabled before the TX direction.
  • The NPU and CDR adaptation is automatically disabled during the stream test. However, running the stream test command multiple times can repeatedly enable and disable adaptation. If the adaptation process is not completed in time, the number of bit errors can occasionally reach 67092480.
  • In the CDR loopback scenario, no CDR loopback is configured.

Follow-up Procedure

  • To avoid affecting training or inference services, disable the stream test after it is finished.
  • If the CDR loopback is used for the stream test, release the CDR loopback after the test. Otherwise, services cannot run properly.