Communication Analysis

Communication Analysis displays the communication performance of all cards and network-wide link performance in cluster scenarios.

MindStudio IDE does not support data collection in cluster scenarios. To display collected profile data, use Import Result to import the parent directory of PROF_XXX.

The Communication Analysis page displays data in two parts: Communication Duration Analysis and Communication Matrix.
  • Communication Duration Analysis: displays the card communication performance, including the communication duration, waiting duration, and card link information, as shown in Figure 1.
  • Communication Matrix: displays information about the three link modes, including the bandwidth, communication duration, and communication size, as shown in Figure 2.
Figure 1 Communication Duration Analysis
Figure 2 Communication Matrix
Table 1 describes the fields in the figures.
Table 1 Field description

Field

Description

Iteration ID

Iteration ID, used for querying the iteration data of all operators in a specified iteration.

Operator Name

Communication operator name, which is used to view the iteration data of a specified operator.

Rank ID

Rank ID, used for querying all operator iteration data on a specified node.

  • When Critical Path is enabled, you can select a value from the Rank ID drop-down list.
  • When Critical Path is disabled, the Rank ID drop-down list is dimmed and cannot be selected.

Critical Path

Critical path. When this option is enabled, the profile data on the critical task path is filtered for analysis and display. This option is enabled by default.

Apply

Data export button.

  • Click Communication Duration Analysis, select the target iteration ID and operator name, and click this button to export the communication duration analysis data of the specified operator in the specified iteration.
  • Click Communication Matrix, select the target iteration ID and operator name, and click this button to export the communication matrix analysis data of the specified operator in the specified iteration.

Communication Duration Analysis

Communication duration analysis.

Guidance

Guidance. For example, it prompts you to check whether the waiting duration ratio of a rank is greater than the threshold (0.2).

Advisor

Analysis and suggestion on slow cards/ranks.

Visualized Communication Time

Visualized communication duration.

Time(ms)

Duration.

Rank

Rank. A node in the cluster scenario.

Ratio

Ratio. The options are Synchronization Time Ratio and Wait Time Ratio.

Data Analysis of Communication Time

Operator communication duration analysis.

Rank ID

Rank ID in the cluster scenario.

Elapse Time(ms)

Total operator communication duration.

Transit Time(ms)

Communication duration. If the communication duration is too long, a link may be faulty.

Synchronization Time(ms)

Synchronization duration. It is the duration required for synchronization between cards.

Wait Time(ms)

Wait duration. Synchronization is performed before two cards communicate with each other.

Synchronization Time Ratio

Synchronization duration ratio.

Synchronization Time Ratio = Synchronization Time/(Synchronization Time + Transit Time). A larger synchronization duration ratio before communication indicates a lower communication efficiency, which may be caused by slow cards.

Wait Time Ratio

Wait duration ratio of communication operators.

Wait Time Ratio = Wait Time/(Wait Time + Transit Time). The larger the wait duration ratio, the larger the share of the card's wait duration in the total communication duration, and the lower the communication efficiency.

Idle Time(ms)

Duration for communication operator delivery.

Duration for communication operator delivery (Idle Time) = Total operator communication duration (Elapse Time) – Communication duration (Transit Time) – Wait duration (Wait Time)

Bandwidth Analysis

Bandwidth analysis. Click see more to view the bandwidth details of a specified operator, as shown in Figure 3.

Communication Operators Details

Communication operator details. Click see more to view the link details of a specified communication operator, as shown in Figure 4.

Communication Matrix

Communication matrix.

Suggestions

Analysis suggestions. Analysis suggestions on network-wide link information are provided based on different link modes (HCCS, PCIe, and RDMA), including the communication duration, bandwidth, traffic, bandwidth usage, and slow links.

Matrix Model

Matrix model.

Communication Matrix Type

Communication matrix type.

  • Bandwidth(GB/s): bandwidth.
  • Transit Size(MB): communication size.
  • Transport Type: link type.
  • Large Packet Ratio: ratio of large communication packets.
  • Bandwidth(Utilization): bandwidth usage.
  • Transit Time(ms): communication duration.

Src Rank Id

Rank ID of the source card in the logical card link information.

Dst Rank Id

Rank ID of the destination card in the logical card link information.
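
The ratio formulas in Table 1 can be sketched in Python as follows. This is a minimal illustration, not MindStudio code: the function names, the sample timings, and the choice to return 0.0 when the denominator is zero are assumptions. The 0.2 threshold is the one quoted in the Guidance field.

```python
from typing import Dict, List

WAIT_RATIO_THRESHOLD = 0.2  # threshold quoted in the Guidance field


def time_ratio(part_ms: float, transit_ms: float) -> float:
    """Return part / (part + transit); 0.0 when both are zero (assumed convention)."""
    total = part_ms + transit_ms
    return part_ms / total if total > 0 else 0.0


def ranks_over_wait_threshold(ranks: Dict[int, Dict[str, float]]) -> List[int]:
    """Return the rank IDs whose Wait Time Ratio exceeds the threshold."""
    return sorted(
        rank_id
        for rank_id, t in ranks.items()
        if time_ratio(t["wait_ms"], t["transit_ms"]) > WAIT_RATIO_THRESHOLD
    )


# Hypothetical per-rank timings in milliseconds.
sample = {
    0: {"transit_ms": 8.0, "wait_ms": 1.0, "sync_ms": 0.5},
    1: {"transit_ms": 6.0, "wait_ms": 4.0, "sync_ms": 2.0},
}

# Synchronization Time Ratio of rank 1: 2.0 / (2.0 + 6.0)
sync_ratio_rank1 = time_ratio(sample[1]["sync_ms"], sample[1]["transit_ms"])

# Ranks whose Wait Time Ratio is above 0.2
flagged = ranks_over_wait_threshold(sample)
```

With these sample values, only rank 1 is flagged, because its wait ratio is 4.0/(4.0 + 6.0) = 0.4, while rank 0 stays at about 0.11.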

Figure 3 Bandwidth Analysis
Table 2 describes the fields in the figure.
Table 2 Field description

Field

Description

Advisor

Analysis suggestions.

Transport Type

Link mode.

SDMA

SDMA link, which can be HCCS or PCIe.

HCCS

HCCS link.

PCIE

PCIe link.

RDMA

RDMA link.

Packet Number

Number of communication packets.

Packet Size(MB)

Size of a communication packet.

Transit Size(MB)

Size of communication packets within one communication process.

Transit Time(ms)

Duration of one communication process.

Bandwidth(GB/s)

Bandwidth. The bandwidth equals the traffic divided by the communication duration.

Bandwidth(Utilization)

Bandwidth usage. If the actual bandwidth is less than 0.8 times the empirical bandwidth, the bandwidth usage is low and further analysis is required.

Empirical bandwidth reference values: RDMA_Bandwidth = 12.5; HCCS_Bandwidth = 18; and PCIe_Bandwidth = 20.

Large Packet Ratio

Ratio of large communication packets. It is the ratio of packets that are large enough for the communication link to reach the empirical bandwidth.
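
The bandwidth and bandwidth usage rules in Table 2 can be illustrated with a short Python sketch. The helper names and the sample link are hypothetical. Two further assumptions are made: the empirical reference values are treated as GB/s (the source gives no unit), and 1 GB is taken as 1000 MB, so that MB divided by ms yields GB/s directly.

```python
# Empirical reference values from the Bandwidth(Utilization) description.
# Treating them as GB/s is an assumption; the source gives no unit.
EMPIRICAL_BANDWIDTH = {"RDMA": 12.5, "HCCS": 18.0, "PCIE": 20.0}
UTILIZATION_FLOOR = 0.8


def bandwidth_gb_per_s(transit_size_mb: float, transit_time_ms: float) -> float:
    """Traffic divided by duration. MB/ms equals GB/s if 1 GB = 1000 MB (assumed)."""
    if transit_time_ms <= 0:
        raise ValueError("transit_time_ms must be positive")
    return transit_size_mb / transit_time_ms


def needs_analysis(transport_type: str, bandwidth: float) -> bool:
    """True when the bandwidth is below 0.8 times the empirical value for the link."""
    return bandwidth < UTILIZATION_FLOOR * EMPIRICAL_BANDWIDTH[transport_type]


# Hypothetical link: 100 MB moved in 10 ms over HCCS.
bw = bandwidth_gb_per_s(100.0, 10.0)
low = needs_analysis("HCCS", bw)
```

In this example the link reaches 10 GB/s, which is below 0.8 × 18 = 14.4, so it would be a candidate for further analysis.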

Figure 4 Communication Operators Details
Table 3 describes the fields in the figure.
Table 3 Field description

Field

Description

Operator Name

Communication operator name.

Elapse Time(ms)

Total time consumed by all events of the communication operators, in milliseconds.

Transit Time(ms)

Communication duration, in milliseconds. The communication duration is calculated based on the total time consumed by the communication operators of the SDMA and RDMA links.

Synchronization Time(ms)

Synchronization duration, in milliseconds. It is the waiting time before the first data transmission.

Wait Time(ms)

Waiting duration, in milliseconds. Synchronization is performed before two logical cards communicate with each other.

Synchronization Time Ratio

Synchronization duration ratio. The calculation formula is Synchronization Time/(Synchronization Time + Transit Time).

Wait Time Ratio

Waiting time ratio. The calculation formula is Wait Time/(Wait Time + Transit Time).

Idle Time(ms)

Duration for communication operator delivery.

Duration for communication operator delivery (Idle Time) = Total operator communication duration (Elapse Time) – Communication duration (Transit Time) – Wait duration (Wait Time)
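
As a worked example of the Idle Time relation above, the following Python sketch subtracts the transit and wait durations from the total duration of one operator. The function name and the sample values are hypothetical.

```python
def idle_time_ms(elapse_ms: float, transit_ms: float, wait_ms: float) -> float:
    """Idle Time = Elapse Time - Transit Time - Wait Time (all in milliseconds)."""
    return elapse_ms - transit_ms - wait_ms


# Hypothetical operator: 12 ms elapsed, 8 ms in transit, 3 ms waiting.
idle = idle_time_ms(12.0, 8.0, 3.0)
```

Here the remaining 1 ms is the time attributed to communication operator delivery.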