Communication Analysis
Communication Analysis displays the communication performance of all cards and network-wide link performance in cluster scenarios.
MindStudio IDE does not support data collection in cluster scenarios. To display profile data that has already been collected, use Import Result to import the parent directory of PROF_XXX.
- Communication Duration Analysis: displays the card communication performance, including the communication duration, waiting duration, and card link information, as shown in Figure 1.
- Communication Matrix: displays information about the three link modes (HCCS, PCIe, and RDMA), including the bandwidth, communication duration, and communication size, as shown in Figure 2.
| Field | Description |
|---|---|
| Iteration ID | Iteration ID, used to query the data of all operators in a specified iteration. |
| Operator Name | Communication operator name, used to view the iteration data of a specified operator. |
| Rank ID | Rank ID, used to query the iteration data of all operators on a specified node. |
| Critical Path | Critical path. Only the profile data on the critical task path is filtered for analysis and display. This parameter is enabled by default. |
| Apply | Data export button. |
| Communication Duration Analysis | Communication duration analysis. |
| Guidance | Guidance. You can view the guidance, for example, to check whether the waiting duration ratio of a rank is greater than the threshold (0.2). |
| Advisor | Analysis and suggestions on slow cards/ranks. |
| Visualized Communication Time | Visualized communication duration. |
| Time(ms) | Duration, in milliseconds. |
| Rank | Rank. A node in the cluster scenario. |
| Ratio | Ratio. The options are Synchronization Time Ratio and Wait Time Ratio. |
| Data Analysis of Communication Time | Operator communication duration analysis. |
| Rank ID | Rank ID in the cluster scenario. |
| Elapse Time(ms) | Total operator communication duration. |
| Transit Time(ms) | Communication duration. If the communication duration is too long, a link may be faulty. |
| Synchronization Time(ms) | Synchronization duration. It is the time required for synchronization between cards. |
| Wait Time(ms) | Wait duration. Two cards synchronize before they communicate with each other. |
| Synchronization Time Ratio | Synchronization duration ratio. Synchronization Time Ratio = Synchronization Time/(Synchronization Time + Transit Time). A larger synchronization duration ratio before communication indicates lower communication efficiency, which may be caused by slow cards. |
| Wait Time Ratio | Wait duration ratio of communication operators. Wait Time Ratio = Wait Time/(Wait Time + Transit Time). A larger wait duration ratio means that waiting takes up a larger share of the card's total communication duration, which indicates lower communication efficiency. |
| Idle Time(ms) | Duration for communication operator delivery. Idle Time = Elapse Time - Transit Time - Wait Time. |
| Bandwidth Analysis | Bandwidth analysis. Click see more to view the bandwidth details of a specified operator, as shown in Figure 3. |
| Communication Operators Details | Communication operator details. Click see more to view the link details of a specified communication operator, as shown in Figure 4. |
| Communication Matrix | Communication matrix. |
| Suggestions | Analysis suggestions. Suggestions on network-wide link information are provided for each link mode (HCCS, PCIe, and RDMA), covering the communication duration, bandwidth, traffic, bandwidth usage, and slow links. |
| Matrix Model | Matrix model. |
| Communication Matrix Type | Communication matrix type. |
| Src Rank Id | Rank ID of the source card in the logical card link information. |
| Dst Rank Id | Rank ID of the destination card in the logical card link information. |
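
The two ratio formulas and the Guidance threshold (0.2) can be reproduced outside the tool. The following is a minimal sketch, not MindStudio code: the `RankTiming` container, the function names, and the sample durations are all hypothetical, and it assumes the per-rank durations have already been read from the Data Analysis of Communication Time table.

```python
from dataclasses import dataclass

WAIT_RATIO_THRESHOLD = 0.2  # threshold quoted in the Guidance description


@dataclass
class RankTiming:
    """Per-rank durations in milliseconds (hypothetical container, not a MindStudio type)."""
    rank_id: int
    transit_ms: float
    synchronization_ms: float
    wait_ms: float


def ratio(part_ms: float, transit_ms: float) -> float:
    """part / (part + transit); returns 0.0 when both durations are zero."""
    total = part_ms + transit_ms
    return part_ms / total if total > 0 else 0.0


def ranks_over_wait_threshold(ranks, threshold=WAIT_RATIO_THRESHOLD):
    """Return (rank_id, wait_ratio) pairs whose Wait Time Ratio exceeds the threshold."""
    flagged = []
    for r in ranks:
        wait_ratio = ratio(r.wait_ms, r.transit_ms)
        if wait_ratio > threshold:
            flagged.append((r.rank_id, round(wait_ratio, 3)))
    return flagged


# Made-up sample values for two ranks.
ranks = [
    RankTiming(rank_id=0, transit_ms=8.0, synchronization_ms=1.0, wait_ms=1.0),
    RankTiming(rank_id=1, transit_ms=6.0, synchronization_ms=2.0, wait_ms=4.0),
]

print(ranks_over_wait_threshold(ranks))                        # [(1, 0.4)]
print(ratio(ranks[1].synchronization_ms, ranks[1].transit_ms))  # 0.25
```

The same `ratio` helper serves both Synchronization Time Ratio and Wait Time Ratio because the two formulas share the form part/(part + Transit Time).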

| Field | Description |
|---|---|
| Advisor | Analysis suggestions. |
| Transport Type | Link mode. |
| SDMA | SDMA link. The options are HCCS and PCIe. |
| HCCS | HCCS link. |
| PCIE | PCIe link. |
| RDMA | RDMA link. |
| Packet Number | Number of communication packets. |
| Packet Size(MB) | Size of a communication packet. |
| Transit Size(MB) | Size of communication packets within one communication process. |
| Transit Time(ms) | Duration of one communication process. |
| Bandwidth(GB/s) | Bandwidth. The bandwidth equals the traffic divided by the communication duration. |
| Bandwidth(Utilization) | Bandwidth usage. If the actual bandwidth is less than 0.8 times the empirical bandwidth, the bandwidth usage is considered low and further analysis is required. Empirical bandwidth reference values: RDMA_Bandwidth = 12.5, HCCS_Bandwidth = 18, and PCIe_Bandwidth = 20. |
| Large Packet Ratio | Ratio of large communication packets, that is, the proportion of packets that are large enough for the communication link to reach the empirical bandwidth. |
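
The bandwidth and bandwidth usage rules above amount to one division and one comparison. The sketch below illustrates them under stated assumptions: the traffic is taken to be Transit Size(MB), the empirical reference values are taken to be in GB/s (the table does not state their unit), 1 GB is treated as 1000 MB, and the function names are hypothetical.

```python
# Empirical reference values quoted in the table; the GB/s unit is an assumption.
EMPIRICAL_BANDWIDTH = {"RDMA": 12.5, "HCCS": 18.0, "PCIE": 20.0}
UTILIZATION_FLOOR = 0.8  # "less than 0.8 times the empirical bandwidth"


def bandwidth_gb_per_s(transit_size_mb: float, transit_time_ms: float) -> float:
    """Traffic / duration. MB per ms equals GB per s when 1 GB = 1000 MB."""
    if transit_time_ms <= 0:
        raise ValueError("transit_time_ms must be positive")
    return transit_size_mb / transit_time_ms


def needs_analysis(transport_type: str, transit_size_mb: float, transit_time_ms: float) -> bool:
    """True when the measured bandwidth is below 0.8 x the empirical value."""
    measured = bandwidth_gb_per_s(transit_size_mb, transit_time_ms)
    return measured < UTILIZATION_FLOOR * EMPIRICAL_BANDWIDTH[transport_type]


# Made-up sample values.
print(bandwidth_gb_per_s(100.0, 10.0))      # 10.0
print(needs_analysis("HCCS", 100.0, 10.0))  # True: 10.0 is below 0.8 * 18
print(needs_analysis("RDMA", 120.0, 10.0))  # False: 12.0 is above 0.8 * 12.5
```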

| Field | Description |
|---|---|
| Operator Name | Communication operator name. |
| Elapse Time(ms) | Total time consumed by all events of the communication operators, in milliseconds. |
| Transit Time(ms) | Communication duration, in milliseconds. The communication duration is calculated based on the total time consumed by the communication operators of the SDMA and RDMA links. |
| Synchronization Time(ms) | Synchronization duration, in milliseconds. It is the waiting time before the first data transmission. |
| Wait Time(ms) | Wait duration, in milliseconds. Two logical cards synchronize before they communicate with each other. |
| Synchronization Time Ratio | Synchronization duration ratio. Synchronization Time Ratio = Synchronization Time/(Synchronization Time + Transit Time). |
| Wait Time Ratio | Wait duration ratio. Wait Time Ratio = Wait Time/(Wait Time + Transit Time). |
| Idle Time(ms) | Duration for communication operator delivery. Idle Time = Elapse Time - Transit Time - Wait Time. |
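
To show how Idle Time follows from the other per-operator columns, here is a minimal sketch that applies the formula to a couple of rows. The operator names and durations are made up for illustration, and `idle_time_ms` is a hypothetical helper, not part of any MindStudio interface.

```python
def idle_time_ms(elapse_ms: float, transit_ms: float, wait_ms: float) -> float:
    """Idle Time = Elapse Time - Transit Time - Wait Time (all in ms)."""
    return elapse_ms - transit_ms - wait_ms


# (operator name, Elapse Time, Transit Time, Wait Time): made-up sample rows
rows = [
    ("op_a", 10.0, 6.0, 3.0),
    ("op_b", 5.0, 4.0, 0.5),
]

idle_by_operator = {name: idle_time_ms(e, t, w) for name, e, t, w in rows}
print(idle_by_operator)  # {'op_a': 1.0, 'op_b': 0.5}
```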

