Precise Divergence Point Analysis for Fast and Slow Cards
Use the Timeline tab of MindStudio Insight to quickly locate the root cause of the performance difference between fast and slow cards.
Locating the Divergence Point
On the Communication tab, open the communication operator thumbnail and select the last iteration ID. Identify the collective communication operator with the largest duration difference across cards. Start from the divergence point with the greatest impact: right-click it to navigate to the corresponding communication position for further analysis. Figure 2 uses the green operator as an example.
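If the per-card durations of each communication operator are available outside the tool, the same selection can be approximated in a short script. The following is a minimal sketch, not a MindStudio Insight feature: the operator names, the card IDs, and the `durations_us` layout are hypothetical sample data, and the function only reports which cards have the shortest and longest duration without deciding which one is "fast".

```python
# Hypothetical sample data: operator name -> {card ID: duration in microseconds}.
durations_us = {
    "hcom_allReduce_1": {0: 1210, 1: 1240, 2: 1200, 3: 1230},
    "hcom_allGather_2": {0: 5120, 1: 3900, 2: 3850, 3: 320},
    "hcom_reduceScatter_3": {0: 800, 1: 790, 2: 805, 3: 810},
}


def largest_duration_gap(durations):
    """Return (operator, gap, shortest_card, longest_card) for the operator
    whose per-card durations differ the most."""
    best = None
    for op_name, per_card in durations.items():
        shortest_card = min(per_card, key=per_card.get)
        longest_card = max(per_card, key=per_card.get)
        gap = per_card[longest_card] - per_card[shortest_card]
        if best is None or gap > best[1]:
            best = (op_name, gap, shortest_card, longest_card)
    return best


result = largest_duration_gap(durations_us)
print(result)
```

The two cards returned here are the natural candidates for the comparison described in the next section.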
Locating the Range
- Zoom in on the communication operator and locate the cards to be compared, that is, the cards with the largest duration gap (for example, cards 0 and 3), as shown in Figure 3.
- Right-click the communication operator and choose Find in Timeline from the shortcut menu. On each of the two cards, you are advised to add a flag at the start of the Python API call that delivers the HCCL communication operator.
Figure 4 Locating the range 2
- Identify the point (the white line in the figure) where the fast and slow cards begin to diverge. This point marks the start of the comparison region. Use each card's flag as the end point of its comparison interval to determine the fast-card region and the slow-card region, as shown in Figure 5.
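The interval construction in the steps above can be written down explicitly. The sketch below assumes that both cards share one time axis and that the divergence timestamp and the flag timestamps have been read off the timeline by hand; the `Region` class, the function name, and all timestamp values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    card: int
    start_us: int  # divergence point, shared by both cards
    end_us: int    # this card's flag

    @property
    def length_us(self):
        return self.end_us - self.start_us


def comparison_regions(divergence_us, flag_us_by_card):
    """Build one region per card: from the shared divergence point to that card's flag."""
    regions = {}
    for card, flag_us in flag_us_by_card.items():
        if flag_us < divergence_us:
            raise ValueError(f"flag on card {card} precedes the divergence point")
        regions[card] = Region(card, divergence_us, flag_us)
    return regions


# Hypothetical timestamps in microseconds for cards 0 and 3.
regions = comparison_regions(1_000_000, {0: 1_040_000, 3: 1_065_000})
extra_us = regions[3].length_us - regions[0].length_us
print(regions[0], regions[3], extra_us)
```

Because both regions start at the same point, the difference in their lengths is the extra time that the slower card spends before it reaches the communication operator.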
Finding the Differences
Compare the fast-card region and the slow-card region to find the specific causes of the difference.
As shown in Figure 6, three pairs of areas account for the difference between the fast and slow cards: areas 1 and 4, areas 2 and 5, and areas 3 and 6.
The areas where the fast and slow cards differ have now been located. Next, determine the root cause based on the operators and code involved.
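A region comparison of this kind can also be approximated offline when the timeline events of each card are available as a list. The sketch below is an illustration under assumptions: the `(name, start_us, duration_us)` event layout, the event names, and all values are hypothetical and are not an export format defined by MindStudio Insight.

```python
from collections import defaultdict


def total_by_name(events, start_us, end_us):
    """Sum durations per event name for events that lie fully inside [start_us, end_us]."""
    totals = defaultdict(int)
    for name, ts_us, dur_us in events:
        if start_us <= ts_us and ts_us + dur_us <= end_us:
            totals[name] += dur_us
    return dict(totals)


def rank_differences(fast_totals, slow_totals):
    """Return (name, slow minus fast) pairs, largest absolute difference first."""
    names = set(fast_totals) | set(slow_totals)
    diffs = [(n, slow_totals.get(n, 0) - fast_totals.get(n, 0)) for n in names]
    return sorted(diffs, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical events inside each card's comparison region.
fast_events = [("forward", 0, 100), ("backward", 100, 200), ("optimizer_step", 300, 50)]
slow_events = [
    ("forward", 0, 100),
    ("dataloader_wait", 100, 120),
    ("backward", 220, 260),
    ("optimizer_step", 480, 50),
]

ranked = rank_differences(
    total_by_name(fast_events, 0, 350),
    total_by_name(slow_events, 0, 530),
)
print(ranked)
```

Entries at the top of the ranking are the first places to inspect in the timeline; an entry that appears on only one card often points to work that the other card does not perform in that region.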



