Locating and Analyzing Issues Using MindStudio Insight

MindStudio Insight is used to load all the data for fault locating.

On the summary page, communication accounts for a relatively high proportion of the time on each card in this communicator. The total computation time (pure computation time plus overlapped communication time) accounts for only one-third of the total time, so the problem can be identified as a communication issue.
Figure 1 Summary page
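The judgment made on the summary page can be sketched as a simple ratio check. The following is a minimal illustration only, not part of MindStudio Insight: the function names, the 0.5 threshold, and the sample timings are all hypothetical and are chosen just to reproduce the "one-third" situation described above.

```python
def computation_share(pure_compute_ms, overlapped_comm_ms, total_ms):
    """Share of the step spent computing.

    Total computation time is taken as pure computation time plus
    communication time that overlaps with computation.
    """
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return (pure_compute_ms + overlapped_comm_ms) / total_ms


def looks_communication_bound(share, threshold=0.5):
    """Hypothetical rule of thumb: flag the step when computation
    takes less than `threshold` of the total time."""
    return share < threshold


# Hypothetical per-card timings in milliseconds (not measured data).
share = computation_share(pure_compute_ms=250.0,
                          overlapped_comm_ms=80.0,
                          total_ms=990.0)
print(f"computation share: {share:.2f}")
print("communication-bound:", looks_communication_bound(share))
```

The threshold is an assumption to be tuned per workload. In this case the share of about one-third is low enough that the next step is to inspect the communication page.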
Switch to the communication page. A large number of card synchronization issues are found (highlighted in red in the box), which indicates that many operators wait for extended periods. The most obvious slow card (card 12) is selected to analyze the detailed cause.
Figure 2 Communication page
Switch to the pipeline page. Card 12 clearly has a lot of idle time. Meanwhile, many events on the AscendCL side are occupying resources. Based on experience, this is likely caused by excessive memory usage on this card: when memory is requested for new data, memory reorganization is required, which leads to extended idle periods. Setting export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True" resolves the memory fragmentation issue and improves memory utilization. After this adjustment, the performance problem was solved.
Figure 3 Pipeline page

Figure 4 Pipeline page
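When the training job is started from a Python launcher rather than from a shell, the same allocator setting can be applied in the script itself. The following is a minimal sketch under one assumption: the variable has to be set before torch and torch_npu are imported so that the allocator sees it at initialization. The imports are left commented out so that the snippet also runs on a machine without an NPU.

```python
import os

# Same setting as the shell command
#   export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
# Assumption: this must be set before torch / torch_npu are imported.
os.environ["PYTORCH_NPU_ALLOC_CONF"] = "expandable_segments:True"

# import torch
# import torch_npu  # assumed to be installed on the Ascend host

print("PYTORCH_NPU_ALLOC_CONF =", os.environ["PYTORCH_NPU_ALLOC_CONF"])
```

Setting the variable in the launching shell remains the simpler option, because it also covers any worker processes started from that shell.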
