Locating and Analyzing Issues Using MindStudio Insight
Load all of the collected data into MindStudio Insight to locate the fault.
- On the summary page, communication time accounts for a relatively high proportion of the total time on every card in this communicator. The total computation time (pure computation time + overlapped communication time) accounts for only one-third of the total time, so the problem can be identified as a communication issue.
Figure 1 Summary page
- Switch to the communication page. Many card synchronization issues are found (highlighted by the red box), which indicates that many operators are waiting for extended periods. The most obviously slow card (card 12) is selected to analyze the detailed cause.
Figure 2 Communication page
- Switch to the pipeline page. Card 12 clearly has a lot of idle time, and at the same time many events on the AscendCL side are occupying resources. Based on experience, this is most likely caused by excessive memory usage on this card: when new memory is requested, memory must be reorganized, which leads to extended idle periods. Running export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True" addresses the memory fragmentation issue and improves memory utilization. After debugging with this setting, the performance problem is resolved.
Figure 3 Pipeline page
Figure 4 Pipeline page
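The summary-page triage in the first step comes down to one ratio per card: total computation time (pure computation + overlapped communication) divided by total time. The sketch below is a minimal illustration under assumptions. The `CardSummary` class and `is_communication_bound` helper are hypothetical, the per-card totals are assumed to have been read off the summary page by hand, and the 0.5 threshold is an assumed rule of thumb, not a MindStudio Insight setting.

```python
from dataclasses import dataclass


@dataclass
class CardSummary:
    """Hypothetical per-card totals (ms) read off the summary page."""
    card_id: int
    pure_computation: float
    overlapped_communication: float
    total: float

    @property
    def computation_ratio(self) -> float:
        # Total computation time = pure computation + overlapped communication.
        return (self.pure_computation + self.overlapped_communication) / self.total


def is_communication_bound(cards: list[CardSummary], threshold: float = 0.5) -> bool:
    """Assumed rule of thumb: if computation covers less than `threshold`
    of the total time on every card, treat the job as communication-bound."""
    return all(card.computation_ratio < threshold for card in cards)
```

With illustrative numbers such as `CardSummary(0, 25.0, 8.0, 100.0)`, the computation ratio is 0.33, which matches the "one-third" situation described above and is classified as communication-bound.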
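Picking the slow card on the communication page can be reasoned about as follows. In a synchronous collective, the straggler arrives last, so the other cards wait for it while it waits the least. The sketch below applies that heuristic to per-card wait times. It is an assumption-based illustration, not the tool's own algorithm: `find_slow_card` and its input layout (card id mapped to per-operator wait times in ms) are hypothetical.

```python
def find_slow_card(wait_times_ms: dict[int, list[float]]) -> int:
    """Return the id of the card with the smallest total wait time.

    wait_times_ms maps a card id to the wait times (ms) it spent in each
    communication operator (hypothetical input, e.g. transcribed by hand).
    """
    if not wait_times_ms:
        raise ValueError("no cards given")
    # Total wait per card across all communication operators.
    totals = {card: sum(waits) for card, waits in wait_times_ms.items()}
    # Heuristic: the straggler reaches each collective last, so the other
    # cards wait for it and it accumulates the least wait time itself.
    return min(totals, key=totals.get)
```

A card whose total wait is far below its peers is a candidate for the kind of detailed pipeline analysis done for card 12 above.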
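The `export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"` fix can also be applied from inside a training script. This sketch rests on two assumptions: that the variable must be set before `torch_npu` initializes its memory allocator (setting it before importing `torch_npu` is the safe choice), and that the variable holds comma-separated `key:value` options in the same style as `PYTORCH_CUDA_ALLOC_CONF`. The helper `enable_expandable_segments` is hypothetical. It keeps any options that are already set and replaces an existing `expandable_segments` entry.

```python
import os


def enable_expandable_segments() -> str:
    """Add expandable_segments:True to PYTORCH_NPU_ALLOC_CONF, keeping other options.

    Call this before `import torch_npu` (assumed requirement: the allocator
    reads the variable when it is initialized).
    """
    current = os.environ.get("PYTORCH_NPU_ALLOC_CONF", "")
    # Assumed format: comma-separated key:value pairs.
    options = [opt.strip() for opt in current.split(",")]
    # Drop empty entries and any previous expandable_segments setting.
    options = [opt for opt in options if opt and not opt.startswith("expandable_segments:")]
    options.append("expandable_segments:True")
    value = ",".join(options)
    os.environ["PYTORCH_NPU_ALLOC_CONF"] = value
    return value
```

Exporting the variable in the shell before launching the job, as described above, has the same effect and avoids any import-order concerns.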