Performance Troubleshooting Process

The basic performance tuning process for an LLM is as follows:

Figure 1 Basic performance tuning process

Performance tuning starts with identifying the problem. Only then can targeted optimizations be applied. The process consists of the following steps:

Collect profile data. Use the Ascend PyTorch Profiler interfaces to collect and analyze the data.
Use the MindStudio Insight visualization tool to narrow down the performance issues. Issues typically fall into one of three areas: computation, scheduling, or communication.
Use advisor to locate the issues. Advisor automatically analyzes the profile data against a built-in case library and provides performance tuning suggestions.
Apply the tuning methods appropriate to each issue. After each change, re-run the training, collect profile data again, and use MindStudio Insight to check whether the change has taken effect. Repeat this process until the performance issues are resolved.
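The data collection step might look roughly like the following minimal sketch. It assumes that torch_npu.profiler exposes profile, schedule, tensorboard_trace_handler, and ProfilerActivity in the same way as torch.profiler. The helper name collect_profile, the train_step callback, the output directory, and the schedule values are hypothetical and chosen only for illustration. The profiler parameter is an assumption added here so that the helper can also be exercised without NPU hardware.

```python
from typing import Callable


def collect_profile(train_step: Callable[[int], None],
                    num_steps: int,
                    out_dir: str = "./profiling_result",
                    profiler=None) -> int:
    """Run num_steps training steps under the profiler and return the step count."""
    if profiler is None:
        # Imported lazily so this module still loads on machines without an NPU.
        import torch_npu
        profiler = torch_npu.profiler

    with profiler.profile(
        activities=[profiler.ProfilerActivity.CPU,
                    profiler.ProfilerActivity.NPU],
        # Illustrative values: idle for 1 step, warm up for 1 step,
        # then record 2 steps, for a single cycle.
        schedule=profiler.schedule(wait=1, warmup=1, active=2, repeat=1),
        # Write the collected data to out_dir when a recording cycle ends.
        on_trace_ready=profiler.tensorboard_trace_handler(out_dir),
    ) as prof:
        done = 0
        for step in range(num_steps):
            train_step(step)  # one forward/backward/optimizer iteration
            prof.step()       # mark the step boundary for the schedule
            done += 1
    return done
```

Recording only a few steps keeps the data small enough to inspect in MindStudio Insight. Re-running the same helper after each tuning change gives comparable profiles for checking whether the change took effect.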
