Performance Troubleshooting Process
The basic performance tuning process for an LLM is as follows:
Figure 1 Basic performance tuning process
Performance tuning starts with identifying the problem. Targeted optimizations are applied only after the problem has been located. The process consists of the following steps:
- Collect profile data. You can use the Ascend PyTorch Profiler interfaces for data profiling and analysis.
- Use MindStudio Insight, the visualization tool, to narrow down the performance issues. Issues typically fall into one of three categories: computation, scheduling, or communication.
- Use advisor to locate issues. Advisor automatically analyzes profile data using a built-in case library and provides performance tuning suggestions.
- Apply the tuning methods that match the identified issues. After each change, re-run the training, collect profile data again, and use MindStudio Insight to check whether the change took effect. Repeat this process until the performance issues are resolved.
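The iterative part of this process (apply one change, re-measure, keep the change only if it helps, repeat) can be sketched as a small loop. This is a minimal illustration, not part of any Ascend tool: `tune_iteratively`, `fake_measure`, and the configuration keys `fused_optimizer`, `overlap_comm`, and `extra_logging` are hypothetical names. In a real workflow, the measurement function would re-run the training and read the step time from the collected profile data.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def tune_iteratively(
    measure_step_time: Callable[[Config], float],
    baseline_config: Config,
    candidate_changes: List[Tuple[str, Config]],
    target_step_time: float,
) -> Tuple[Config, float, List[str]]:
    """Apply candidate changes one at a time and keep only those that help.

    All names here are hypothetical. measure_step_time stands in for
    "re-run the training and collect profile data".
    """
    config = dict(baseline_config)
    best = measure_step_time(config)  # baseline measurement
    kept: List[str] = []

    for name, change in candidate_changes:
        if best <= target_step_time:
            break  # performance issue resolved, stop tuning
        trial = {**config, **change}
        trial_time = measure_step_time(trial)  # re-measure after the change
        if trial_time < best:
            # The change took effect: keep it and continue from here.
            config, best = trial, trial_time
            kept.append(name)
        # Otherwise the change is discarded and the previous config stays.

    return config, best, kept


def fake_measure(config: Config) -> float:
    """Stand-in for a real profiling run; returns step time in milliseconds."""
    step_ms = 1000
    if config.get("fused_optimizer"):
        step_ms -= 200  # pretend this fixes a computation issue
    if config.get("overlap_comm"):
        step_ms -= 100  # pretend this fixes a communication issue
    if config.get("extra_logging"):
        step_ms += 50  # pretend this adds scheduling overhead
    return float(step_ms)


final_config, final_time, kept_changes = tune_iteratively(
    measure_step_time=fake_measure,
    baseline_config={},
    candidate_changes=[
        ("extra_logging", {"extra_logging": True}),
        ("fused_optimizer", {"fused_optimizer": True}),
        ("overlap_comm", {"overlap_comm": True}),
    ],
    target_step_time=750.0,
)
```

Changing one thing per iteration keeps each measurement attributable to a single tuning method, which is why the loop re-measures after every change rather than applying all candidates at once.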