Case Description
A multimodal model suffered a sudden, significant performance degradation during training. This case tunes its performance by following the process described above.
Use the Ascend PyTorch Profiler to collect profiling data during LLM training. The case runs on a 16-card cluster.
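The collection step can be sketched as follows. This is a minimal illustration, not the configuration used in this case: the helper names (steps_needed, profile_training), the train_one_step callback, the output directory, and the schedule values are all assumptions made for the example. It assumes torch_npu is installed on the Ascend host and uses its torch_npu.profiler interface.

```python
from typing import Callable


def steps_needed(skip_first: int, wait: int, warmup: int, active: int, repeat: int) -> int:
    """Steps required for the profiler schedule to complete all of its cycles."""
    return skip_first + repeat * (wait + warmup + active)


def profile_training(train_one_step: Callable[[], None],
                     output_dir: str = "./profiling_result") -> None:
    """Run just enough training steps under the profiler to finish the schedule.

    train_one_step is a hypothetical callback that executes one training iteration.
    """
    # Imported lazily so the module can still be loaded on a machine without the NPU stack.
    import torch_npu

    # Illustrative schedule (assumed values): skip 1 step, wait 1, warm up 1, record 2.
    sched = dict(skip_first=1, wait=1, warmup=1, active=2, repeat=1)

    with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU,
        ],
        schedule=torch_npu.profiler.schedule(**sched),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler(output_dir),
    ) as prof:
        for _ in range(steps_needed(**sched)):
            train_one_step()
            prof.step()  # mark the step boundary so the schedule can advance
```

In a multi-card job such as the 16-card cluster here, each rank would typically run the same wrapper around its own training loop. Recording only a few steps keeps the collected data small enough to analyze.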
The analysis in this case is based on earlier versions of the tools and data and serves only as a guide. If your data or tool output differs from what is shown here, rely on the output of the latest version.