Scenarios with Common Performance Issues

Performance issues may arise when an LLM is ported from another platform to Ascend devices and trained there. These issues fall into two main categories: insufficient out-of-the-box performance and performance deterioration after long-term running.

  • Out-of-the-box performance optimization: The user finds that a model ported from a GPU platform performs poorly on the Ascend device without any tuning, and optimizes its performance directly.
  • Performance deterioration after long-term running: Performance degrades during training because of unpredictable factors (for example, improperly adjusted algorithm parameters). In this case, you need to locate the cause of the deterioration and rectify it.
Figure 1 Scenario