Overall Guideline
MindIE inference performance can be optimized from two perspectives: large language model (LLM) inference and serving.
First check whether the current LLM inference performance can be optimized:
- If the scenario is covered by the version baseline, compare the measured performance with the version baseline and check the configuration.
- If the scenario is not covered by the version baseline, or the problem persists after the configuration is checked, run an LLM inference test with the same input and output.
- If the LLM inference test result does not meet expectations, optimize the LLM inference performance. If the result meets expectations, optimize the serving performance instead.
- Locate performance bottlenecks for serving optimization, as shown in Figure 1.
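The decision flow described above can be sketched as a small helper. This is only an illustrative sketch: the names `NextStep` and `choose_next_step` and the three input flags are hypothetical and are not part of MindIE. It simply encodes the order of the checks listed above.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    """Hypothetical labels for the actions named in the guideline."""
    CHECK_CONFIGURATION = "compare with the version baseline and check the configuration"
    RUN_INFERENCE_TEST = "run an LLM inference test with the same input and output"
    OPTIMIZE_LLM_INFERENCE = "optimize LLM inference performance"
    OPTIMIZE_SERVING = "locate serving bottlenecks and optimize serving performance"


def choose_next_step(
    covered_by_baseline: bool,
    config_checked: bool,
    inference_test_ok: Optional[bool] = None,
) -> NextStep:
    """Return the next action; all parameters are assumptions for illustration.

    inference_test_ok is None while the LLM inference test has not been run.
    """
    # Scenario is in the baseline and the configuration has not been reviewed yet.
    if covered_by_baseline and not config_checked:
        return NextStep.CHECK_CONFIGURATION
    # Not covered, or the problem persists after the configuration check.
    if inference_test_ok is None:
        return NextStep.RUN_INFERENCE_TEST
    # The test result decides which side to optimize.
    if not inference_test_ok:
        return NextStep.OPTIMIZE_LLM_INFERENCE
    return NextStep.OPTIMIZE_SERVING


if __name__ == "__main__":
    step = choose_next_step(covered_by_baseline=True, config_checked=True,
                            inference_test_ok=True)
    print(step.value)
```

In this sketch, serving optimization is chosen only after the LLM inference test result has met expectations, which mirrors the order of the checks in the list.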
