Troubleshooting Process
Figure 1 illustrates the process for troubleshooting LLM inference accuracy issues.
- To rule out accuracy issues caused by configuration errors, check the model configuration, the model structure, parameter passing, and the implementation of any custom operators.
- If the configuration is correct, select a bad case with an obvious accuracy issue for further analysis and diagnosis.
- Enable deterministic computation and collect the model's output logits.
- Compare the collected logits with the benchmark data.
- If the comparison results are consistent, the model's forward computation matches the benchmark; check the post-processing (sampling) step to determine whether the problem lies in the model itself or in post-processing.
- If the comparison results are inconsistent, collect the intermediate computation results at the step where the model output becomes abnormal, and compare them layer by layer to locate the source of the error.
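Before collecting logits, deterministic computation must be enabled so that repeated runs produce bit-identical outputs. As a minimal sketch, assuming a PyTorch-based inference stack (the helper name and seed value are illustrative, not from any specific toolkit):

```python
import os
import random

import torch


def enable_determinism(seed: int = 1234) -> None:
    """Illustrative helper: force deterministic computation before
    collecting logits, so two runs of the same input can be compared."""
    # cuBLAS needs a fixed workspace size for deterministic matmuls on CUDA.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    torch.manual_seed(seed)
    # Raise an error if a nondeterministic kernel would be used.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```

Call this once at process start, before building the model; any operator that has no deterministic implementation will then fail loudly instead of silently varying between runs.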
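The logits comparison against the benchmark can be sketched as follows. The function name, the tolerance, and the choice of metrics (max absolute difference plus cosine similarity) are assumptions for illustration; real toolchains may use different thresholds:

```python
import math


def compare_logits(actual, golden, atol=1e-3):
    """Compare one step's output logits against benchmark logits.

    Returns (max_abs_diff, cosine_similarity, consistent). The 1e-3
    tolerance is illustrative; pick one that fits your precision mode.
    """
    max_diff = max(abs(a - g) for a, g in zip(actual, golden))
    dot = sum(a * g for a, g in zip(actual, golden))
    norm = math.sqrt(sum(a * a for a in actual)) * math.sqrt(sum(g * g for g in golden))
    cos = dot / norm if norm else 1.0
    return max_diff, cos, max_diff <= atol
```

If `consistent` is true for every decoding step, the divergence is not in the model's forward pass and attention shifts to post-processing.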
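One quick way to separate the model from post-processing: greedy-decode the collected logits yourself and compare the resulting tokens with the deployed output. If the greedy tokens are correct but the served output is not, the sampling/post-processing step is the likely culprit. A minimal sketch (the helper name is hypothetical):

```python
def tokens_from_logits(step_logits):
    """Greedy-decode each step's logit vector to a token id.

    With deterministic logits that match the benchmark, a mismatch
    between these tokens and the deployed output points at the
    sampling / post-processing step rather than the model.
    """
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in step_logits]
```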
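When logits are inconsistent, the layer-by-layer comparison can be sketched as a walk over intermediate outputs in execution order, stopping at the first layer whose values diverge beyond tolerance. The function name, the dict-of-layer-outputs format, and the relative tolerance are assumptions; in practice the intermediate tensors are usually captured with framework hooks:

```python
def first_divergent_layer(actual_by_layer, golden_by_layer, rtol=1e-2):
    """Return the name of the first layer (in execution order) whose
    flattened outputs differ from the benchmark beyond rtol, or None.

    Assumes both dicts map layer name -> flat list of floats and are
    ordered by execution (Python dicts preserve insertion order).
    """
    for name, golden in golden_by_layer.items():
        actual = actual_by_layer[name]
        for a, g in zip(actual, golden):
            if abs(a - g) > rtol * max(abs(g), 1e-8):
                return name  # error first appears at this layer
    return None
```

The layer reported here is where the error is first introduced; earlier layers match the benchmark, so debugging can focus on that layer's operators and weights.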
