Troubleshooting Process
Figure 1 illustrates the process for troubleshooting LLM inference accuracy issues.
- To rule out accuracy issues caused by configuration errors, check the model configuration, the model structure, parameter passing, and the implementation of any custom operators.
- If the configuration is correct, select a bad case with an obvious accuracy issue for further analysis and diagnosis.
- Enable deterministic computation and collect the model's output logits.
- Compare the collected logits with the benchmark data.
- If the comparison results are consistent, the model's forward computation matches the benchmark; check the post-processing (sampling) step to determine whether the problem lies in the model itself or in post-processing.
- If the comparison results are inconsistent, collect the intermediate computation results at the step where the model output becomes abnormal, and compare them layer by layer to locate the source of the error.
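Before collecting logits, deterministic computation must be enabled so that repeated runs produce bit-identical outputs. As a minimal sketch, assuming a PyTorch-based inference stack (the helper name and seed value are illustrative, not from any specific toolkit):

```python
import os
import random

import torch


def enable_determinism(seed: int = 1234) -> None:
    """Illustrative helper: force deterministic computation before
    collecting logits, so two runs of the same input can be compared."""
    # cuBLAS needs a fixed workspace size for deterministic matmuls on CUDA.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    torch.manual_seed(seed)
    # Raise an error if a nondeterministic kernel would be used.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```

Call this once at process start, before building the model; any operator that has no deterministic implementation will then fail loudly instead of silently varying between runs.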
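The logits comparison against the benchmark can be sketched as follows. The function name, the tolerance, and the choice of metrics (max absolute difference plus cosine similarity) are assumptions for illustration; real toolchains may use different thresholds:

```python
import math


def compare_logits(actual, golden, atol=1e-3):
    """Compare one step's output logits against benchmark logits.

    Returns (max_abs_diff, cosine_similarity, consistent). The 1e-3
    tolerance is illustrative; pick one that fits your precision mode.
    """
    max_diff = max(abs(a - g) for a, g in zip(actual, golden))
    dot = sum(a * g for a, g in zip(actual, golden))
    norm = math.sqrt(sum(a * a for a in actual)) * math.sqrt(sum(g * g for g in golden))
    cos = dot / norm if norm else 1.0
    return max_diff, cos, max_diff <= atol
```

If `consistent` is true for every decoding step, the divergence is not in the model's forward pass and attention shifts to post-processing.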
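One quick way to separate the model from post-processing: greedy-decode the collected logits yourself and compare the resulting tokens with the deployed output. If the greedy tokens are correct but the served output is not, the sampling/post-processing step is the likely culprit. A minimal sketch (the helper name is hypothetical):

```python
def tokens_from_logits(step_logits):
    """Greedy-decode each step's logit vector to a token id.

    With deterministic logits that match the benchmark, a mismatch
    between these tokens and the deployed output points at the
    sampling / post-processing step rather than the model.
    """
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in step_logits]
```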
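When logits are inconsistent, the layer-by-layer comparison can be sketched as a walk over intermediate outputs in execution order, stopping at the first layer whose values diverge beyond tolerance. The function name, the dict-of-layer-outputs format, and the relative tolerance are assumptions; in practice the intermediate tensors are usually captured with framework hooks:

```python
def first_divergent_layer(actual_by_layer, golden_by_layer, rtol=1e-2):
    """Return the name of the first layer (in execution order) whose
    flattened outputs differ from the benchmark beyond rtol, or None.

    Assumes both dicts map layer name -> flat list of floats and are
    ordered by execution (Python dicts preserve insertion order).
    """
    for name, golden in golden_by_layer.items():
        actual = actual_by_layer[name]
        for a, g in zip(actual, golden):
            if abs(a - g) > rtol * max(abs(g), 1e-8):
                return name  # error first appears at this layer
    return None
```

The layer reported here is where the error is first introduced; earlier layers match the benchmark, so debugging can focus on that layer's operators and weights.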
