Inconsistent Data Read
Case: A large language model is migrated from LlamaFactory on the NPU (the benchmark) to MindSpeed LLM on the NPU for training. After the migration, the loss values do not match those of the benchmark, as shown in the following figure.
Figure 1 Loss mismatch


Locating method: Print the input tokens in both frameworks and compare them. Where to add the print statement depends on the training code. For example, in MindSpeed LLM, print the tokens in the forward_step function in the MindSpeed-LLM/pretrain_gpt.py file, as shown in the following figure.
Figure 2 Printing using the forward_step function
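When the sequences are long, writing the token IDs to a file can be easier to compare than reading console output. The following is a minimal sketch of such a helper, not part of MindSpeed LLM or LlamaFactory. The function name dump_token_ids, the tag argument, and the file naming scheme are hypothetical. It assumes the token batch is either a plain list or an object with a tolist() method, such as a framework tensor.

```python
import json
from pathlib import Path


def dump_token_ids(token_ids, step, out_dir, tag):
    """Write one batch of token IDs to <out_dir>/<tag>_step<step>.json.

    Hypothetical helper: `tag` names the framework that produced the batch
    so that dumps from the benchmark and the migrated run can sit side by side.
    """
    # Accept framework tensors (anything exposing .tolist()) as well as lists.
    if hasattr(token_ids, "tolist"):
        token_ids = token_ids.tolist()
    out_path = Path(out_dir) / f"{tag}_step{step:05d}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(token_ids))
    return out_path
```

A call such as dump_token_ids(tokens, step, "token_dump", "mindspeed") could be placed wherever the batch is available in the training code, with a matching call in the benchmark run. For the comparison to be meaningful, both runs need to consume the same samples in the same order.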


As shown in the following figure, the data at the end of token_id differs between the two frameworks.
Figure 3 Inconsistent data at the end of token_id
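A mismatch at the tail of a long sequence is easy to miss by eye. The sketch below shows one way to locate the first differing position between two token ID sequences. The function names and the sample values are illustrative assumptions. The source does not state which tokens actually differed.

```python
def first_mismatch(ref_ids, new_ids):
    """Return the index of the first differing token, or None if identical."""
    for i, (a, b) in enumerate(zip(ref_ids, new_ids)):
        if a != b:
            return i
    # All shared positions agree; a length difference is still a mismatch.
    if len(ref_ids) != len(new_ids):
        return min(len(ref_ids), len(new_ids))
    return None


def describe_mismatch(ref_ids, new_ids, context=4):
    """Return a short text report around the first mismatch."""
    idx = first_mismatch(ref_ids, new_ids)
    if idx is None:
        return "identical"
    start = max(0, idx - context)
    end = idx + context
    return (
        f"first mismatch at position {idx}: "
        f"benchmark={list(ref_ids[start:end])} "
        f"migrated={list(new_ids[start:end])}"
    )


if __name__ == "__main__":
    # Made-up token IDs for illustration only.
    benchmark = [101, 7592, 2088, 999, 2]
    migrated = [101, 7592, 2088, 999, 0]
    print(describe_mismatch(benchmark, migrated))
```

Reporting only the first mismatch with a little surrounding context keeps the output short enough to read for every sample, which helps show whether the difference always appears at the end of the sequence.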


Solution: Fix the data preprocessing code so that both frameworks receive identical input tokens.
Result: The loss values match.
Figure 4 Loss matched

