Inconsistent Data Read

Case: A large language model was migrated from LlamaFactory on NPU (the benchmark) to MindSpeed LLM on NPU for training, but the loss values of the two runs did not match, as shown in the following figure.

Figure 1 Loss mismatch

Locating method: Print the input tokens in both frameworks and compare them. Where to print depends on the training code. For example, in MindSpeed LLM, print the tokens in the forward_step function in the MindSpeed-LLM/pretrain_gpt.py file, as shown in the following figure.

Figure 2 Printing using the forward_step function
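The printed tokens are easier to compare when each framework writes them to a file. The following is a minimal sketch of such a dump helper, not MindSpeed LLM code: dump_tokens, its parameters, and the JSON Lines layout are hypothetical names chosen for illustration. It assumes the batch of token IDs is available as a tensor (anything with a tolist() method) or a plain list at the point where it is called.

```python
import json


def dump_tokens(tokens, step, path, max_steps=3):
    """Append the token IDs of one training step to a JSON Lines file.

    Hypothetical helper: only the first `max_steps` steps are written so
    that the dump stays small. Returns True if a record was written.
    """
    if step >= max_steps:
        return False
    # Tensors (for example torch.Tensor) expose tolist(); lists pass through.
    ids = tokens.tolist() if hasattr(tokens, "tolist") else list(tokens)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"step": step, "token_ids": ids}) + "\n")
    return True
```

Calling the same helper at the equivalent point in both training scripts, with the same data order and seed, yields two files that can be compared record by record.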

As shown in the following figure, the token IDs printed by the two frameworks differ at the end of token_id.

Figure 3 Inconsistent data at the end of token_id
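For long sequences, a mismatch near the end is easy to miss by eye. The sketch below shows one way to locate the first differing position between two token ID lists. The function names and the sample values are illustrative assumptions, not data from this case.

```python
def first_mismatch(ref_ids, new_ids):
    """Return the index of the first differing position, or None if identical."""
    for i, (a, b) in enumerate(zip(ref_ids, new_ids)):
        if a != b:
            return i
    # All shared positions are equal; a length difference is also a mismatch.
    if len(ref_ids) != len(new_ids):
        return min(len(ref_ids), len(new_ids))
    return None


def describe_mismatch(ref_ids, new_ids, context=4):
    """Build a short report showing the tokens around the first mismatch."""
    idx = first_mismatch(ref_ids, new_ids)
    if idx is None:
        return "token IDs are identical"
    start = max(0, idx - context)
    return (
        f"first mismatch at index {idx}: "
        f"benchmark={ref_ids[start:idx + context]} "
        f"migrated={new_ids[start:idx + context]}"
    )


# Made-up sample values: the two sequences differ only in the tail.
benchmark_ids = [1, 15, 27, 99, 2, 0, 0]
migrated_ids = [1, 15, 27, 99, 2, 2, 2]
print(describe_mismatch(benchmark_ids, migrated_ids))
```

A mismatch index close to the sequence length points to how each preprocessing pipeline terminates, pads, or truncates a sample, which is a reasonable place to start reading the preprocessing code.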

Solution: Fix the data preprocessing code so that both frameworks feed identical input tokens to the model.

Result: After the fix, the loss values match.

Figure 4 Loss matched
