Inconsistent Model Structure
Case: After an MoE model is migrated from the GPU to the NPU, the NPU loss does not match the GPU loss.
Figure 1 Loss mismatch


Locating method: Check the code or print the model structure for comparison.
Code review reveals a discrepancy: in the NPU model, the residual layer follows input_layernorm, whereas in the GPU model it precedes input_layernorm. The two models therefore apply these layers in a different order.
Figure 2 Model structure comparison
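A structure comparison of this kind can also be automated. The following is a minimal sketch, assuming the layer names of each model have already been collected into ordered lists (for example, by copying them from the printed model structure). The function name diff_structures and the sample layer lists are hypothetical and are not taken from the original models. Note that a printed structure may reflect only the order in which layers were registered, so the forward code should still be reviewed.

```python
import difflib


def diff_structures(gpu_layers, npu_layers):
    """Return unified-diff lines between two ordered lists of layer names."""
    return list(
        difflib.unified_diff(
            gpu_layers, npu_layers, fromfile="gpu", tofile="npu", lineterm=""
        )
    )


# Hypothetical layer orders for one decoder block (illustration only).
gpu_layers = ["residual", "input_layernorm", "self_attn", "mlp"]
npu_layers = ["input_layernorm", "residual", "self_attn", "mlp"]

diff = diff_structures(gpu_layers, npu_layers)
for line in diff:
    print(line)
```

Lines prefixed with + or - mark layers whose position differs between the two models. An empty result means the two lists are identical.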


Solution: In the NPU model, move input_layernorm so that it follows the residual layer, matching the order used in the GPU model.
Result: Once the model structures are aligned, the NPU loss matches the GPU loss.
Figure 3 Loss matched
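The reason the layer order affects the loss can be illustrated with a toy calculation. This sketch assumes that the "residual layer" is an element-wise residual addition and uses a plain layer normalization without learnable parameters. The vectors and helper functions are hypothetical and do not come from the original model.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    """Element-wise addition of two vectors of equal length."""
    return [u + v for u, v in zip(a, b)]


# Hypothetical activations (illustration only).
hidden = [1.0, 2.0, 4.0]
residual = [0.5, -1.0, 3.0]

# GPU order: residual addition first, then input_layernorm.
norm_after_residual = layer_norm(add(hidden, residual))

# Original NPU order: input_layernorm first, then residual addition.
norm_before_residual = add(layer_norm(hidden), residual)

print(norm_after_residual)
print(norm_before_residual)
```

The two orders yield different activations for the same inputs, so every downstream layer receives different values and the loss curves diverge.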

