Inconsistent Configuration Items

Case: After a speech recognition model is migrated from GPUs to NPUs for training, the downstream metric WER varies greatly.

Locating method: Compare the training configurations of the NPU and benchmark based on the startup script or training logs.

The FSDP configuration is used for the NPU, and the DDP configuration is used for the GPU. Training losses remain similar, but downstream metrics show significant differences, as illustrated in the figure below.

Figure 1 NPU and GPU configurations

Solution: Synchronize the GPU configuration.

Result: The WER decreases after the repair and matches the GPU's performance.