Using Startup Parameters Consistent with Single-Device Training
Adjust the distributed-training parameters in your training startup script.
The current version requires that training be launched as single-device training. That is, use the startup parameters for single-device training to launch training on the Ascend AI Processor.
This does not change the actual deployment of your script.
If your script exposes a distribution strategy parameter, set it to one_device (corresponding to OneDeviceStrategy). If the number of GPUs is configurable, set it to 0.
Launching NPU training with the startup parameters for single-device training greatly improves the success rate of distributed porting, for the following reasons:
- From the TF Adapter's perspective, the training flow is simplified.
- Interference from the original script's default distribution strategy is avoided.
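As a minimal sketch of what this looks like in practice, the snippet below models a training script whose startup flags follow the common TensorFlow convention of a `--distribution_strategy` string and a `--num_gpus` count. The flag names are illustrative assumptions, not fixed by this document; real scripts may name these parameters differently.

```python
import argparse

# Hypothetical startup flags modeled on common TensorFlow training scripts.
# For NPU porting, the strategy is forced to "one_device" and no GPUs are
# requested, so the script runs as if it were single-device training.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--distribution_strategy", default="one_device",
    help="Set to one_device (OneDeviceStrategy) when porting to NPU")
parser.add_argument(
    "--num_gpus", type=int, default=0,
    help="Set to 0 so the script does not request any GPUs")

# Parse the defaults for illustration; a real launch would pass sys.argv.
args = parser.parse_args([])
print(args.distribution_strategy, args.num_gpus)  # one_device 0
```

In a script that builds its strategy from such flags, the `one_device` value would select `tf.distribute.OneDeviceStrategy`, keeping the original script's default distribution strategy out of the way.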
Parent topic: Manual Porting