Performance Tuning Process

If a network ported to the Ascend AI Processor does not deliver satisfactory training performance, tune it by following the steps below.

Figure 1 Performance tuning process of TensorFlow network

1. If the performance is not satisfactory, perform the following common operations to improve it:
   - Enable the automatic mixed precision mode.
   - Replace the GELU activation function.
   - Use the AOE tool to tune subgraphs, operators, and gradient segmentation policies.
   For details, see Basic Tuning.
2. Perform model training again and evaluate whether the training performance is satisfactory.
   - If the performance is satisfactory, the tuning is complete.
   - If the performance is not satisfactory, go to step 3.
3. Use the Profiling tool to collect and analyze profile data.
   For details about how to collect, parse, export, and analyze profile data, see Profile Data Collection and Analysis.
4. Further improve the performance based on the identified performance bottleneck. For details, see Advanced Tuning.
5. Perform model training again, run a regression test, and evaluate whether the training performance is satisfactory.
   - If the performance is satisfactory, the tuning is complete.
   - If the performance is not satisfactory and you are using one of the following products, perform the operations in Automatic AOE Tuning again:
     Atlas A3 training products / Atlas A3 inference products
     Atlas A2 training products / Atlas A2 inference products
     Atlas training products
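
The decision flow described above can be sketched as a small driver loop. This is only an illustration of the control flow, not part of any Ascend tool: the function name run_tuning_process, the four callables, the throughput-style metric (higher is better), and the max_aoe_rounds limit are all hypothetical and introduced here for the example. The source does not state how many times AOE tuning may be repeated, so the limit is an assumption.

```python
from typing import Callable, List


def run_tuning_process(
    measure: Callable[[], float],
    basic_tuning: Callable[[], None],
    profile_and_advanced_tuning: Callable[[], None],
    aoe_tuning: Callable[[], None],
    target: float,
    max_aoe_rounds: int = 3,  # assumption: the source gives no upper bound
) -> List[str]:
    """Run the tuning stages in order and return the stages that were executed.

    All callables are hypothetical placeholders:
      measure                      retrains the model and returns a metric
                                   where higher is better (e.g. samples/s)
      basic_tuning                 mixed precision, GELU replacement, AOE
      profile_and_advanced_tuning  profiling followed by advanced tuning
      aoe_tuning                   one more round of automatic AOE tuning
    """
    stages: List[str] = []

    # Basic tuning: the common operations that are tried first.
    basic_tuning()
    stages.append("basic")

    # Retrain and evaluate; stop if the target is already met.
    if measure() >= target:
        return stages

    # Collect and analyze profile data, then tune the identified bottleneck.
    profile_and_advanced_tuning()
    stages.append("profile+advanced")

    # Retrain and evaluate; repeat AOE tuning while the target is missed.
    rounds = 0
    while measure() < target and rounds < max_aoe_rounds:
        aoe_tuning()
        stages.append("aoe")
        rounds += 1

    return stages


# Usage with simulated measurements (made-up numbers, for illustration only).
readings = iter([800.0, 950.0, 1020.0])
stages = run_tuning_process(
    measure=lambda: next(readings),
    basic_tuning=lambda: None,
    profile_and_advanced_tuning=lambda: None,
    aoe_tuning=lambda: None,
    target=1000.0,
)
print(stages)
```

In a real project each placeholder would wrap the corresponding procedure from Basic Tuning, Profile Data Collection and Analysis, Advanced Tuning, and Automatic AOE Tuning, and measure would run an actual training job.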
