Performance Tuning Process

If a network ported to the Ascend AI Processor does not deliver satisfactory training performance, tune it by performing the following steps:

Figure 1 Performance tuning process of a TensorFlow network
  1. Apply the following common optimizations first:
    1. Enable the automatic mixed precision mode.
    2. Replace the GELU activation function.
    3. Use the AOE tool to tune subgraphs, operators, and gradient splitting strategies.

    For details, see Basic Tuning.

  2. Perform model training again and evaluate whether the training performance is satisfactory.
    • If the performance is satisfactory, the tuning is complete.
    • If the performance is not satisfactory, go to step 3.
  3. Use the Profiling tool to collect and analyze profile data.

    Refer to Profile Data Collection and Analysis to collect, parse, export, and analyze profile data.

  4. Refer to Advanced Tuning to further improve the performance based on the identified performance bottleneck.
  5. Perform model training again, conduct a regression test, and evaluate whether the training performance is satisfactory.
    • If the performance is satisfactory, the tuning is complete.
    • If the performance is not satisfactory, perform Automatic AOE Tuning again.
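
The decision flow in the steps above can be sketched as a small driver function. This is only an illustration of the control flow, not part of any Ascend tool. Every name in it (run_tuning_process, the callback parameters, target, max_aoe_rounds) is hypothetical. It also assumes that performance is measured as a throughput value where higher is better, and that the final AOE loop is capped at a fixed number of rounds, neither of which is stated in this document.

```python
from typing import Callable, List


def run_tuning_process(
    train_and_measure: Callable[[], float],
    basic_tuning: Callable[[], None],
    collect_and_analyze_profile: Callable[[], str],
    advanced_tuning: Callable[[str], None],
    aoe_tuning: Callable[[], None],
    target: float,
    max_aoe_rounds: int = 3,
) -> List[str]:
    """Walk through the tuning steps and return the ordered list of actions taken.

    All callbacks are hypothetical placeholders supplied by the caller.
    train_and_measure is assumed to return throughput (higher is better).
    """
    steps: List[str] = []

    # Step 1: basic tuning (mixed precision, GELU replacement, AOE tuning).
    basic_tuning()
    steps.append("basic_tuning")

    # Step 2: train again and check whether the target is met.
    if train_and_measure() >= target:
        steps.append("done")
        return steps

    # Step 3: collect and analyze profile data to find the bottleneck.
    bottleneck = collect_and_analyze_profile()
    steps.append("profiling")

    # Step 4: advanced tuning aimed at the identified bottleneck.
    advanced_tuning(bottleneck)
    steps.append("advanced_tuning")

    # Step 5: train again; if still unsatisfactory, repeat AOE tuning.
    # The round limit is an assumption added here so the loop terminates.
    aoe_rounds = 0
    while True:
        if train_and_measure() >= target:
            steps.append("done")
            return steps
        if aoe_rounds == max_aoe_rounds:
            steps.append("stopped")
            return steps
        aoe_tuning()
        steps.append("aoe_tuning")
        aoe_rounds += 1
```

In practice, each callback would wrap the corresponding manual procedure (for example, a training run that reports throughput), and the returned list records which steps were needed before the target was reached.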