Analysis Sample of Timeline-based Tuning of AI CPU Operators

Background

The AI CPU is a compute unit of the Ascend AI Processor. Because of its performance bottlenecks, operators that run on the AI CPU can lengthen model execution time, so tuning them deserves special attention. This section uses the MLP model as an example to show how the AI CPU identification function of Advisor automatically identifies AI CPU operators that are executed serially and provides tuning suggestions to improve overall model performance.

MindStudio Advisor Operations

  1. Click New Project in the upper left corner of the Advisor function page. The Advisor system configuration page is displayed.
    Figure 1 OM only
  2. Set related parameters according to Figure 1 and click Start.
  3. After the analysis is complete, the system displays the analysis result. Figure 2 shows the AI CPU operators identified in the MLP model.
    Figure 2 Summary in the analysis result (Model Graph Optimization)

Fault Analysis

According to the AI CPU operator identification results from MindStudio Advisor, the Cast and Equal operators in the computational graph are executed on the AI CPU. They need to be moved off the AI CPU.

The Cast operator converts the data type of an operator's output, and the AI Core supports only data types of 32 bits or fewer. Figure 3 indicates that the lower-layer Equal operator uses a data type wider than 32 bits. As a result, a Cast operator is generated to perform the data type conversion, and both operators are dispatched to the AI CPU.

Figure 3 Structure of the MLP network model (partial)
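The width check described above can be illustrated with a minimal Python sketch. It is not the logic Advisor actually runs. The operator records, the `wide_dtype_ops` helper, and the data types assigned to each operator are hypothetical and chosen only to mirror the ArgMaxD, Cast, and Equal chain in this example.

```python
# Bit widths for a few common data type names (illustrative, not exhaustive).
DTYPE_BITS = {
    "bool": 8, "int8": 8, "float16": 16,
    "int32": 32, "float32": 32,
    "int64": 64, "float64": 64,
}

def wide_dtype_ops(ops, max_bits=32):
    """Return the names of ops with any input or output wider than max_bits."""
    flagged = []
    for op in ops:
        dtypes = op["inputs"] + op["outputs"]
        if any(DTYPE_BITS[d] > max_bits for d in dtypes):
            flagged.append(op["name"])
    return flagged

# Hypothetical graph fragment; the data types are assumptions for illustration.
graph = [
    {"name": "ArgMaxD", "inputs": ["float16"], "outputs": ["int32"]},
    {"name": "Cast", "inputs": ["int32"], "outputs": ["int64"]},
    {"name": "Equal", "inputs": ["int64", "int64"], "outputs": ["bool"]},
]

print(wide_dtype_ops(graph))  # ['Cast', 'Equal']
```

Under these assumed data types, the sketch flags the same two operators that Advisor reports as running on the AI CPU.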

Troubleshooting

Following tuning suggestion 3 in Figure 2, change the data type used in the model structure, for example, from INT64 to FP16.

You can change the data type of the Equal operator to be the same as that of the upper-layer ArgMaxD operator. In this way, the Cast operator is eliminated and the Equal operator is transferred back to the AI Core for calculation.
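The effect of this change on the graph can be sketched in Python as follows. This is an illustration under stated assumptions, not a MindStudio or CANN API: the `align_with_producer` helper, the record layout, and the data types are all hypothetical. In practice the data type is changed in the model definition itself, and the Cast operator disappears when the model is regenerated.

```python
def align_with_producer(chain):
    """Drop Cast nodes from a linear chain and retype each consumer
    so that its input matches the output of the op before it."""
    result = []
    for op in chain:
        if op["type"] == "Cast":
            continue  # no conversion is needed once the data types agree
        op = dict(op)  # copy so the original chain is left untouched
        if result:
            op["input"] = result[-1]["output"]
        result.append(op)
    return result

# Hypothetical chain before the change; the data types are assumptions.
before = [
    {"type": "ArgMaxD", "input": "float16", "output": "int32"},
    {"type": "Cast", "input": "int32", "output": "int64"},
    {"type": "Equal", "input": "int64", "output": "bool"},
]

after = align_with_producer(before)
for op in after:
    print(op["type"], op["input"], "->", op["output"])
```

In the resulting chain, Equal consumes the same data type that ArgMaxD produces, so under the 32-bit limit described earlier neither operator needs the AI CPU.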

After the Equal operator is modified, the Advisor analysis shows that the Cast and Equal operators are no longer in the tuning analysis result of the AI CPU operators. In addition, the UB fusion recommendation shows that the ArgMaxD and Equal operators can be fused to further improve the efficiency. For details about the analysis methods, see Analysis Example of UB Operator Fusion Recommendation.

Conclusion

You can use Advisor to quickly locate AI CPU operators, then combine the suggestions provided by MindStudio Advisor with the actual characteristics of the network model to analyze problems and find solutions. This makes analyzing network performance problems more efficient.