Analysis Sample of Custom Operator Performance Tuning
Background
An operator is the basic unit of a model, and custom operators can be tuned.
We construct a complex sample operator, Vadd_sample, based on vector addition, and use the Profiling tool to verify and analyze its performance. Vadd_sample performs only vector computation, so vec_ratio (the ratio of the cycle count of vector instructions to the cycle count of all instructions) in the AI Core Metrics view can serve as its performance metric. The analysis result shows a low vec_ratio for Vadd_sample, which indicates that the operator has not reached optimal performance and can be tuned.
- This section describes only the Profiling tool-related operations and analysis process. The detailed tuning programming of the custom operator is not described here.
- For details about the custom operator, see Operator Development.
Profiling Operations
- Start MindStudio IDE and open a built project.
- Choose from the menu bar. The system analysis project page is displayed. See Figure 1.
- On the system analysis project page, click New Project on the welcome page or the icon in the upper left corner. The profiling configuration window is displayed, as shown in Figure 2.
- Access the Executable Properties page. Set the path for storing the executable file of the profiling project. See Figure 3.
- Access the Profiling Options page and select Task-based. See Figure 4.
- After the preceding configurations are complete, click Start in the lower right corner of the window to start Profiling.
The profiling results will be automatically displayed at the bottom of the MindStudio IDE window after the execution is complete. See Figure 5.
Fault Analysis
First, view the Timeline to understand the overall execution status of the application. Next, use the palette function to mark time-consuming APIs or operators, and use the API/operator Statistics table to locate potential bottlenecks. Finally, examine the AI Core Metrics of an operator to determine whether its performance falls short, how much room there is for tuning, and where to start.
In this example, a single operator processes a large amount of data, so you can analyze AI Core Metrics directly. Vadd_sample is a compute-intensive vector operator, so check whether its vec_ratio (the ratio of the number of cycles of vector instructions to the number of cycles of all instructions, that is, the vector resource usage) meets the requirement. See Figure 6.
In the AI Core Metrics view, the maximum value of vec_ratio of Vadd_sample is 0.639. The closer vec_ratio is to 1, the higher the resource usage. Because even the maximum value is well below 1, a large share of cycles is not spent on vector instructions, so the Vadd_sample operator can be tuned.
This completes the analysis performed with the Profiling tool.
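The vec_ratio definition above can be written out as a short calculation. This is a minimal sketch, not part of the Profiling tool: the function names and the cycle counts are hypothetical, and the counts are chosen only so that the ratio matches the 0.639 reported in this example.

```python
def vec_ratio(vec_cycles: int, total_cycles: int) -> float:
    """Ratio of vector-instruction cycles to the cycles of all instructions."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= vec_cycles <= total_cycles:
        raise ValueError("vec_cycles must be between 0 and total_cycles")
    return vec_cycles / total_cycles


def headroom(ratio: float) -> float:
    """Share of cycles not spent on vector instructions (distance from 1)."""
    return 1.0 - ratio


# Hypothetical cycle counts, picked to reproduce the ratio in this example.
ratio = vec_ratio(vec_cycles=639, total_cycles=1000)
print(f"vec_ratio = {ratio:.3f}, headroom = {headroom(ratio):.3f}")
```

The larger the headroom, the more the operator stands to gain from tuning that keeps the vector unit busy.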
Troubleshooting
To tune the Vadd_sample operator, use tuning methods such as dual-core parallelism and double buffering, as described in the TIK Tuning Guide, to improve the operator's vector computing resource usage and reduce its execution time.
Developers need to implement the tuning code themselves.
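To see why double buffering can raise vec_ratio, consider a simplified cycle model. This is an illustrative sketch only, not TIK code and not the method used for Vadd_sample: it assumes the data is processed in equal tiles, that each tile needs a fixed number of data-movement cycles (t_move) and vector-compute cycles (t_vec), and that with double buffering the movement of the next tile fully overlaps the computation of the current tile. All names and numbers are hypothetical.

```python
def serial_vec_ratio(t_move: float, t_vec: float) -> float:
    """Single buffer: each tile is moved, then computed, with no overlap."""
    return t_vec / (t_move + t_vec)


def double_buffer_vec_ratio(n_tiles: int, t_move: float, t_vec: float) -> float:
    """Double buffer: moving tile i+1 overlaps computing tile i.

    Two-stage pipeline model: the first move and the last compute cannot
    be overlapped; every other step costs the slower of the two stages.
    """
    if n_tiles < 1:
        raise ValueError("n_tiles must be at least 1")
    total = t_move + (n_tiles - 1) * max(t_move, t_vec) + t_vec
    return n_tiles * t_vec / total


# Hypothetical per-tile costs (in cycles) for illustration.
before = serial_vec_ratio(t_move=1.0, t_vec=2.0)
after = double_buffer_vec_ratio(n_tiles=100, t_move=1.0, t_vec=2.0)
print(f"single buffer: {before:.3f}, double buffer: {after:.3f}")
```

In this model the gain comes entirely from hiding data movement behind computation; real operators also face synchronization and memory limits that the sketch ignores.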
After tuning the Vadd_sample operator, run Profiling on the inference application again (see Profiling Operations) to obtain the new result, and then check the AI Core Metrics view again, as shown in Figure 7.
After the tuning, the average value of vec_ratio in the AI Core Metrics view reaches 0.83, indicating that the vector computing resource usage of Vadd_sample has improved.
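The two readings can be compared with a short calculation. This is a rough sketch with a hypothetical helper name. Note that this example reports the value before tuning as a maximum (0.639) and the value after tuning as an average (0.83), so the comparison is indicative rather than strictly like-for-like.

```python
def compare_vec_ratio(before: float, after: float) -> tuple[float, float]:
    """Return the absolute and relative change in vec_ratio."""
    if before <= 0:
        raise ValueError("before must be positive")
    abs_gain = after - before
    rel_gain = abs_gain / before
    return abs_gain, rel_gain


# Values reported in this example (maximum before, average after).
abs_gain, rel_gain = compare_vec_ratio(before=0.639, after=0.83)
print(f"absolute gain: {abs_gain:.3f}, relative gain: {rel_gain:.1%}")
```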
Conclusion
By using the Profiling tool to analyze the resource usage of the inference application twice, before and after tuning, and comparing the results, we can conclude that the original custom operator Vadd_sample had not reached optimal performance and could be tuned.





