Methods for Locating Operator Performance Problems
Operator performance is a common bottleneck in deep learning models: when some basic compute units (operators) execute inefficiently, the overall model runtime increases and resources are wasted. Addressing such issues requires dedicated analysis tools and code tuning techniques. For example, when evaluating the performance of a fused operator, you can compare metrics such as compute time and memory usage under different configurations.
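The comparison described above can be sketched in plain Python. This is a minimal, host-side illustration only, not the Advisor or MindStudio Insight workflow: the helper names (`measure`, `compare`) and the two toy functions are hypothetical, and `tracemalloc` tracks Python heap allocations rather than device memory. For operators running on a device, timing and memory figures would normally come from the profiler instead.

```python
import time
import tracemalloc
from statistics import mean


def measure(fn, *args, repeats=5):
    """Return (mean wall time in seconds, peak traced Python memory in bytes)."""
    timings = []
    tracemalloc.start()
    try:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            timings.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return mean(timings), peak


def unfused_scale_add(xs, scale, bias):
    """Two passes: the scaled values are materialized before the add."""
    scaled = [x * scale for x in xs]
    return [s + bias for s in scaled]


def fused_scale_add(xs, scale, bias):
    """One pass: scale and add are combined, so no intermediate list exists."""
    return [x * scale + bias for x in xs]


def compare(configs, repeats=5):
    """Measure each (name, function, input size) configuration."""
    rows = []
    for name, fn, size in configs:
        data = [float(i) for i in range(size)]  # allocated before tracing starts
        seconds, peak = measure(fn, data, 2.0, 1.0, repeats=repeats)
        rows.append({"config": name, "size": size,
                     "mean_s": seconds, "peak_bytes": peak})
    return rows


if __name__ == "__main__":
    configs = [
        ("unfused", unfused_scale_add, 100_000),
        ("fused", fused_scale_add, 100_000),
    ]
    for row in compare(configs):
        print(row)
```

Running both variants on the same input sizes keeps the comparison fair; the absolute numbers depend on the machine and are only meaningful relative to each other.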
| Analysis Method | Analysis Purpose | Troubleshooting Process |
|---|---|---|
| Advisor analysis: long execution time of the AI CPU operator | Reduce the time consumed by the AI CPU operator. | Locate the AI CPU operator on the Timeline page by operator name, find the operator in the code based on the call stack, and try to replace it with an operator that implements the same logic. If the replacement fails, record the operator shape and type and contact the operator owner to check whether the case is supported. |
| Advisor analysis: operator build error | | Add the following code before training starts in the Python script to specify the binary mode: `torch_npu.npu.set_compile_mode(jit_compile=False)` and `torch_npu.npu.config.allow_internal_format = False`. If the error persists, record the operator shape and type and contact the operator owner to check whether the case is supported. |
| Single-operator analysis: Vector operator analysis | | |
| Single-operator analysis: Cube operator analysis | | View the operator proportion on the Operator tab page (for details, see In-depth Analysis for Model Tuning (MindStudio Insight)), select the top N operators by time consumption, analyze the average AI Core performance for each input shape, record the abnormal operators and shapes, and contact the operator owner to confirm the tuning plan. If the operator performance does not meet expectations, perform the following steps: |
| Fusion operator/Affinity API replacement | Use fusion operators or affinity APIs to reduce the delivery of unnecessary small operators and improve AI Core utilization. | The Affinity API issues analyzer in Advisor can automatically identify fused operators. Locate the code based on the call stack and use fused operators or affinity APIs. |
| Fused operator development | To further improve model performance, develop fused operators to reduce the delivery of small operators and the proportion of free time. | The host bottleneck and MTE bottleneck are displayed and marked in the operator sequence analysis results of the Advisor CSV deliverable. Analyze the code logic and determine whether the bottleneck can be alleviated by combining operators. |
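The "top N operators by time consumption" step from the single-operator analysis can also be scripted once operator data has been exported. The sketch below is an assumption-laden illustration: the column names (`name`, `input_shapes`, `duration_us`), the function `top_n_operators`, and the inline sample rows are all hypothetical and made up for demonstration; a real export would need its actual column names substituted.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; not real profiling output.
SAMPLE_CSV = """
name,input_shapes,duration_us
MatMul,"[1024,1024];[1024,1024]",120.0
MatMul,"[1024,1024];[1024,1024]",130.0
Add,"[1024,1024];[1024,1024]",10.0
Softmax,"[32,1024]",40.0
MatMul,"[64,64];[64,64]",5.0
Add,"[1024,1024];[1024,1024]",12.0
"""


def top_n_operators(csv_text, n=3):
    """Group calls by (operator name, input shape) and rank by total time."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        key = (row["name"], row["input_shapes"])
        groups[key].append(float(row["duration_us"]))

    grand_total = sum(sum(durations) for durations in groups.values())
    summary = []
    for (name, shapes), durations in groups.items():
        total = sum(durations)
        summary.append({
            "name": name,
            "input_shapes": shapes,
            "calls": len(durations),
            "total_us": total,
            "avg_us": total / len(durations),
            "share": total / grand_total,  # fraction of all operator time
        })

    # Highest total time first, so the first n entries are the top N.
    summary.sort(key=lambda entry: entry["total_us"], reverse=True)
    return summary[:n]


if __name__ == "__main__":
    for entry in top_n_operators(SAMPLE_CSV, n=3):
        print(entry)
```

Grouping by both name and input shape keeps the same operator with different shapes in separate rows, which matches the advice to record abnormal operators together with their shapes before contacting the operator owner.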