Overview
This section describes AOE-related concepts, architectures, and tuning processes.
The Ascend-CANN-Toolkit package is typically not installed on the Atlas 200I A2 accelerator module (RC), Atlas 500 A2 edge station, or Atlas 200I SoC A1 core board. If AOE-based remote tuning is required on these devices, however, the package must be installed, and the installation requires at least 10 GB of disk space.
AOE
Ascend Optimization Engine (AOE) is an automatic tuning tool that makes full use of limited hardware resources to meet the performance requirements of operators and the entire network.
AOE iterates over tiling policies in a closed feedback loop of policy generation, compilation, and verification in the operating environment, and keeps the best-performing policy it finds. In this way, it helps you use hardware resources to full capacity and achieve optimal network performance.
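The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not AOE's implementation: the `TilingPolicy` fields, the random search in `generate_policy`, and the cost model in `verify` are all hypothetical stand-ins, because this section does not describe how AOE actually generates or measures policies.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TilingPolicy:
    """Hypothetical tiling policy: block sizes used to split a workload."""
    block_m: int
    block_n: int


def generate_policy(rng: random.Random) -> TilingPolicy:
    """Policy generation: propose a candidate (random search, for illustration)."""
    sizes = [16, 32, 64, 128]
    return TilingPolicy(rng.choice(sizes), rng.choice(sizes))


def compile_policy(policy: TilingPolicy) -> dict:
    """Compilation: stand-in for building an executable with this policy."""
    return {"policy": policy}


def verify(artifact: dict) -> float:
    """Verification: stand-in for measuring cost in the operating environment.

    The cost model is invented; it is lowest (1.0) at block sizes 64 x 32.
    """
    p = artifact["policy"]
    return abs(p.block_m - 64) + abs(p.block_n - 32) + 1.0


def tune(iterations: int, seed: int = 0) -> Tuple[Optional[TilingPolicy], float]:
    """Run the generate -> compile -> verify loop and return the best policy."""
    rng = random.Random(seed)
    best_policy, best_cost = None, float("inf")
    for _ in range(iterations):
        policy = generate_policy(rng)
        cost = verify(compile_policy(policy))
        if cost < best_cost:  # feedback: keep the best policy seen so far
            best_policy, best_cost = policy, cost
    return best_policy, best_cost


best, cost = tune(iterations=500)
print(best, cost)
```

A real tuner would replace `verify` with a measurement taken on the device, which is why the loop has to run in the operating environment.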

AOE consists of the following layers:
- Application layer: the tuning entry, which supports the AOE process.
- Tuning layer: the tuning modes. The following modes are supported:
  - Subgraph tuning: Subgraph Auto Tuning (SGAT) tunes the subgraph segmentation policy, verifies the performance in the operating environment, and saves the optimal tiling policy to the model repository to obtain the tuned model.
  - Operator tuning: Operator Auto Tuning (OPAT) tunes operators, verifies the performance in the operating environment, and saves the optimal operator tiling policy to the operator repository.
- Execution layer: supports compilation (Compiler) and running (Runner) in the operating environment.
You are advised to perform subgraph tuning first and operator tuning second. Subgraph tuning determines how the graph is partitioned. After it completes, the operators have been partitioned into their final shapes, and operator tuning can then be performed on those final shapes. If operator tuning is performed first, the operators are tuned for shapes that change during the subsequent partitioning, so the tuning results do not match the actual application scenario.
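The following toy model illustrates why the order matters. It assumes, purely for illustration, that tuning results are looked up by operator shape and that partitioning splits an operator evenly along its first axis. The `partition` and `tune_operator` helpers and the repository layout are hypothetical and are not AOE APIs.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]


def partition(shape: Shape, parts: int) -> List[Shape]:
    """Toy stand-in for subgraph tuning: split an operator along its first axis."""
    rows, cols = shape
    assert rows % parts == 0, "toy example only handles even splits"
    return [(rows // parts, cols)] * parts


def tune_operator(repo: Dict[Shape, str], shape: Shape) -> None:
    """Toy stand-in for operator tuning: record a tuned policy for one shape."""
    repo[shape] = f"policy tuned for {shape[0]}x{shape[1]}"


original: Shape = (1024, 512)

# Recommended order: partition first, then tune the shapes that will actually run.
repo_recommended: Dict[Shape, str] = {}
final_shapes = partition(original, parts=4)
for shape in set(final_shapes):
    tune_operator(repo_recommended, shape)

# Reverse order: tune the unpartitioned shape, then partition.
repo_reversed: Dict[Shape, str] = {}
tune_operator(repo_reversed, original)

# Count how many of the final operators find a tuned policy in each repository.
hits_recommended = sum(s in repo_recommended for s in final_shapes)
hits_reversed = sum(s in repo_reversed for s in final_shapes)
print(hits_recommended, hits_reversed)  # 4 0
```

In the reversed order, the only tuned entry is for the 1024x512 shape, which no longer exists after partitioning, so none of the four final operators benefits from it.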
SGAT
SGAT is an optimizer that improves subgraph performance. A complete network can be partitioned into multiple subgraphs, and SGAT generates different tiling policies for those subgraphs. In each iteration, it collects the profile data of the tiling policy under test and uses it to find the optimal tiling policy, so that the corresponding subgraph achieves optimal performance. The tuning result is saved to a subgraph repository.
SGAT supports resumption from breakpoints. If tuning is interrupted abnormally, it can be resumed from the point where it stopped.
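Resumption from a breakpoint is commonly built on a checkpoint that is updated after each unit of work. The sketch below shows that general pattern, assuming progress is tracked per subgraph in a JSON file. The function names, the file format, and the per-subgraph granularity are assumptions for illustration and do not describe how SGAT stores its state.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def tune_with_resume(subgraphs: List[str],
                     tune_one: Callable[[str], float],
                     checkpoint_path: str) -> Dict[str, float]:
    """Tune each subgraph, persisting progress so a rerun skips finished work."""
    results: Dict[str, float] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)  # resume from the breakpoint

    for name in subgraphs:
        if name in results:
            continue  # already tuned before the interruption
        results[name] = tune_one(name)
        # Write to a temporary file and rename it, so a crash never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f)
        os.replace(tmp_path, checkpoint_path)
    return results


# Demo: the first run is interrupted while tuning "sg2"; the second run resumes.
attempts: List[str] = []


def flaky_tune(name: str) -> float:
    """Hypothetical tuner that fails the first time it sees 'sg2'."""
    attempts.append(name)
    if name == "sg2" and attempts.count("sg2") == 1:
        raise RuntimeError("simulated abnormal exit")
    return float(len(name))  # invented cost value


checkpoint = os.path.join(tempfile.mkdtemp(), "sgat_checkpoint.json")
subgraphs = ["sg1", "sg2", "sg3"]
try:
    tune_with_resume(subgraphs, flaky_tune, checkpoint)
except RuntimeError:
    pass  # tuning stopped abnormally after "sg1" was checkpointed
results = tune_with_resume(subgraphs, flaky_tune, checkpoint)
print(attempts)  # ['sg1', 'sg2', 'sg2', 'sg3']
```

Note that "sg1" is tuned only once: the second run reads it from the checkpoint and continues with "sg2".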
Figure 2 shows the subgraph tuning process.
OPAT
OPAT is an optimizer that improves operator performance. AOE passes a complete subgraph to OPAT, which performs operator fusion to obtain multiple fusion operator subgraphs. OPAT then generates different operator tiling policies for each fusion operator subgraph to achieve optimal operator performance. The resulting tiling policy is saved to the operator repository.
The current version of AOE supports auto tuning only for AI Core operators whose compute logic is implemented using DSL APIs. For details about the supported operators, see Operator List.
Figure 3 shows the operator tuning process.

