TBE Operator Performance Tuning
TBE DSL Operators
If the performance of a DSL operator does not meet your requirements, optimize the operator as follows:
- Use the AOE tool to tune the operator. For details, see AOE Instructions.
- If AOE tuning does not bring the performance up to your requirements, optimize further by applying the DSL operator implementation techniques. For details, see DSL Performance Optimization.
TBE TIK Operators
The following figure shows the workflow of tuning TBE TIK operator performance.

- Check whether the target operator is supported by AOE. If it is, use AOE to tune the operator.
- After tuning, check whether the operator's multi-core solution is appropriate and whether double buffering is enabled. For details, see AI Core Parallelism and Double Buffering.
- Use MindStudio to run unit tests (UTs) on the operator, use the UT performance simulation tool to display the operator's scheduling pipeline, and then profile the operator in detail.
- MTE instruction pipeline analysis
If MTE1–MTE3 instructions are executing for more than 80% of the total clock cycles, data transfer dominates the run time and the DMA transfer performance is poor.
If MTE instructions do not execute continuously, data transfer has a low degree of parallelism.
For details about the optimization, see Data Tiling for Computation.
- Vector instruction pipeline analysis
If Vector instructions do not execute continuously, the Vector Unit is not used to full capacity. In this case, check how synchronization instructions are used and the degree of instruction parallelism.
If Vector instructions are executing for more than 80% of the total clock cycles, the Vector Unit is fully used. To improve performance further, aim for the optimal degree of instruction parallelism and a more efficient algorithm.
For details, see Data Tiling for Computation and Synchronization Instruction Analysis.
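Checking whether a multi-core solution is appropriate largely comes down to how the data is divided among AI Cores. The sketch below is plain Python, not TIK code, and every name in it is hypothetical: `split_across_cores` gives each core one contiguous chunk, the core count is simply passed in by the caller, and the default alignment of 16 elements assumes float16 data moved in 32-byte blocks. `imbalance` reports how evenly the work is spread (1.0 means perfectly even).

```python
from typing import List, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact for large values."""
    return -(-a // b)


def split_across_cores(total_elems: int, core_num: int, align: int = 16) -> List[Tuple[int, int]]:
    """Hypothetical helper: split total_elems into (offset, length) chunks, one per used core.

    Every chunk except possibly the last is a multiple of `align` elements
    (assumption: 16 float16 elements = one 32-byte block).
    """
    if total_elems <= 0 or core_num <= 0 or align <= 0:
        raise ValueError("total_elems, core_num and align must be positive")
    # Round each core's share up to a whole number of aligned blocks.
    per_core = ceil_div(total_elems, core_num * align) * align
    used_cores = ceil_div(total_elems, per_core)
    return [(core * per_core, min(per_core, total_elems - core * per_core))
            for core in range(used_cores)]


def imbalance(chunks: List[Tuple[int, int]]) -> float:
    """Largest chunk divided by the mean chunk length."""
    lengths = [length for _, length in chunks]
    return max(lengths) / (sum(lengths) / len(lengths))


chunks = split_across_cores(1000, core_num=8)
print(len(chunks), chunks[-1], round(imbalance(chunks), 3))  # prints: 8 (896, 104) 1.024
```

Small inputs land on fewer cores: `split_across_cores(10, 8)` returns a single chunk, which is one reason to confirm that the split still makes sense for the input sizes the operator actually sees.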
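Double buffering is worth checking because it lets the transfer of one tile overlap with computation on another. The idealized timing model below is a plain-Python sketch, not the TIK API, and its names are hypothetical. It assumes three independent units (copy-in on MTE2, compute on the Vector Unit, copy-out on MTE3), a fixed cost per tile for each stage in arbitrary cycles, and that each tile occupies one buffer from the start of its copy-in until its copy-out finishes.

```python
def simulate(num_tiles: int, t_in: int, t_compute: int, t_out: int, num_buffers: int) -> int:
    """Return the cycle at which the last tile finishes copy-out (idealized model)."""
    mte2_free = vec_free = mte3_free = 0   # time at which each unit becomes idle
    release = [0] * num_buffers            # time at which each buffer can be reused
    finish = 0
    for i in range(num_tiles):
        buf = i % num_buffers              # ping-pong between buffers when num_buffers == 2
        # Copy-in needs MTE2 idle and the buffer released by an earlier tile.
        in_end = max(mte2_free, release[buf]) + t_in
        mte2_free = in_end
        # Compute needs the data loaded and the Vector Unit idle.
        c_end = max(in_end, vec_free) + t_compute
        vec_free = c_end
        # Copy-out needs the result and MTE3 idle; it then frees the buffer.
        finish = max(c_end, mte3_free) + t_out
        mte3_free = finish
        release[buf] = finish
    return finish


print(simulate(4, 10, 10, 10, num_buffers=1))  # 120: fully serial
print(simulate(4, 10, 10, 10, num_buffers=2))  # 70: transfer overlaps compute
```

With a single buffer every stage waits for the previous tile to leave, so the cost is simply tiles × (in + compute + out). With two buffers the model already cuts the time from 120 to 70 cycles, which is the kind of overlap the double-buffering check is looking for.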
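The MTE rule in the workflow is a simple ratio. The following plain-Python sketch assumes that the simulation trace has already been exported as (start, end) cycle intervals per pipe; the dictionary layout and the pipe names used as keys are assumptions, not the tool's actual output format. It also assumes a cycle counts as busy if any of MTE1–MTE3 is active in it, so overlapping intervals are merged before summing.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start_cycle, end_cycle), end exclusive


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so busy cycles are not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mte_busy_ratio(trace: Dict[str, List[Interval]], total_cycles: int) -> float:
    """Fraction of total_cycles in which at least one of MTE1-MTE3 is executing."""
    mte = [iv for pipe in ("MTE1", "MTE2", "MTE3") for iv in trace.get(pipe, [])]
    busy = sum(end - start for start, end in merge_intervals(mte))
    return busy / total_cycles


# Hypothetical trace: Vector intervals are ignored by the MTE check.
trace = {"MTE2": [(0, 40), (50, 90)], "MTE3": [(30, 60)], "VECTOR": [(40, 50)]}
ratio = mte_busy_ratio(trace, total_cycles=100)
print(f"MTE busy ratio: {ratio:.0%}, transfer-bound: {ratio > 0.8}")
# prints: MTE busy ratio: 90%, transfer-bound: True
```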
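The two Vector rules can be combined into one diagnosis. This is a plain-Python sketch with hypothetical names: it assumes Vector instruction intervals have been extracted from the simulation trace as (start, end) cycles and that instructions on the single Vector pipe do not overlap, so their durations can be summed. Following the workflow, a busy ratio above 80% means the Vector Unit is saturated, and idle gaps between instructions otherwise indicate underuse.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start_cycle, end_cycle) of one Vector instruction


def idle_gaps(intervals: List[Interval], min_gap: int = 1) -> List[Interval]:
    """Idle periods of at least min_gap cycles between consecutive Vector instructions."""
    gaps: List[Interval] = []
    busy_until: Optional[int] = None
    for start, end in sorted(intervals):
        if busy_until is not None and start - busy_until >= min_gap:
            gaps.append((busy_until, start))
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


def diagnose_vector(intervals: List[Interval], total_cycles: int, threshold: float = 0.8) -> str:
    # Assumes non-overlapping Vector instructions, so durations can be summed directly.
    ratio = sum(end - start for start, end in intervals) / total_cycles
    if ratio > threshold:
        return "saturated"         # tune instruction parallelism or the algorithm
    if idle_gaps(intervals):
        return "underused"         # check synchronization instructions and parallelism
    return "not_vector_bound"      # continuous but short: look at other pipelines


sparse = [(0, 20), (30, 50), (70, 90)]
print(idle_gaps(sparse))             # [(20, 30), (50, 70)]
print(diagnose_vector(sparse, 100))  # underused
```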