GUI Description
Function
During operator performance tuning, MindStudio Insight displays on the timeline the detailed execution status of the low-level instructions issued while an operator runs. It also shows the instruction call sequence and the time consumed on each pipe of each core of the AI Processor. By viewing instruction details and durations on the timeline, you can quickly locate performance bottlenecks.
GUI Display
- Area 1: toolbar, which contains common shortcut buttons. From left to right, the buttons are Marker List, Filter (by card or unit), Search, Flow Events, Reset (page restoration), Timeline Zoom Out, and Timeline Zoom In.
- Area 2: graphical display. The left pane shows the hierarchy for each core: the first level is Core and the second level is Pipe. The right pane shows the timeline view, one row per unit, including the execution sequence and duration of each instruction. For details about the units, see Table 1.
- Area 3: data pane, which displays statistics or instruction details. If you select Slice Detail, the details of a single instruction are displayed. If you select Slice List, the list of instructions in the selected area of the unit is displayed.
Table 1 Unit information

| Unit | Description |
| --- | --- |
| ALL | Instructions shown in this channel are executed in all channels. |
| SCALAR | Scalar unit. |
| FLOWCTRL | Control flow instructions. |
| MTE1 | Data transfer pipeline, from L1 to {L0A/L0B, UBUF}. |
| CUBE | Cube unit. |
| FIXP | Data transfer pipeline, from FixPipe L0C to OUT/L1. Displayed only for profile data exported from Atlas A2 Training Series Product/Atlas 800I A2 Inference Product. |
| MTE2 | Data transfer pipeline, from {DDR/GM, L2} to {L1, L0A/L0B, UBUF}. |
| VECTOR | Vector unit. |
| MTE3 | Data transfer pipeline, from UBUF to {DDR/GM, L2, L1}, or from L1 to {DDR/L2}. |
| CACHEMISS | Instruction cache (iCache) misses. |
| USEMASK | Custom dotting (instrumentation) range. |
| MTE Throughput | Memory throughput information: GM_TO_L1 (GM-to-L1 data transfer throughput); GM_TO_TOTAL (total throughput of data read out of GM); GM_TO_UB (GM-to-UB data transfer throughput); L1_TO_GM (L1-to-GM data transfer throughput); TOTAL_TO_GM (total throughput of data written into GM); UB_TO_GM (UB-to-GM data transfer throughput). |
By examining the durations and the intervals between instructions at each layer of the timeline view, you can identify potential performance bottlenecks in the corresponding instructions and pipelines, for example a pipe that stalls while waiting on instruction execution or an individual instruction that is particularly time-consuming.
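The same reading of the timeline can be approximated offline. The following is a minimal sketch, not part of MindStudio Insight: it assumes the slices have already been loaded into records with hypothetical fields (core, pipe, name, start_us, duration_us), and the instruction names and timings are invented for illustration. For each Core/Pipe row it totals the busy time, the idle gaps between consecutive instructions, and the single longest instruction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """One instruction on the timeline (hypothetical record layout)."""
    core: str
    pipe: str
    name: str
    start_us: float
    duration_us: float


def summarize(slices):
    """Return {(core, pipe): stats} with busy time, idle gap time, and longest slice."""
    rows = defaultdict(list)
    for s in slices:
        rows[(s.core, s.pipe)].append(s)

    summary = {}
    for row, items in rows.items():
        items.sort(key=lambda s: s.start_us)
        busy = sum(s.duration_us for s in items)
        idle = 0.0
        prev_end = items[0].start_us
        for s in items:
            # A positive difference is an interval in which this pipe did nothing.
            idle += max(0.0, s.start_us - prev_end)
            prev_end = max(prev_end, s.start_us + s.duration_us)
        longest = max(items, key=lambda s: s.duration_us)
        summary[row] = {"busy_us": busy, "idle_us": idle, "longest": longest.name}
    return summary


# Invented sample data: two pipes on one core.
slices = [
    Slice("core0", "MTE2", "load_a", 0.0, 4.0),
    Slice("core0", "MTE2", "load_b", 6.0, 2.0),
    Slice("core0", "VECTOR", "vadd", 4.0, 1.0),
    Slice("core0", "VECTOR", "vmul", 8.0, 1.5),
]
summary = summarize(slices)
for row, stats in sorted(summary.items()):
    print(row, stats)
```

A row with a large idle time relative to its busy time suggests the pipe is waiting on another pipe, while a row dominated by one long slice points at a single expensive instruction.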
