GUI Description

Function

During operator performance tuning, MindStudio Insight displays on the timeline the detailed execution status of the low-level instructions issued while an operator runs. It also shows the instruction call sequence and the time consumed by each pipe on each core of the AI Processor. By inspecting instruction details and durations on the timeline, you can quickly locate performance bottlenecks.

GUI Display

The Timeline tab page consists of the toolbar (area 1), graphical display (area 2), and data pane (area 3), as shown in Figure 1.

Figure 1 Timeline tab page

Area 1: toolbar, which contains common shortcut buttons. From left to right, the buttons are Marker List, Filter (card or unit), Search, Flow Events, Reset (page restoration), Timeline Zoom Out, and Timeline Zoom In.
Area 2: graphical display. The left pane displays the hierarchy of each core: the first layer is Core, and the second layer is Pipe. The right pane displays the timeline view row by row, showing the execution sequence and duration of each instruction. For details about the units, see Table 1.

Area 3: data pane, which displays statistics or instruction details. If you select Slice Detail, the details of a single instruction are displayed. If you select Slice List, the list of instructions in the selected area of the unit is displayed.

**Table 1** Unit information
Unit	Description
ALL	Instructions in this channel will be executed in all channels.
SCALAR	Scalar unit.
FLOWCTRL	Control flow instruction.
MTE1	Data transfer pipeline, from L1 to {L0A/L0B, UBUF}.
CUBE	Cube unit.
FIXP	Data transfer pipeline, from FixPipe L0C to OUT/L1. This unit is displayed only for profile data exported from Atlas A2 Training Series Product/Atlas 800I A2 Inference Product.
MTE2	Data transfer pipeline, from {DDR/GM, L2} to {L1, L0A/B, UBUF}.
VECTOR	Vector unit.
MTE3	Data transfer pipeline, from UBUF to {DDR/GM, L2, L1}, or from L1 to {DDR/L2}.
CACHEMISS	Instruction cache (iCache) misses.
USEMASK	Custom dotting range.
MTE Throughput	Memory throughput information. GM_TO_L1: GM-to-L1 data transfer throughput; GM_TO_TOTAL: total GM output data throughput; GM_TO_UB: GM-to-UB data transfer throughput; L1_TO_GM: L1-to-GM data transfer throughput; TOTAL_TO_GM: total GM input data throughput; UB_TO_GM: UB-to-GM data transfer throughput.

By examining instruction durations and the intervals between instructions at each layer of the timeline view, you can identify potential performance bottlenecks in the corresponding instructions and pipelines, such as stalls in instruction execution or individual instructions that are particularly time-consuming.
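
The same duration-and-interval reasoning can be sketched outside the GUI. The following Python example is only an illustration: the `Slice` record, the `summarize_pipes` helper, and the instruction names are hypothetical and do not reflect any MindStudio Insight export format or API. It assumes that instructions within one pipe do not overlap, and reports the busy time, the idle time between instructions, and the longest instruction for each pipe.

```python
from collections import defaultdict
from typing import NamedTuple


class Slice(NamedTuple):
    """One instruction on the timeline (hypothetical record layout)."""
    pipe: str        # unit name from Table 1, for example "MTE2" or "VECTOR"
    name: str        # instruction name (made up for this example)
    start: float     # start time, in arbitrary time units
    duration: float  # execution time, in the same units


def summarize_pipes(slices):
    """Return {pipe: {"busy", "idle", "longest"}} for a list of slices."""
    by_pipe = defaultdict(list)
    for s in slices:
        by_pipe[s.pipe].append(s)

    summary = {}
    for pipe, items in by_pipe.items():
        items.sort(key=lambda s: s.start)
        # Busy time: sum of durations (assumes no overlap within a pipe).
        busy = sum(s.duration for s in items)
        # Idle time: gaps between the end of one instruction and the next start.
        idle = 0.0
        end = items[0].start + items[0].duration
        for s in items[1:]:
            idle += max(0.0, s.start - end)
            end = max(end, s.start + s.duration)
        longest = max(items, key=lambda s: s.duration)
        summary[pipe] = {"busy": busy, "idle": idle, "longest": longest.name}
    return summary


# Made-up sample data: MTE2 has a long gap between its two transfers.
slices = [
    Slice("MTE2", "load_a", 0.0, 4.0),
    Slice("MTE2", "load_b", 10.0, 4.0),
    Slice("VECTOR", "add", 4.0, 2.0),
    Slice("VECTOR", "mul", 6.0, 3.0),
]
report = summarize_pipes(slices)
for pipe, stats in report.items():
    print(pipe, stats)
```

A pipe with a large idle value relative to its busy time corresponds to wide gaps between slices in the timeline view, while the longest instruction corresponds to the widest slice in that row.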
