GUI Description

Function

During operator performance tuning, MindStudio Insight displays on the timeline the detailed execution status of the low-level instructions issued while an operator runs. It also shows the instruction call sequence and the time consumed by each pipe on each core of the AI Processor. By analyzing the timeline and viewing instruction details and durations, you can quickly locate performance bottlenecks.

GUI Display

The Timeline tab page consists of the toolbar (area 1), graphical display (area 2), and data pane (area 3), as shown in Figure 1.
Figure 1 Timeline tab page
  • Area 1: toolbar, which contains common shortcut buttons. From left to right, the buttons are Marker List, Filter (card or unit), Search, Flow Events, Reset (page restoration), Timeline Zoom Out, and Timeline Zoom In.
  • Area 2: graphical display. The left pane shows the hierarchy of each core: the first level is Core and the second level is Pipe. The right pane shows the timeline view row by row, including the execution sequence and duration of each instruction. For details about the units, see Table 1.
  • Area 3: data pane, which displays statistics or instruction details. If you select Slice Detail, the details of a single instruction are displayed. If you select Slice List, the list of instructions within the selected area of the unit is displayed.
    Table 1 Unit information

    Unit             Description
    ALL              Instructions in this channel will be executed in all channels.
    SCALAR           Scalar unit.
    FLOWCTRL         Control flow instruction.
    MTE1             Data transfer pipeline, from L1 to {L0A/L0B, UBUF}.
    CUBE             Cube unit.
    FIXP             Data transfer pipeline, from FixPipe L0C to OUT/L1.
                     Only the exported profile data of Atlas A2 Training Series Product/Atlas 800I A2 Inference Product can be displayed.
    MTE2             Data transfer pipeline, from {DDR/GM, L2} to {L1, L0A/B, UBUF}.
    VECTOR           Vector unit.
    MTE3             Data transfer pipeline, from UBUF to {DDR/GM, L2, L1}, or from L1 to {DDR/L2}.
    CACHEMISS        Missed iCache.
    USEMASK          Custom dotting range.
    MTE Throughput   Memory throughput information.
                     • GM_TO_L1: GM-to-L1 data transfer throughput
                     • GM_TO_TOTAL: total GM output data throughput
                     • GM_TO_UB: GM-to-UB data transfer throughput
                     • L1_TO_GM: L1-to-GM data transfer throughput
                     • TOTAL_TO_GM: total GM input data throughput
                     • UB_TO_GM: UB-to-GM data transfer throughput

By examining the durations of instructions and the intervals between them at each layer of the timeline view, you can identify potential performance bottlenecks in the corresponding instructions and pipelines. For example, you can determine whether instruction execution is stalling on a pipe or whether certain instructions are particularly time-consuming.
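
The duration-and-interval reasoning above can be sketched in a few lines of Python. This is only an illustration, not a MindStudio Insight API: the record layout (pipe, instruction, start, duration), the instruction names, and the numbers are all hypothetical, and the sketch assumes that slices on the same pipe do not overlap.

```python
from collections import defaultdict

# Hypothetical slice records: (pipe, instruction, start_us, duration_us).
# The values are made up for illustration; they are not tool output.
slices = [
    ("MTE2", "load_a", 0.0, 4.0),
    ("MTE2", "load_b", 4.0, 4.0),
    ("CUBE", "matmul", 8.0, 2.0),
    ("MTE2", "load_c", 12.0, 4.0),
    ("CUBE", "matmul", 16.0, 2.0),
]


def summarize(slices):
    """Return per-pipe busy time, largest idle gap, and longest instruction."""
    by_pipe = defaultdict(list)
    for pipe, name, start, dur in slices:
        by_pipe[pipe].append((start, dur, name))

    summary = {}
    for pipe, items in by_pipe.items():
        items.sort()  # order slices by start time
        busy = sum(dur for _, dur, _ in items)
        # Idle gap = start of the next slice minus end of the current one.
        gaps = [
            nxt[0] - (cur[0] + cur[1])
            for cur, nxt in zip(items, items[1:])
        ]
        longest = max(items, key=lambda item: item[1])
        summary[pipe] = {
            "busy": busy,
            "max_gap": max(gaps, default=0.0),
            "longest": longest[2],
        }
    return summary


summary = summarize(slices)
for pipe, stats in sorted(summary.items()):
    print(pipe, stats)
```

In this made-up data, the CUBE pipe is busy for far less time than MTE2 and has the larger idle gap, which is the kind of pattern that would suggest the compute unit is waiting on data transfer. On a real timeline, the same comparison is made visually across the pipe rows.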