Instruction Pipeline Chart

It displays the timing of each instruction and associates instructions with the call stack, so you can quickly trace bottlenecks. Two visualization modes are supported: Google Chrome and MindStudio Insight. Note the following before you generate the chart:

  • If the -g compilation option is used, the generated binary file contains debugging information. Restrict the access permissions of user programs that contain debugging information so that only authorized personnel can access the binary file.
  • If you do not need the functions provided by the llvm-symbolizer component, compile the program passed to msProf without -g. msProf then does not call the llvm-symbolizer component.
  • To analyze the performance of some operators, call TRACE_START and TRACE_STOP on a single core of the operator and add -DASCENDC_TRACE_ON to the compilation configuration file (for details, see adding -DASCENDC_TRACE_ON). The system can then generate the pipeline chart. For details about the chart content, see Instruction Pipeline Chart.
  • Add -DASCENDC_TRACE_ON to the compilation configuration file, as shown in the following sample project.
    For the AddKernelInvocationNeo operator project, add the following code to the ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo/cmake/npu_lib.cmake file:
    ascendc_compile_definitions(
        ...
        -DASCENDC_TRACE_ON
    )
    
  • Google Chrome

    Enter chrome://tracing in the Google Chrome address bar, drag the instruction pipeline file (trace.json) generated in Tool Usage onto the blank area of the page, and use the keyboard shortcuts (W: zoom in; S: zoom out; A: move left; D: move right) to navigate the chart. Table 1 describes the key fields.

    Table 1 Key fields

    Field        Description
    VECTOR       Vector unit.
    SCALAR       Scalar unit.
    CUBE         Cube unit.
    MTE1         Data transfer pipeline. Direction: L1 -> {L0A/L0B, UBUF}.
    MTE2         Data transfer pipeline. Direction: {DDR/GM, L2} -> {L1, L0A/L0B, UBUF}.
    MTE3         Data transfer pipeline. Direction: UBUF -> {DDR/GM, L2, L1}, or L1 -> {DDR/L2}.
    FLOWCTRL     Control flow instructions.
    CACHEMISS    Instruction cache (ICache) misses.
    USEMASK      Custom marked range.
    ALL          Instructions in this channel are executed in all channels.

  • MindStudio Insight
    The trace.json or visualize_data.bin file generated by msprof op simulator can be imported into MindStudio Insight for display.
    • To use MindStudio Insight, install the MindStudio Insight software package. For the download link, see Installation and Uninstallation.
    • For details about how to import the visualize_data.bin file into MindStudio Insight, see Importing Profile Data.
    • For details about the MindStudio Insight timeline settings, see Timeline.
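Because trace.json can be loaded by chrome://tracing, it is presumably in the Chrome Trace Event Format. The following Python sketch assumes that format and additionally assumes that each pipeline in Table 1 (VECTOR, MTE2, and so on) is its own (pid, tid) track named by a thread_name metadata event; this document confirms neither assumption. It sums instruction durations per pipeline to give a quick text summary before you open a viewer. The names load_events and busy_time_per_track are hypothetical and are not part of msProf.

```python
import json
import sys
from collections import defaultdict


def load_events(path):
    """Load events from a file in the Chrome Trace Event Format.

    The format allows either a bare JSON array of events or an object
    with a "traceEvents" key; both layouts are accepted here.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def busy_time_per_track(events):
    """Sum the durations ("dur") of complete ("X") events on each track.

    ASSUMPTION (not confirmed for msProf output): each pipeline is a
    separate (pid, tid) track whose display name comes from a
    "thread_name" metadata ("M") event. Unnamed tracks are labeled
    "pid/tid". Overlapping events on one track are summed as-is.
    """
    names = {}
    for e in events:
        if e.get("ph") == "M" and e.get("name") == "thread_name":
            names[(e.get("pid"), e.get("tid"))] = e["args"]["name"]

    busy = defaultdict(float)
    for e in events:
        if e.get("ph") == "X":
            key = (e.get("pid"), e.get("tid"))
            label = names.get(key, f"{key[0]}/{key[1]}")
            busy[label] += e.get("dur", 0)
    return dict(busy)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pipeline_summary.py trace.json
    totals = busy_time_per_track(load_events(sys.argv[1]))
    for track, us in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{track:<12} {us:>12.3f} us")
```

Summing durations rather than merging intervals keeps the sketch short but overcounts if events on one track overlap. A pipeline whose total is close to the overall kernel duration may be worth inspecting first in the chart.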

Instruction Pipeline Chart (MindStudio Insight as an Example)

MindStudio Insight presents the running status of instructions on the Ascend AI Processor as a sequence diagram. By analyzing the instruction details, the execution time of each instruction, the call stack of the code associated with each instruction, and the synchronization lines between instructions and pipelines in the diagram, you can identify timing optimization points for micro-instructions.

Figure 1 Timeline page
  • Shows the execution time of each instruction within each pipeline and the dependencies between instructions across pipelines, helping you identify potential pipeline optimization points.
  • Associates pipeline instruction information with source code, so you can optimize the pipeline layout directly from the code.

By examining the duration of each instruction and the intervals between instructions on each layer of the timeline, you can identify potential performance bottlenecks in the corresponding instructions and pipelines, for example, whether instruction execution has bottlenecks or whether certain instructions are particularly time-consuming.
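Intervals between instructions on the same pipeline can also be located programmatically. The following Python sketch assumes the trace uses the Chrome Trace Event Format (complete "X" events with "ts" and "dur" in microseconds, one (pid, tid) track per pipeline), which this document does not confirm; find_gaps is a hypothetical helper, not an msProf or MindStudio Insight API. It reports every interval in which a track has no running instruction, treating overlapping instructions as one busy interval.

```python
from collections import defaultdict


def find_gaps(events, min_gap_us=0.0):
    """Return idle intervals between instructions on each track.

    ASSUMPTION: events follow the Chrome Trace Event Format, i.e.
    complete events have "ph" == "X", a start time "ts", and a "dur"
    (both in microseconds), and tracks are keyed by (pid, tid).
    Only gaps strictly longer than min_gap_us are reported.
    Returns {(pid, tid): [(gap_start, gap_end), ...]} in time order.
    """
    tracks = defaultdict(list)
    for e in events:
        if e.get("ph") == "X":
            start = e["ts"]
            tracks[(e.get("pid"), e.get("tid"))].append(
                (start, start + e.get("dur", 0)))

    gaps = {}
    for key, spans in tracks.items():
        spans.sort()
        found = []
        busy_until = spans[0][1]
        for start, end in spans[1:]:
            # A gap exists only if this instruction starts after every
            # earlier instruction on the same track has finished.
            if start - busy_until > min_gap_us:
                found.append((busy_until, start))
            busy_until = max(busy_until, end)
        gaps[key] = found
    return gaps
```

For example, find_gaps(events, min_gap_us=1.0) lists idle periods longer than 1 us per track. Long gaps on a compute pipeline such as VECTOR or CUBE that coincide with activity on an MTE pipeline may indicate the compute unit waiting on data transfers, which you can then confirm visually on the timeline.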