Roofline Bottleneck Analysis Chart

The visualize_data.bin file generated by msprof op can be visualized in MindStudio Insight. The Roofline bottleneck analysis chart builds a processor performance model that estimates the theoretical performance limit of an operator, helping developers quickly identify bottlenecks.

  • To use MindStudio Insight, you need to install the MindStudio Insight software package separately. For the download link, see "Installation and Uninstallation".
  • For details about how to import the visualize_data.bin file to MindStudio Insight, see Importing Profile Data.
  • For details about how to use MindStudio Insight, see .

Supported Hardware

The visualize_data.bin file generated by msprof op can be imported into MindStudio Insight for display. The Roofline analysis chart varies depending on the hardware and the operator type.

  • For the Atlas inference products, the Roofline bottleneck analysis chart contains only the memory unit view.
    Figure 1 Roofline bottleneck analysis chart for Atlas inference products
  • For the Atlas A3 training products / Atlas A3 inference products and Atlas A2 training products / Atlas A2 inference products, the view generated varies according to the operator type. For details, see Table 1.
    Figure 2 Roofline bottleneck analysis chart for Atlas A3 training products / Atlas A3 inference products and Atlas A2 training products / Atlas A2 inference products

    Table 1 Roofline views supported by Atlas A3 training products / Atlas A3 inference products and Atlas A2 training products / Atlas A2 inference products

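    Roofline View Type       Vector Operator   Cube Operator   Mix Operator
    GM/L2 view               Supported         Supported       Supported
    Vector memory unit       Supported         -               Supported
    Vector memory channel    Supported         -               Supported
    Vector Pipeline          Supported         -               Supported
    Cube memory unit         -                 Supported       Supported
    Cube memory channel      -                 Supported       Supported
    Cube Pipeline            -                 Supported       Supported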

Function Description

The Roofline performance analysis for each unit or channel consists of the x-axis, y-axis, roofline, bandwidth line, and actual execution point. For details, see Figure 3.

Figure 3 Roofline analysis
  • X-axis: represents the arithmetic intensity (Ops/Byte), that is, the ratio of the total number of floating-point operations to the amount of data accessed from memory in a unit or channel.
  • Y-axis: represents the computing performance in tera-operations per second (TOps/s), that is, the number of floating-point operations executed per second.
  • Roofline: the horizontal line at the top of the chart, representing the theoretical maximum NPU computing performance. No matter how high the arithmetic intensity is, the actual performance cannot exceed the peak performance of the hardware.
  • Bandwidth line: the diagonal line that intersects the roofline. The line follows performance = theoretical maximum bandwidth × arithmetic intensity, so it meets the roofline where the arithmetic intensity equals the theoretical maximum NPU computing performance divided by the theoretical maximum bandwidth. To the left of the intersection, where the theoretical maximum bandwidth multiplied by the arithmetic intensity is less than the theoretical maximum NPU computing performance, the achievable performance increases linearly with the arithmetic intensity.

    The theoretical maximum performance that an operator can achieve is the smaller of two values: the theoretical maximum NPU computing performance, and the theoretical maximum bandwidth multiplied by the actual arithmetic intensity. It can be read from the roofline and the bandwidth line.

  • For details about the parameters of the actual operator execution point, see Table 2.
    Table 2 Actual operator execution point

    Coordinate Parameter    Description
    Bandwidth               Theoretical maximum bandwidth of the unit or channel.
    Arithmetic Intensity    Arithmetic intensity of the operator during actual execution, corresponding to the value on the x-axis.
    Performance             Computing performance of the operator during actual execution, corresponding to the value on the y-axis.
    Performance Ratio       Ratio of the computing performance of the operator during actual execution to the theoretical maximum computing performance for the current data size, that is, the percentage of a/b in the figure.
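
The relationship between the roofline, the bandwidth line, and the Performance Ratio can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MindStudio Insight implementation: the function names, the units (tera-operations per second for performance, TB/s for bandwidth), and every numeric value below are hypothetical and chosen only for the example.

```python
def attainable_performance(peak_tops, bandwidth_tbps, intensity):
    """Theoretical ceiling (point b): the smaller of the roofline and the bandwidth line."""
    return min(peak_tops, bandwidth_tbps * intensity)


def ridge_intensity(peak_tops, bandwidth_tbps):
    """Arithmetic intensity at which the bandwidth line meets the roofline."""
    return peak_tops / bandwidth_tbps


def execution_point(total_ops, total_bytes, seconds, peak_tops, bandwidth_tbps):
    """Return (arithmetic intensity, performance, performance ratio) for one run."""
    intensity = total_ops / total_bytes        # x-axis, Ops/Byte
    performance = total_ops / seconds / 1e12   # y-axis, tera-operations per second (a)
    ceiling = attainable_performance(peak_tops, bandwidth_tbps, intensity)  # b
    return intensity, performance, performance / ceiling


if __name__ == "__main__":
    # Hypothetical hardware and operator figures, for illustration only.
    peak, bandwidth = 100.0, 1.0
    x, y, ratio = execution_point(2e9, 1e8, 2e-4, peak, bandwidth)
    print(f"intensity={x:.1f} Ops/Byte, performance={y:.1f}, ratio={ratio:.0%}")
```

With these made-up numbers the execution point lies to the left of the intersection (20 Ops/Byte versus 100 Ops/Byte), so the ceiling comes from the bandwidth line rather than from the roofline.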

The Roofline analysis chart evaluates the Performance Ratio of each operator and reports the following analysis results:
  • If the Performance Ratio is greater than 80%, the bound type is displayed based on the region in which the operator falls.
    • Compute Bound: computing bottleneck.
    • Memory Bound: memory bottleneck.
  • If the Performance Ratio is less than 80%, the bound type is latency bound, and the message depends on the maximum pipeline ratio:
    • If the maximum pipeline ratio is less than 80%, the message "latency bound:pipeline caused" is displayed.
    • If the maximum pipeline ratio is greater than 80%, the message depends on the type of the pipeline that has the maximum ratio.
      • For a compute pipeline (cube ratio, vector ratio, or scalar ratio), the message "latency bound:compute caused" is displayed.
      • For a memory pipeline (MTE1 ratio, MTE2 ratio, or MTE3 ratio), the message "latency bound:memory caused" is displayed.
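
The decision rules above can be written as a small function. The sketch below is only an illustration of the rules as described here, not the tool's own code. It makes three assumptions that the text does not state: a value of exactly 80% is handled by the "not greater than" branch for the Performance Ratio and by the "not less than" branch for the pipeline ratio, the region is decided by comparing the arithmetic intensity with the intensity at the roofline/bandwidth-line intersection, and the dictionary keys used for pipeline names are hypothetical.

```python
THRESHOLD = 0.80  # the 80% limit used by both checks

# Hypothetical key names for the pipeline ratios listed in the text.
COMPUTE_PIPELINES = {"cube", "vector", "scalar"}
MEMORY_PIPELINES = {"mte1", "mte2", "mte3"}


def classify(performance_ratio, intensity, ridge_intensity, pipeline_ratios):
    """Return the message for one operator, following the rules listed above."""
    if performance_ratio > THRESHOLD:
        # Assumption: the region is the side of the intersection the point is on.
        return "Compute Bound" if intensity >= ridge_intensity else "Memory Bound"

    # Latency bound: look at the pipeline with the largest ratio.
    name, value = max(pipeline_ratios.items(), key=lambda item: item[1])
    if value < THRESHOLD:
        return "latency bound:pipeline caused"
    if name in COMPUTE_PIPELINES:
        return "latency bound:compute caused"
    if name in MEMORY_PIPELINES:
        return "latency bound:memory caused"
    raise ValueError(f"unknown pipeline type: {name}")


if __name__ == "__main__":
    # Made-up ratios: MTE2 dominates at 90%, so the cause is memory.
    print(classify(0.5, 20.0, 100.0, {"mte2": 0.9, "vector": 0.3}))
```

Keeping the threshold and the pipeline groupings as named constants makes it easy to adjust the sketch if the actual boundary handling differs from the assumptions stated above.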