Roofline Bottleneck Analysis Chart
The visualize_data.bin file generated by msprof op can be visualized in MindStudio Insight. The Roofline bottleneck analysis chart is built on a processor performance model. It lets developers quickly estimate the theoretical performance limit of an operator and identify its bottlenecks.
- To use MindStudio Insight, you need to install the MindStudio Insight software package separately. For details about the download link, see "Installation and Uninstallation".
- For details about how to import the visualize_data.bin file to MindStudio Insight, see Importing Profile Data.
- For details about how to use MindStudio Insight, see .
Supported Hardware
The visualize_data.bin file generated by msprof op can be imported into MindStudio Insight for display. The Roofline views shown vary with the hardware and the operator type.
- For Atlas inference products, the Roofline bottleneck analysis chart contains only the memory unit view.
  Figure 1 Roofline bottleneck analysis chart for Atlas inference products
- For Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products, the views generated vary with the operator type. For details, see Table 1.
  Figure 2 Roofline bottleneck analysis chart for Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products
Table 1 Roofline views supported by Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products

| Roofline View Type    | Vector Operator | Cube Operator | Mix Operator |
|-----------------------|-----------------|---------------|--------------|
| GM/L2 view            | √               | √             | √            |
| Vector memory unit    | √               | -             | √            |
| Vector memory channel | √               | -             | √            |
| Vector Pipeline       | √               | -             | √            |
| Cube memory unit      | -               | √             | √            |
| Cube memory channel   | -               | √             | √            |
| Cube Pipeline         | -               | √             | √            |

√: supported; -: not supported
Function Description
The Roofline chart for each unit or channel consists of the x-axis, the y-axis, the roofline, the bandwidth line (the diagonal), and the actual execution point. For details, see Figure 3.
- X-axis: represents the arithmetic intensity (Ops/Byte), that is, the ratio of the total number of floating-point operations to the amount of data accessed from memory in a unit or channel.
- Y-axis: represents the computing performance in TOPS (tera-operations per second), that is, the number of floating-point operations executed per second.
- Roofline: the horizontal line at the top of the chart, representing the theoretical maximum NPU computing performance. However high the arithmetic intensity, the actual performance cannot exceed this hardware peak.
- Bandwidth line: the diagonal line that meets the roofline. It represents the theoretical maximum bandwidth multiplied by the arithmetic intensity, so it meets the roofline at an arithmetic intensity equal to the theoretical maximum NPU computing performance divided by the theoretical maximum bandwidth. To the left of this intersection, where the theoretical maximum bandwidth multiplied by the arithmetic intensity is less than the theoretical maximum NPU computing performance, the achievable performance increases linearly with the arithmetic intensity.
The theoretical maximum performance an operator can achieve is therefore the smaller of two values: the theoretical maximum NPU computing performance, and the theoretical maximum bandwidth multiplied by the operator's actual arithmetic intensity. On the chart, it is the height of whichever of the roofline and the bandwidth line is lower at the operator's arithmetic intensity.
- Actual execution point: the position of the operator as measured during execution. For details about its parameters, see Table 2.
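To make the roofline and bandwidth line described above concrete, the following Python sketch computes the theoretical maximum performance min(peak, bandwidth × intensity), the intersection point of the two lines, and which side of that intersection an operator falls on. It is an illustration only: the function names are hypothetical and are not part of msprof op or MindStudio Insight, the 100 TOPS / 2 TB/s figures are made-up values rather than specifications of any Atlas product, and counting an intensity exactly at the intersection as compute bound is an assumption.

```python
"""Minimal Roofline model sketch (hypothetical helper names, not an msprof/MindStudio API)."""


def attainable_tops(peak_tops: float, bandwidth_tbs: float, intensity: float) -> float:
    """Theoretical maximum performance (TOPS) at a given arithmetic intensity.

    peak_tops:     height of the roofline, theoretical max computing performance (TOPS)
    bandwidth_tbs: theoretical max bandwidth of the unit or channel (TB/s)
    intensity:     arithmetic intensity (Ops/Byte); TB/s * Ops/Byte = TOPS
    """
    # The lower of the roofline and the bandwidth line caps the performance.
    return min(peak_tops, bandwidth_tbs * intensity)


def ridge_point(peak_tops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity at which the bandwidth line meets the roofline."""
    return peak_tops / bandwidth_tbs


def region(peak_tops: float, bandwidth_tbs: float, intensity: float) -> str:
    """Side of the intersection the point lies on (a tie counts as compute bound: assumption)."""
    if intensity >= ridge_point(peak_tops, bandwidth_tbs):
        return "Compute Bound"
    return "Memory Bound"


if __name__ == "__main__":
    # Hypothetical unit: 100 TOPS peak, 2 TB/s bandwidth, so the lines meet at 50 Ops/Byte.
    for ai in (10.0, 80.0):
        print(ai, attainable_tops(100.0, 2.0, ai), region(100.0, 2.0, ai))
    # prints:
    # 10.0 20.0 Memory Bound
    # 80.0 100.0 Compute Bound
```

At 10 Ops/Byte the bandwidth line (2 × 10 = 20 TOPS) is the binding limit; at 80 Ops/Byte the bandwidth line would allow 160 TOPS, so the 100 TOPS roofline caps the result.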
Table 2 Parameters of the actual operator execution point

| Coordinate Parameter | Description |
|----------------------|-------------|
| Bandwidth | Theoretical maximum bandwidth of the unit or channel. |
| Arithmetic Intensity | Arithmetic intensity of the operator during actual execution, corresponding to the value on the x-axis. |
| Performance | Computing performance of the operator during actual execution, corresponding to the value on the y-axis. |
| Performance Ratio | Ratio of the operator's actual computing performance to the theoretical maximum performance attainable at its arithmetic intensity, that is, a/b in the figure, expressed as a percentage. |
The Performance Ratio determines the bottleneck message as follows:
- If the performance ratio is greater than 80%, a message is displayed based on the region of the chart in which the execution point lies:
  - Compute Bound: computing bottleneck (the point lies to the right of the intersection of the bandwidth line and the roofline).
  - Memory Bound: memory bottleneck (the point lies to the left of that intersection).
- If the performance ratio is less than 80%, the bound type is latency bound:
  - If the maximum pipeline ratio is less than 80%, the message "latency bound:pipeline caused" is displayed.
  - If the maximum pipeline ratio is greater than 80%, the type of the pipeline with the maximum ratio determines the message:
    - Compute pipeline (cube ratio, vector ratio, or scalar ratio): the message "latency bound:compute caused" is displayed.
    - Memory pipeline (MTE1 ratio, MTE2 ratio, or MTE3 ratio): the message "latency bound:memory caused" is displayed.
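The message rules for the performance ratio can be restated as a small decision function. This is a hedged sketch, not the tool's implementation. The function names, the lowercase pipeline keys ("cube", "vector", "scalar", "mte1", "mte2", "mte3"), and the handling of ratios exactly equal to 80% (which the rules leave unspecified) are all assumptions. The region string is taken as an input and would come from where the execution point lies on the chart.

```python
"""Sketch of the Roofline bottleneck-message rules (hypothetical names, not a tool API)."""

THRESHOLD = 0.8  # the 80% threshold, expressed as a fraction

# Hypothetical keys for the pipeline ratios named in the rules.
COMPUTE_PIPES = {"cube", "vector", "scalar"}
MEMORY_PIPES = {"mte1", "mte2", "mte3"}


def performance_ratio(achieved_tops: float, attainable_tops: float) -> float:
    """a/b in the figure: actual performance over the theoretical maximum at that intensity."""
    return achieved_tops / attainable_tops


def bottleneck_message(perf_ratio: float, region: str, pipe_ratios: dict[str, float]) -> str:
    """Return the message for one execution point.

    perf_ratio:  performance ratio as a fraction (0.85 means 85%)
    region:      "Compute Bound" or "Memory Bound", from the point's position on the chart
    pipe_ratios: per-pipeline ratio as a fraction, e.g. {"cube": 0.4, "mte2": 0.9}
    """
    # Assumption: exactly 80% is not "greater than 80%", so it falls through to latency bound.
    if perf_ratio > THRESHOLD:
        return region
    # Latency bound: inspect the pipeline with the largest ratio.
    if not pipe_ratios:
        raise ValueError("pipe_ratios must not be empty")
    top_pipe, top_ratio = max(pipe_ratios.items(), key=lambda item: item[1])
    if top_ratio < THRESHOLD:
        return "latency bound:pipeline caused"
    # Assumption: a top ratio of exactly 80% is classified by pipeline type.
    if top_pipe in COMPUTE_PIPES:
        return "latency bound:compute caused"
    if top_pipe in MEMORY_PIPES:
        return "latency bound:memory caused"
    raise ValueError(f"unknown pipeline type: {top_pipe}")


if __name__ == "__main__":
    ratio = performance_ratio(45.0, 50.0)  # 90% of the attainable maximum
    print(bottleneck_message(ratio, "Memory Bound", {"vector": 0.3, "mte2": 0.95}))
    print(bottleneck_message(0.5, "Compute Bound", {"cube": 0.3, "mte2": 0.6}))
    print(bottleneck_message(0.5, "Compute Bound", {"vector": 0.85, "mte2": 0.6}))
```

Taking the region as a parameter keeps the classification rules separate from the chart geometry, so the same function applies to any unit or channel view.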
