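After the analysis runs, this feature reports its results through Workload Analysis, which compares the position of each operating point with the roofline. The output includes: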

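Op list information (lists every operator working in this region):
- Operator name
- Operator AI Core time as a percentage of total AI Core time (the larger the value, the more worthwhile the optimization)
- The channel where the bottleneck mainly occurs
- Percentage relative to the current roof (a larger value means the operator is closer to the hardware limit)

Expert system optimization suggestions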
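The fields in the Op list can be illustrated with the textbook Roofline formulas. The following is a minimal sketch, not the tool's actual implementation: the function names, the two-roof model (one compute peak, one bandwidth), and the hardware numbers are all assumptions made for illustration.

```python
def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline ceiling: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, bandwidth * intensity)


def time_share_percent(op_seconds, total_seconds):
    """Operator time as a percentage of total time."""
    return 100.0 * op_seconds / total_seconds


def analyze(op_flops, op_bytes, op_seconds, peak_flops, bandwidth):
    """Place one operator on a two-roof Roofline model (hypothetical helper)."""
    intensity = op_flops / op_bytes      # FLOPs per byte moved
    achieved = op_flops / op_seconds     # FLOP/s actually reached
    roof = attainable_flops(intensity, peak_flops, bandwidth)
    ridge = peak_flops / bandwidth       # intensity at which the two roofs meet
    return {
        "intensity": intensity,
        "bound": "memory" if intensity < ridge else "compute",
        "percent_of_roof": 100.0 * achieved / roof,
    }


# Hypothetical hardware: 1e12 FLOP/s peak, 1e11 B/s bandwidth (ridge at 10 FLOP/B).
result = analyze(op_flops=2e9, op_bytes=1e9, op_seconds=0.03125,
                 peak_flops=1e12, bandwidth=1e11)
print(result)
```

An operator whose intensity lies to the left of the ridge point is limited by the bandwidth roof, so its percentage is measured against that roof rather than against the compute peak.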

The output looks like this:

Figure 1 Roofline model operator list and optimization suggestions

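The output lists the basic information of each operator that has a bottleneck and provides optimization suggestions. The suggestions are as follows: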

Memory Bound

Memory bottleneck.

- Change the data access path to one with higher bandwidth
- Reduce the amount of repeated data migration and increase FLOPS/BYTES
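To see why less repeated data movement raises FLOPS/BYTES, consider a simplified traffic model for a matrix multiplication. This is only a sketch under stated assumptions: the function `matmul_intensity` is hypothetical, it assumes each output tile of size `tile` x `tile` stays in a local buffer while the matching panels of both inputs are loaded once, and it requires `m` and `n` to be divisible by `tile`.

```python
def matmul_intensity(m, n, k, tile, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n] with output tiling.

    Simplified model: for every tile x tile block of C, a tile x k panel of A
    and a k x tile panel of B are loaded once; C is written once.
    """
    flops = 2 * m * n * k                    # one multiply and one add per term
    n_tiles = (m // tile) * (n // tile)
    loaded = n_tiles * 2 * tile * k          # elements of A and B moved in
    stored = m * n                           # elements of C moved out
    return flops / ((loaded + stored) * bytes_per_elem)


for tile in (1, 4, 16):
    print(tile, round(matmul_intensity(256, 256, 256, tile), 3))
```

With `tile=1` every output element reloads a full row and column, so the same input data is moved many times. Larger tiles reuse each loaded panel for more outputs, which moves the operating point to the right on the Roofline chart.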

Compute Bound

Compute bottleneck.

- Change calculation units, for example, replace Vector with Cube
- Adopt low-precision computing
- Use dual-core
- Optimize the algorithms to reduce the computation amount

Low Pipeline

Low pipeline utilization.

- Use the double buffer (ping-pong strategy)
- Reduce strong data dependencies between pipelines, that is, optimize unreasonable pipeline dependencies
- Eliminate improper instruction synchronization between pipelines
- Delete redundant pipe_barrier(PIPE_ALL) instructions
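The benefit of double buffering can be estimated with an idealized timing model. The sketch below is an assumption-laden illustration, not a description of the hardware scheduler: it assumes a fixed copy time and compute time per tile, exactly two buffers, and no other stalls. Both function names are hypothetical.

```python
def serial_time(n_tiles, t_copy, t_compute):
    """Single buffer: each tile is copied in, then computed, with no overlap."""
    return n_tiles * (t_copy + t_compute)


def double_buffer_time(n_tiles, t_copy, t_compute):
    """Two buffers: the copy of tile i+1 overlaps the compute of tile i.

    Only the first copy and the last compute are exposed; every step in
    between costs the slower of the two stages.
    """
    return t_copy + (n_tiles - 1) * max(t_copy, t_compute) + t_compute


print(serial_time(8, 3, 4))         # 56
print(double_buffer_time(8, 3, 4))  # 35
```

In this model the speedup is largest when copy time and compute time are similar. A full barrier between the two stages would force them to run back to back again, which is why redundant synchronization undoes the gain.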

Latency Compute Bound

Potential compute bottleneck.

- Increase the number of repeats computed by Vector instructions
- Check whether the mask setting is proper
- Check bank conflict
- Use high-performance instructions to replace low-performance instructions
- Reduce the use of long-running instructions
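The first suggestion can be motivated with a generic amortization model. This is a rough sketch under assumed costs, not measured hardware behavior: it assumes every Vector instruction pays a fixed issue overhead once and then a constant cost per repeat. The function name and the cycle counts are hypothetical.

```python
import math


def vector_cycles(total_repeats, repeats_per_instr, issue_overhead, cycles_per_repeat):
    """Estimated cycles to process total_repeats units of vector work.

    Each instruction pays issue_overhead once, so packing more repeats
    into one instruction spreads that fixed cost over more work.
    """
    n_instr = math.ceil(total_repeats / repeats_per_instr)
    return n_instr * issue_overhead + total_repeats * cycles_per_repeat


# Same amount of work, different repeat counts per instruction.
print(vector_cycles(1024, 1, issue_overhead=10, cycles_per_repeat=1))   # 11264
print(vector_cycles(1024, 64, issue_overhead=10, cycles_per_repeat=1))  # 1184
```

The useful work is identical in both calls. Only the number of issued instructions changes, which is the effect the suggestion targets.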

Latency Memory Bound

Potential memory bottleneck.

- Check whether data migration granularity/burst length/burst number are too small
- Reduce unreasonable blocks (stalls) inside the pipeline
- Avoid read/write resource preemption
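Why small transfer granularity hurts can be shown with a simple latency-plus-bandwidth model. The sketch below is illustrative only: the function name, the fixed per-transfer overhead, and the peak bandwidth are assumed values, not figures for any specific device.

```python
def effective_bandwidth(bytes_per_transfer, peak_bw, fixed_overhead_s):
    """Bytes per second achieved when each transfer pays a fixed start-up cost."""
    transfer_time = fixed_overhead_s + bytes_per_transfer / peak_bw
    return bytes_per_transfer / transfer_time


PEAK_BW = 1e11     # hypothetical peak bandwidth, bytes per second
OVERHEAD = 1e-7    # hypothetical fixed cost per transfer, seconds

for size in (64, 4096, 262144):
    share = effective_bandwidth(size, PEAK_BW, OVERHEAD) / PEAK_BW
    print(size, f"{100 * share:.1f}% of peak")
```

Under this model, small transfers are dominated by the fixed cost and reach only a small fraction of peak bandwidth, while larger transfers approach the peak.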

Figure 2 Roofline model performance analysis summary

Model Bound Coefficient

Model bottleneck coefficient.
