Performance Tuning

Profiling and Analysis on the NPU

An operator program written against the NPU operator calling APIs is compiled with the BiSheng Compiler into an executable file. The operator tuning tool then runs this executable in NPU mode and collects performance data for the Ascend C operator as it executes on the AI processor, which enables fine-grained performance tuning.

  • Performance data collection (profiling): Use the msProf tool to collect performance data for the Ascend C operator as it executes on the AI processor.
  • Roofline bottleneck analysis: The visualize_data.bin file generated by msprof op can be visualized in MindStudio Insight. The Roofline bottleneck analysis chart builds a performance model of the processor, from which you can quickly estimate the theoretical performance limit of an operator and identify its bottlenecks.
  • Instruction pipeline chart: The visualize_data.bin or trace.json file generated by msprof op simulator can be displayed visually. The instruction pipeline chart shows the timing relationships between instructions and associates them with the call stack, helping you quickly locate bottlenecks.
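
To make the idea behind Roofline analysis concrete, the following is a minimal sketch of the generic Roofline model, not the implementation used by MindStudio Insight. It assumes a single compute roof and a single memory-bandwidth roof; the function names and all hardware figures are hypothetical and chosen only for illustration. The chart produced by the tool may use different roofs (for example, one per memory level or compute unit).

```python
# Generic Roofline model sketch. All names and numbers are illustrative
# assumptions, not values or APIs from msProf or MindStudio Insight.

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Operations performed per byte of memory traffic (FLOP/byte)."""
    return flops / bytes_moved


def attainable_performance(peak_flops: float, peak_bandwidth: float,
                           intensity: float) -> float:
    """Upper bound on throughput (FLOP/s): the lower of the two roofs."""
    return min(peak_flops, peak_bandwidth * intensity)


def classify(peak_flops: float, peak_bandwidth: float,
             intensity: float) -> str:
    """Compare the intensity with the ridge point where the roofs meet."""
    ridge_point = peak_flops / peak_bandwidth
    return "memory-bound" if intensity < ridge_point else "compute-bound"


if __name__ == "__main__":
    # Hypothetical processor: 100 TFLOP/s peak compute, 1 TB/s bandwidth.
    peak_flops = 100e12
    peak_bandwidth = 1e12

    # Hypothetical operator: 2e9 operations over 4e8 bytes of traffic.
    intensity = arithmetic_intensity(2e9, 4e8)
    limit = attainable_performance(peak_flops, peak_bandwidth, intensity)

    print(f"arithmetic intensity: {intensity} FLOP/byte")
    print(f"theoretical limit:    {limit:.3e} FLOP/s")
    print(f"bottleneck:           {classify(peak_flops, peak_bandwidth, intensity)}")
```

In this sketch, an operator whose arithmetic intensity falls below the ridge point is limited by memory bandwidth, so reducing data movement (or reusing data already on chip) is likely to help more than optimizing computation. An operator above the ridge point is limited by compute throughput instead.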

For details about how to use the performance tuning tool, see msProf (Operator Tuning).