Overview

During performance tuning, you can use the profiling tools to sample and analyze key performance metrics of AI tasks at each execution stage on the Ascend AI Processor. The output profile data helps you quickly locate software and hardware performance bottlenecks, which makes performance analysis of AI tasks more efficient.

This document describes several methods of sampling profile data. For convenience, you are advised to sample profile data with the msprof command-line tool (CLI) in offline inference scenarios. The msprof CLI is unavailable if the Ascend-CANN-Toolkit is not installed in the environment. In training scenarios, you are advised to collect profile data by modifying the relevant parameters in the AI framework. Profile data sampled with the msprof CLI or the Ascend PyTorch Profiler APIs is parsed and exported automatically. Data sampled by any other method must be parsed and exported with the msprof CLI or the msprof.py tool.
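The following is a minimal sketch of the offline inference case, not a definitive procedure. It checks whether an msprof executable is on PATH (it is not when the Ascend-CANN-Toolkit is missing) and, if so, runs an application under it. The helper names (msprof_available, build_msprof_command, profile_offline_inference) are hypothetical, and the --application and --output option names are assumptions based on commonly documented msprof usage; confirm them against the msprof reference for your CANN version.

```python
import shutil
import subprocess
from typing import List, Optional


def msprof_available() -> bool:
    """Return True if an `msprof` executable can be found on PATH."""
    return shutil.which("msprof") is not None


def build_msprof_command(app_path: str, output_dir: str) -> List[str]:
    """Build the argument list for one msprof sampling run.

    The option names below are assumptions; verify them for your CANN version.
    """
    return [
        "msprof",
        f"--application={app_path}",  # assumed option: program to profile
        f"--output={output_dir}",     # assumed option: directory for profile data
    ]


def profile_offline_inference(app_path: str, output_dir: str) -> Optional[int]:
    """Run the application under msprof and return its exit code.

    Returns None without running anything when msprof is not installed.
    """
    if not msprof_available():
        return None
    completed = subprocess.run(build_msprof_command(app_path, output_dir), check=False)
    return completed.returncode
```

A caller can treat a None result as a signal to install the toolkit or to fall back to one of the other sampling methods, whose output then still needs to be parsed and exported separately.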

Figure 1 Performance analysis workflow