Model Tuning Performance Collection Tools

MindStudio provides several system-level methods for collecting performance data. Select the method that fits your requirements to locate performance bottlenecks accurately and improve training efficiency.

Two collection modes are available, distinguished by how collection is enabled: the msprof CLI and the AI framework Profiler APIs (Ascend PyTorch Profiler and MindSpore Profiler).

The msprof CLI collects performance data at the CANN and NPU layers and serves as the basis for the other performance data collection APIs. It does not collect AI framework layer data.
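
As a rough illustration of CLI-based collection, the sketch below assembles an msprof command line for a training script. The helper `build_msprof_command`, the script name `train.py`, and the output directory are hypothetical. The `--output` and `--application` options are assumed to behave as described in section "msprof Common Collection Commands" in Profiling Instructions, so check them against your CANN version before use.

```python
import shlex


def build_msprof_command(app_command, output_dir):
    """Return an argv list that launches `app_command` under msprof.

    Assumption: --output sets the result directory and --application
    carries the command to profile; verify both in Profiling Instructions.
    """
    return [
        "msprof",
        f"--output={output_dir}",
        f"--application={shlex.join(app_command)}",
    ]


# Hypothetical training entry point and output directory.
cmd = build_msprof_command(["python3", "train.py"], "./prof_out")
print(shlex.join(cmd))
```

On a machine with CANN installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to start collection.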

The AI framework Profiler APIs encapsulate the msprof CLI and additionally collect and parse performance data at the AI framework layer. This is the most commonly used method. Based on their functions and features, the Profiler APIs support three modes: general (static) collection, dynamic collection, and online monitoring.

In addition, some training or inference suites, such as MindSpeed-MM and MindFormers, provide additional encapsulation of the Profiler APIs, allowing users to directly invoke performance data collection through the APIs in these suites.

Figure 1 Performance collection framework

**Table 1** Collection mode description
| Collection Mode | Strength | Recommended Application Scenario | Reference Document |
| --- | --- | --- | --- |
| msprof CLI | Collects and parses AI job runtime profile data, system data of Ascend AI Processors, and other required data. NOTE: The msprof CLI does not collect AI framework layer data. | Training and inference scenarios | Section "msprof Common Collection Commands" in Profiling Instructions |
| Ascend PyTorch Profiler APIs | Fully aligned with usage in PyTorch-GPU scenarios. Collect PyTorch framework data and Ascend software and hardware data. | General performance analysis based on PyTorch | Section "Ascend PyTorch Profiler" in Profiling Instructions |
| MindSpore Profiler APIs | Collect MindSpore framework data and Ascend software and hardware data. | General performance analysis based on MindSpore | Section "MindSpore Profiler" in Profiling Instructions |

When the AI framework Profiler is used to collect data, configure parameters by referring to Table 2.

**Table 2** Parameter settings
| Scenario | Parameter Settings |
| --- | --- |
| General performance analysis | profiler_level: Level1. aic_metrics: PipeUtilization (default value). activities: CPU and NPU. Enable other switches as required. |
| NPU/GPU comparison | Used to compare the end-to-end duration on the NPU and GPU. profiler_level: Level0. activities: NPU only, or CPU and NPU, as required. Disable all other switches. |
| Code locating | To locate the code that invokes an abnormal operator, enable with_stack or with_modules. Enable these switches only when necessary, because they degrade performance. |
| Analyzing on-chip memory usage of operators on the NPU | profile_memory: True. |
| Analyzing cluster communication | profiler_level: Level1. |
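
The settings in Table 2 can be captured as reusable presets. The following is a minimal sketch in plain Python that does not call any profiler: the scenario keys and the helper `settings_for` are hypothetical, and the values are written as strings that mirror the table. Mapping them onto the actual Profiler API arguments should follow section "Ascend PyTorch Profiler" or "MindSpore Profiler" in Profiling Instructions.

```python
# Presets transcribed from Table 2. Scenario keys are hypothetical names.
PRESETS = {
    "general": {
        "profiler_level": "Level1",
        "aic_metrics": "PipeUtilization",
        "activities": ["CPU", "NPU"],
    },
    "npu_gpu_comparison": {
        "profiler_level": "Level0",
        "activities": ["NPU"],  # or ["CPU", "NPU"], as required
        # "Other switches disabled": shown here for the switches named in Table 2.
        "with_stack": False,
        "with_modules": False,
        "profile_memory": False,
    },
    "code_locating": {"with_stack": True},  # with_modules is the alternative
    "memory": {"profile_memory": True},
    "cluster_communication": {"profiler_level": "Level1"},
}


def settings_for(*scenarios):
    """Merge the presets of the given scenarios; later ones override earlier ones."""
    merged = {}
    for name in scenarios:
        merged.update(PRESETS[name])
    return merged


# General analysis plus call-stack information for code locating.
config = settings_for("general", "code_locating")
print(config)
```

Keeping with_stack out of the general preset reflects the note in Table 2 that the switch should be enabled only when needed.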

Table 3 describes the collection modes of the Profiler APIs.

**Table 3** Collection types
| Collection Mode | Strength | Recommended Application Scenario | How to Use |
| --- | --- | --- | --- |
| General collection | Collects data for a configured period or for the entire run, and flushes the detailed performance data to disk. | General performance analysis | Ascend PyTorch Profiler: see section "Ascend PyTorch Profiler" in Profiling Instructions. MindSpore Profiler: see section "MindSpore Profiler" in Profiling Instructions. msprof: see section "msprof Common Collection Commands" in Profiling Instructions. |
| Dynamic collection (dynamic_profile) | During model training, you can start collection at any time and modify the collection configuration without repeatedly editing the script code. | Scenarios with high startup and shutdown costs (such as ultra-large-scale training) | |
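
The idea behind dynamic collection can be illustrated with a small configuration poller: the training loop checks a configuration file at each step and reacts when its content changes, so collection can be started or reconfigured without touching the script. This is a conceptual sketch only, not the dynamic_profile implementation. The class `ConfigPoller`, the file name, and the configuration keys are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ConfigPoller:
    """Re-read a JSON file and report its content whenever it changes."""

    def __init__(self, path):
        self.path = Path(path)
        self._last_text = None

    def poll(self):
        """Return the parsed config if the file changed since the last poll, else None."""
        try:
            text = self.path.read_text(encoding="utf-8")
        except FileNotFoundError:
            return None
        if text == self._last_text:
            return None
        try:
            config = json.loads(text)
        except json.JSONDecodeError:
            return None  # file is mid-write or invalid; retry on the next poll
        self._last_text = text
        return config


# Simulated training loop: the config file is edited while the job "runs".
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "collection_config.json"  # hypothetical file name
    poller = ConfigPoller(cfg_path)
    events = []
    for step in range(4):
        if step == 1:  # an operator requests collection at step 1
            cfg_path.write_text(json.dumps({"profiler_level": "Level1"}))
        if step == 3:  # and changes the configuration at step 3
            cfg_path.write_text(json.dumps({"profiler_level": "Level0"}))
        new_config = poller.poll()
        if new_config is not None:
            events.append((step, new_config))
    print(events)
```

Comparing file content rather than modification time keeps the sketch independent of file system timestamp resolution.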
