Model Tuning Performance Collection Tools

MindStudio provides several system-level methods for collecting performance data. Choose the one that fits your requirements to locate performance bottlenecks accurately and improve training efficiency.

Depending on how collection is enabled, two modes are available: the msprof CLI and the AI framework Profiler APIs (Ascend PyTorch Profiler and MindSpore Profiler).

The msprof CLI collects performance data at the CANN and NPU layers and serves as the basis for the other performance data collection APIs. It does not collect AI framework layer data.
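
As a rough illustration of a direct msprof invocation, the following sketch assembles a command line in Python. It assumes the --output and --application options described in the msprof documentation; option names can differ between CANN versions, so check msprof --help before relying on them. The helper name build_msprof_command, the script train.py, and the directory ./prof_output are hypothetical.

```python
import shlex


def build_msprof_command(app_command, output_dir):
    """Return an argv list that runs `app_command` under msprof.

    Assumption: msprof accepts --output=<dir> and --application=<command>.
    Verify against `msprof --help` for the installed CANN version.
    """
    if not app_command.strip():
        raise ValueError("app_command must not be empty")
    return [
        "msprof",
        f"--output={output_dir}",
        f"--application={app_command}",
    ]


# Hypothetical training script and output directory.
cmd = build_msprof_command("python3 train.py", "./prof_output")
print(shlex.join(cmd))

# On a machine with CANN installed, the command could then be launched with:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the example usable on machines without an NPU environment.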

The Profiler APIs of the AI framework encapsulate the msprof CLI and additionally collect and parse performance data at the AI framework layer. This is the most commonly used approach. By function, the Profiler APIs fall into three modes: general (static) collection, dynamic collection, and online monitoring.

In addition, some training or inference suites, such as MindSpeed-MM and MindFormers, provide additional encapsulation of the Profiler APIs, allowing users to directly invoke performance data collection through the APIs in these suites.

Figure 1 Performance collection framework

Table 1 Collection mode description

| Collection Mode | Strength | Recommended Application Scenario | Reference Document Link |
| --- | --- | --- | --- |
| Collection using msprof CLI | The msprof CLI tool provides the capabilities of collecting and parsing the AI job runtime profile data, system data of Ascend AI Processors, and other required data. NOTE: The msprof CLI does not have AI framework layer data. | Training and inference scenarios. | Section "msprof Common Collection Commands" in Profiling Instructions. |
| Ascend PyTorch Profiler APIs | Fully align with the usage in PyTorch-GPU scenarios and support collection of PyTorch framework and Ascend software and hardware data. | General performance analysis based on PyTorch. | Section "Ascend PyTorch Profiler" in Profiling Instructions. |
| MindSpore Profiler APIs | Collect MindSpore framework and Ascend software and hardware data. | General performance analysis based on MindSpore. | Section "MindSpore Profiler" in Profiling Instructions. |

When the AI framework Profiler is used to collect data, configure parameters by referring to Table 2.

Table 2 Parameter settings

| Scenario | Parameter |
| --- | --- |
| General performance analysis | Set profiler_level to Level1; leave aic_metrics at its default value, PipeUtilization; set activities to collect CPU and NPU data; enable other switches as required. |
| NPU/GPU comparison | Used to compare the end-to-end duration on the NPU and the GPU. Set profiler_level to Level0; set activities to collect only NPU data, or CPU and NPU data, as required; disable all other switches. |
| Code locating | To locate the code of an abnormal operator, enabling the with_stack or with_modules switch is usually sufficient. Enable it only when necessary, because it degrades performance. |
| Analyzing on-chip memory allocation of operators on the NPU | Set profile_memory to True. |
| Analyzing cluster communication | Set profiler_level to Level1. |
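
The settings in Table 2 can be captured as plain data and applied when the profiler is created. The sketch below is one possible arrangement, not the documented procedure: the scenario keys and the make_profiler helper are hypothetical, and the torch_npu.profiler names (profile, ProfilerActivity, ProfilerLevel, AiCMetrics, _ExperimentalConfig, tensorboard_trace_handler) are assumed to match the Ascend PyTorch Profiler section of Profiling Instructions for your version.

```python
# Table 2 encoded as data. Keys inside each preset are the parameter names
# used in the table; the scenario keys themselves are hypothetical labels.
PRESETS = {
    "general": {
        "profiler_level": "Level1",
        "aic_metrics": "PipeUtilization",
        "activities": ("CPU", "NPU"),
    },
    "npu_gpu_comparison": {
        "profiler_level": "Level0",
        "activities": ("NPU",),  # or ("CPU", "NPU"), as required
    },
    # Table 2 allows with_stack or with_modules; with_stack is chosen here.
    "code_locating": {"with_stack": True},
    "memory": {"profile_memory": True},
    "cluster_communication": {"profiler_level": "Level1"},
}


def make_profiler(scenario, output_dir="./profiling_result"):
    """Create an Ascend PyTorch Profiler configured for `scenario`.

    Requires torch_npu. It is imported lazily so that PRESETS can still be
    inspected on a machine without an NPU environment.
    """
    import torch_npu  # assumption: torch_npu with the profiler module is installed

    prof = torch_npu.profiler
    preset = PRESETS[scenario]

    config_kwargs = {}
    if "profiler_level" in preset:
        config_kwargs["profiler_level"] = getattr(
            prof.ProfilerLevel, preset["profiler_level"]
        )
    if "aic_metrics" in preset:
        config_kwargs["aic_metrics"] = getattr(prof.AiCMetrics, preset["aic_metrics"])

    activities = [
        getattr(prof.ProfilerActivity, name)
        for name in preset.get("activities", ("CPU", "NPU"))
    ]
    return prof.profile(
        activities=activities,
        with_stack=preset.get("with_stack", False),
        profile_memory=preset.get("profile_memory", False),
        on_trace_ready=prof.tensorboard_trace_handler(output_dir),
        experimental_config=prof._ExperimentalConfig(**config_kwargs),
    )


# Hypothetical usage inside a training script (loader and train_step are
# placeholders for your own code):
#   with make_profiler("general") as prof:
#       for batch in loader:
#           train_step(batch)
#           prof.step()
```

Keeping the presets separate from the profiler construction makes it easy to switch scenarios, for example from general analysis to an NPU/GPU comparison, without touching the training loop.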

Based on functions and features, the Profiler APIs can be classified into three modes: general collection, dynamic collection, and online monitoring, as described in Table 3.

Table 3 Collection types

Collection Mode

Strength

Recommended Application Scenario

How to Use

General collection

Collects data for a configured period or for the entire run, and flushes the detailed performance data to disk.

General performance analysis

Dynamic collection (dynamic_profile)

During model training, you can start collection at any time and change the collection configuration dynamically, without repeatedly modifying the script code.

Scenarios with high startup and shutdown costs (such as ultra-large-scale training)