Model Tuning Performance Collection Tools
MindStudio provides several system-level performance data collection methods. Choose the one that fits your scenario to locate performance bottlenecks accurately and improve training efficiency.
Depending on how collection is enabled, two methods are available: the msprof CLI and the AI framework Profiler APIs (Ascend PyTorch Profiler and MindSpore Profiler).
The msprof CLI is used to collect performance data at the CANN and NPU layers. It serves as the basis for other performance data collection APIs.
The msprof CLI does not collect AI framework layer data.
The AI framework Profiler APIs encapsulate the msprof CLI and additionally collect and parse performance data at the AI framework layer. This is the most commonly used method. Based on their functions and features, the Profiler APIs fall into three modes: general (static) collection, dynamic collection, and online monitoring.
In addition, some training or inference suites, such as MindSpeed-MM and MindFormers, provide additional encapsulation of the Profiler APIs, allowing users to directly invoke performance data collection through the APIs in these suites.
When collecting data with an AI framework Profiler, configure the parameters according to Table 2.
| Scenario | Parameter |
|---|---|
| General performance analysis | |
| NPU/GPU comparison | This configuration is used to compare the end-to-end duration of the NPU and GPU. |
| Code locating | To locate the code of an abnormal operator, you can typically enable the with_stack or with_modules switch. Enable it only when necessary, because it degrades performance. |
| Analyzing the on-chip memory allocation of operators on the NPU | Set profile_memory to True. |
| Analyzing cluster communication | Set profiler_level to Level1. |
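The scenario-to-parameter mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not part of the Profiler API: the helper `build_profiler_settings` and the scenario keys are hypothetical, and only the parameter names (`with_stack`, `profile_memory`, `profiler_level`) come from the table. How the resulting settings are passed to the Ascend PyTorch Profiler or MindSpore Profiler depends on the framework and version, so check the corresponding API reference before use.

```python
# Hypothetical lookup: scenario keys are invented for illustration.
# Only the parameter names and values are taken from the table.
SCENARIO_SETTINGS = {
    # with_modules is the alternative switch named in the table.
    "code_locating": {"with_stack": True},
    "memory_analysis": {"profile_memory": True},
    # "Level1" is a plain string placeholder here; the real Profiler
    # is assumed to expect a framework-specific level value.
    "cluster_communication": {"profiler_level": "Level1"},
}


def build_profiler_settings(scenarios):
    """Merge the settings of all requested scenarios into one dict."""
    settings = {}
    for name in scenarios:
        if name not in SCENARIO_SETTINGS:
            raise KeyError(f"unknown scenario: {name}")
        settings.update(SCENARIO_SETTINGS[name])
    return settings


if __name__ == "__main__":
    # Combine memory analysis with cluster communication analysis.
    print(build_profiler_settings(["memory_analysis", "cluster_communication"]))
```

Keeping the switches in one place makes it easier to enable costly options such as `with_stack` only for the runs that need them.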
Table 3 compares the three Profiler API collection modes: general collection, dynamic collection, and online monitoring.
| Collection Mode | Strength | Recommended Application Scenario | How to Use |
|---|---|---|---|
| General collection | Collects data for a specified period or for the entire run, and flushes detailed performance data to disk. | General performance analysis | |
| dynamic_profile dynamic collection | During model training, you can start collection at any time and dynamically modify the collection configuration without repeatedly editing the script code. | Scenarios with high startup and shutdown costs (such as ultra-large-scale training) |