Performance Tool Overview

This section describes how to use the tuning toolchain efficiently in training and inference tasks, forming a closed loop from performance data collection to problem locating. The training scenario involves model tuning only, whereas the inference scenario involves both model tuning and service tuning. This section covers both tuning dimensions.

Table 1 Introduction to performance tools

Tuning Dimension

Procedure

Tool

Description

Model tuning

Performance data collection

Two collection methods are available, depending on how collection is enabled: the msprof CLI and the AI framework Profiler APIs. For details, see Model Tuning Performance Collection Tools.

  • Collection using AI framework Profiler APIs
    • PyTorch: Ascend PyTorch Profiler
    • MindSpore: MindSpore Profiler
  • Collection using msprof CLI
NOTE:

The msprof CLI does not collect AI framework layer data.

Select a collection tool according to the performance data that must be recorded while the model runs, which can span the AI framework layer and the Ascend software and hardware layers. For details, see Model Tuning Performance Collection Tools.

The msprof CLI collects performance data at the CANN and NPU layers. It serves as the basis for the other performance data collection interfaces.

The Profiler APIs of the AI framework encapsulate the msprof CLI and additionally collect and parse performance data at the AI framework layer. This is the most commonly used method in training and online inference scenarios. By function, the Profiler APIs support three collection modes: general (static) collection, dynamic collection, and online monitoring.

In addition, some training or inference suites, such as MindSpeed-MM and MindFormers, wrap the Profiler APIs further, so users can start performance data collection directly through the APIs of these suites.
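The selection rule described above can be sketched as a small lookup. This is a minimal illustration, not part of any toolchain: the function name choose_collection_tool is hypothetical, and the mapping only restates the tools listed in Table 1.

```python
def choose_collection_tool(framework=None):
    """Return the collection tool that Table 1 lists for a given AI framework.

    Hypothetical helper for illustration only; it is not provided by msprof
    or by any framework Profiler.
    """
    profiler_by_framework = {
        "pytorch": "Ascend PyTorch Profiler",
        "mindspore": "MindSpore Profiler",
    }
    key = (framework or "").strip().lower()
    # Without a supported framework, fall back to the msprof CLI, which
    # collects CANN and NPU layer data but no AI framework layer data.
    return profiler_by_framework.get(key, "msprof CLI")


print(choose_collection_tool("PyTorch"))
print(choose_collection_tool(None))
```

The fallback reflects the note in the Tool column: choosing the msprof CLI means giving up framework layer data, so it is the default only when no framework Profiler applies.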

Performance data analysis

Quick analysis tools for model tuning:
  • Cluster analysis: cluster_analyze
  • Expert advice: Advisor
  • Performance comparison: compare

For details, see Quick Analysis for Model Tuning (msprof-analyze CLI).

msprof-analyze provides the following functions for preliminary analysis:

  • cluster_analyze: Extracts iteration duration and communication data to quickly identify slow cards, nodes, and links in large-scale clusters (such as those with 1,000 or 10,000 cards), where analyzing all data directly is impractical. You are advised to use this tool together with the Summary and Communication tab pages of MindStudio Insight. For details, see Cluster Performance Analysis in MindStudio Insight.
  • Advisor: Identifies common problems, quickly demarcates and locates typical performance issues, and provides tuning suggestions and guidance for further analysis.
  • compare: Compares training duration and memory usage against a baseline to identify degraded operators or APIs, which improves tuning efficiency. It can compare performance between a GPU and an NPU, or between two NPUs. Use this function when baseline data is available, for example when performance degrades after GPU-to-NPU migration or when performance jitter occurs.
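The ideas behind cluster_analyze and compare can be illustrated with a minimal sketch. It does not reproduce the msprof-analyze implementation or its data formats. The function names, thresholds, and sample values below are all assumptions chosen for illustration.

```python
from statistics import mean, median


def find_slow_ranks(step_times_ms, tolerance=1.2):
    """Return ranks whose mean step time exceeds tolerance x the cluster median.

    step_times_ms maps a rank ID to its list of iteration durations (ms).
    The tolerance value is an assumed threshold, not a tool default.
    """
    per_rank_mean = {rank: mean(times) for rank, times in step_times_ms.items()}
    cluster_median = median(per_rank_mean.values())
    return sorted(
        rank for rank, avg in per_rank_mean.items()
        if avg > tolerance * cluster_median
    )


def find_degraded_ops(baseline_us, current_us, min_ratio=1.1):
    """Return (op, baseline, current, ratio) rows for slower ops, worst first.

    Both arguments map an operator name to its duration (us). Operators
    missing from current_us are skipped.
    """
    degraded = []
    for op, base in baseline_us.items():
        cur = current_us.get(op)
        if cur is None or base <= 0:
            continue
        ratio = cur / base
        if ratio >= min_ratio:
            degraded.append((op, base, cur, ratio))
    return sorted(degraded, key=lambda row: row[3], reverse=True)


# Hypothetical sample data: rank 2 is clearly slower than its peers.
step_times = {0: [100.0, 102.0], 1: [101.0, 99.0],
              2: [150.0, 148.0], 3: [100.0, 101.0]}
slow = find_slow_ranks(step_times)

# Hypothetical operator durations: baseline run versus current run.
baseline = {"MatMul": 200.0, "Softmax": 50.0, "LayerNorm": 30.0}
current = {"MatMul": 210.0, "Softmax": 80.0, "LayerNorm": 30.0}
degraded = find_degraded_ops(baseline, current)

print(slow)
print(degraded)
```

Comparing each rank with the cluster median rather than the mean keeps a single very slow card from shifting the reference point, which matters when only a few cards out of thousands are at fault.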

In-depth analysis tool for model tuning: MindStudio Insight. For details, see In-depth Analysis for Model Tuning (MindStudio Insight).

MindStudio Insight visualizes the complete profile data, helping users understand performance behavior in depth and accurately locate root causes. The tool follows a top-down analysis method, moving from macro to micro and from the entire cluster to a single node. For details about usage strategies and operations, see In-depth Analysis for Model Tuning (MindStudio Insight).

Service tuning

NOTE:

Service tuning applies only to the inference scenario. For details about how to use these tools, see Serving Tools.

Environment pre-check

Precheck tool (msprechecker)

Checks whether overall service performance is affected by issues in the system, environment variables, or configuration files.

Quick analysis

  • Expert advice tool for serving tuning (msservice_advisor)
  • Automatic tuning tool (modelevalstate)

  • msservice_advisor is suited to scenarios where serving performance needs to be improved quickly. It does not support fine-grained tuning.
  • modelevalstate tunes serving performance automatically and can reach 95% of the best performance achievable through manual tuning.

In-depth analysis

Serving tuning tool (msServiceProfiler)

This tool is used for in-depth analysis and is suitable for users with extensive experience in serving operations.