Performance Tuning
Prerequisites
Before using the performance tuning tool, read about the restrictions in "Before You Start" in the Profiling Instructions.
Profile Data Collection
The msprof command-line tool collects and parses the runtime profile data of AI jobs, the system data of Ascend AI Processors, and other required data.
- Log in to the environment where the Ascend-CANN-Toolkit is installed and go to the ${install_path}/cann/tools/profiler/bin directory. ${install_path} indicates the installation path of the CANN Toolkit and ops operator package.
- Run the following command to collect profile data. This example collects the profile data of a floating-point model.
msprof --output=${output_dir} bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}
--output indicates the path for storing the collected data. max_output_length indicates the maximum number of output tokens in the dialog test.
- If the command output contains the following information, the collection is complete:
[INFO] Start export data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA.
[INFO] Export all data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA done.
[INFO] Start query data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA.
Job Info  Device ID  Dir Name  Collection Time  Model ID  Iteration Number  Top Time Iteration  Rank ID
NA  host  2024-11-18 06:11:02.985433  N/A  N/A  N/A  1
NA  1  device_1  2024-11-18 06:11:07.222675  N/A  N/A  N/A  1
[INFO] Query all data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA done.
[INFO] Profiling finished.
[INFO] Process profiling data complete. Data is saved in {output_dir}/PROF_000001_20241118061102981_MORBFBJDEPNJEQPA
- After the collection is complete, the PROF_000001_20241118061102981_MORBFBJDEPNJEQPA directory is generated in the directory specified by --output to store the collected profile data. Its mindstudio_profiler_output subdirectory stores the parsed profile data. The file structure is as follows:
├── host                                    # Raw data. You can ignore this directory.
│   └── data
├── device_{id}                             # Raw data. You can ignore this directory.
│   └── data
├── mindstudio_profiler_log                 # Collection logs
│   └── log
└── mindstudio_profiler_output
    ├── msprof_20241118061314.json          # Timeline report
    ├── op_summary_20241118061317.csv       # AI Core and AI CPU operator data
    ├── task_time_20241118061317.csv        # Task scheduling information of Task Scheduler
    ├── op_statistic_20241118061317.csv     # Number of calls and time consumption of the AI Core and AI CPU operators
    ├── api_statistic_20241118061317.csv    # API execution time statistics at the CANN layer
    └── README.txt
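The completion check and the directory layout above can also be handled from a script. The following is a minimal Python sketch, not part of the msprof tooling: it assumes the console output has been captured as text, that the final log line has the "Data is saved in ..." form shown above, and that the output path contains no spaces. The function names find_prof_dir and list_reports are hypothetical.

```python
import re
from pathlib import Path

# Matches the last log line shown above: "... Data is saved in <dir>/PROF_..."
# Assumption: the path contains no whitespace.
SAVED_IN = re.compile(r"Data is saved in (\S+)")


def find_prof_dir(log_text):
    """Return the PROF_* directory named in the captured console output.

    Returns None if the log does not report that profiling finished.
    """
    if "[INFO] Profiling finished." not in log_text:
        return None
    match = SAVED_IN.search(log_text)
    return Path(match.group(1)) if match else None


def list_reports(prof_dir):
    """Map each report kind (file name without its timestamp) to its path."""
    out_dir = Path(prof_dir) / "mindstudio_profiler_output"
    reports = {}
    # Files are visited in sorted order, so a later timestamp replaces an earlier one.
    for path in sorted(out_dir.glob("*_*.*")):
        # For example, op_summary_20241118061317.csv becomes "op_summary".
        kind = path.stem.rsplit("_", 1)[0]
        reports[kind] = path
    return reports
```

For example, list_reports(find_prof_dir(log_text)) would return a dictionary with keys such as "msprof" and "op_summary". README.txt is skipped because its name carries no timestamp suffix.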
Profile Data Analysis
You can use MindStudio Insight to visualize the collected profile data and analyze performance bottlenecks intuitively.
- Open MindStudio Insight.
- Copy the profile data collected in "Profile Data Collection" to your Windows system.
- Click Import Data in the upper left corner of the MindStudio Insight page. In the displayed dialog box, select the profile data file or directory and click Confirm, as shown in Figure 1.
- The profile data is presented in a visual format on MindStudio Insight, as shown in Figure 2.
- Analyze the profile data.

