Performance Tuning

The Profiling tool collects and analyzes key performance metrics of AI jobs at different stages of their execution on Ascend AI Processors. Based on the output profile data, you can efficiently locate software and hardware performance bottlenecks, improving the overall efficiency of AI job performance analysis.

Prerequisite

Before using the performance tuning tool, read the Restrictions section.

Profile Data Collection

The msprof command-line tool collects and parses AI job runtime profile data, system data of Ascend AI Processors, and other required data.

  1. Log in to the environment where the Ascend-CANN-Toolkit is installed and go to the /ascend-toolkit/latest/toolkit/tools/profiler/bin directory under the CANN software installation directory.
  2. Run the following command to collect profile data. This example collects the profile data of a floating-point model.
    msprof --output=${output_dir} bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}

    --output specifies the path for storing the collected profile data. max_output_length specifies the maximum number of output tokens in the dialog test.

  3. If the command output contains the following information, the collection is complete:
    [INFO] Start export data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA.
    [INFO] Export all data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA done.
    [INFO] Start query data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA.
    Job Info	Device ID	Dir Name	Collection Time           	Model ID	Iteration Number	Top Time Iteration	Rank ID	
    
    NA      		        host    	2024-11-18 06:11:02.985433	N/A     	N/A             	N/A               	1      	
    
    NA      	1        	device_1	2024-11-18 06:11:07.222675	N/A     	N/A             	N/A               	1 
    
    [INFO] Query all data in PROF_000001_20241118061102981_MORBFBJDEPNJEQPA done.   
    [INFO] Profiling finished.
    [INFO] Process profiling data complete. Data is saved in {output_dir}/PROF_000001_20241118061102981_MORBFBJDEPNJEQPA
    
  4. After the collection is complete, the PROF_000001_20241118061102981_MORBFBJDEPNJEQPA directory is generated in the directory specified by --output to store the collected profile data.
    The mindstudio_profiler_output directory under the PROF_000001_20241118061102981_MORBFBJDEPNJEQPA directory stores the parsed profile data. The file structure is as follows:
    ├── host   // Raw collected data; you can ignore this directory.
    │   └── data
    ├── device_{id}   // Raw collected data; you can ignore this directory.
    │   └── data
    ├── mindstudio_profiler_log   // Collection logs
    │   └── log
    └── mindstudio_profiler_output
          ├── msprof_20241118061314.json   // Timeline data
          ├── op_summary_20241118061317.csv   // AI Core and AI CPU operator data
          ├── task_time_20241118061317.csv   // Task Scheduler task scheduling information
          ├── op_statistic_20241118061317.csv   // Call counts and durations of the AI Core and AI CPU operators
          └── README.txt
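The collection in step 2 can also be scripted rather than typed interactively. The sketch below assembles the same msprof command line shown above; the helper names (build_msprof_cmd, collect_profile) and the use of Python's subprocess module are illustrative assumptions, not part of the msprof tool itself. It requires the ATB_SPEED_HOME_PATH environment variable, as in the text.

```python
# Sketch of scripting the msprof collection step from Python.
# Assumptions: msprof is on PATH and ATB_SPEED_HOME_PATH is set.
import os
import subprocess


def build_msprof_cmd(output_dir, run_script, model_path, max_output_length):
    """Assemble the msprof command line used in step 2."""
    return ["msprof", f"--output={output_dir}",
            "bash", run_script,
            "--model_path", model_path,
            str(max_output_length)]


def collect_profile(output_dir, model_path, max_output_length):
    """Run the collection and return the msprof exit code."""
    run_script = os.path.join(os.environ["ATB_SPEED_HOME_PATH"],
                              "examples", "models", "llama3", "run_pa.sh")
    cmd = build_msprof_cmd(output_dir, run_script, model_path,
                           max_output_length)
    return subprocess.run(cmd, check=False).returncode
```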
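After step 4, a common next move is to rank operators from op_summary_*.csv by duration to find hotspots. The sketch below is a hypothetical post-processing helper, not part of the toolkit: exact CSV column names vary across CANN versions, so it locates the duration column by substring match rather than hard-coding a header; adjust for your version's headers.

```python
# Hypothetical post-processing of the parsed profile data: list the slowest
# operators in op_summary_*.csv. Column names are assumptions; the duration
# column is detected by the substring "Duration" in its header.
import csv
import glob
import os


def top_ops(rows, n=10):
    """Return the n rows with the largest value in the first column
    whose header contains 'Duration'."""
    if not rows:
        return []
    dur_col = next(c for c in rows[0] if "Duration" in c)
    return sorted(rows, key=lambda r: float(r[dur_col]), reverse=True)[:n]


if __name__ == "__main__":
    # Pick the newest PROF_* directory under the --output path, if present.
    prof_dirs = glob.glob("PROF_*")
    if prof_dirs:
        prof_dir = max(prof_dirs, key=os.path.getmtime)
        summary = glob.glob(os.path.join(prof_dir,
                                         "mindstudio_profiler_output",
                                         "op_summary_*.csv"))[0]
        with open(summary, newline="") as f:
            for row in top_ops(list(csv.DictReader(f))):
                print(row)
```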