Tool Usage

The msProf tool runs in two modes: msprof op and msprof op simulator. msprof op produces the memory heatmap and the Roofline bottleneck analysis chart, displayed in MindStudio Insight; msprof op simulator produces the instruction pipeline chart and the operator code hotspot map, also displayed in MindStudio Insight. Together, these modes help locate exceptions in operator memory, code, and instructions, enabling comprehensive operator tuning.

  • The msProf tool depends on the msopprof executable file in the CANN package; its usage is the same as that of msprof op. The file is shipped with the CANN package and does not need to be installed separately.
  • Only one profiling data collection task can run on a device at a time.
  • Before using msprof op or msprof op simulator, ensure that the app runs properly on its own.

msprof op

  1. Log in to the operating environment and run msprof op with any optional parameters and the program to be tuned (app [arguments]) to perform on-board operator tuning. For details about the optional parameters, see Table 2. An example command is as follows:
    msprof op --output=$home/projects/output $home/projects/MyApp/out/main    // --output is optional. $home/projects/MyApp/out/main is the application in use.
  2. Use the msprof op simulator to enable operator simulation tuning, and use the optional parameters and the program to be tuned (app [arguments]) to perform the tuning.
  3. Perform operator tuning in either of the following ways:
    • Executable file-based method. The following uses add_custom_npu as an example.
      Example 1:
      msprof op ./add_custom_npu
      Example 2:
      msprof op --aic-metrics=<select_metrics> --output=./output_data ./add_custom_npu 
    • Method based on a .json configuration file describing the operator binary file (*.o):
      msprof op --config=./add_test.json --aic-metrics=<select_metrics> --output=./output_data
  4. After the command is executed, a folder named OPPROF_{timestamp}_XXX is generated in the default path or the specified --output directory. When all --aic-metrics are enabled, the structure is as follows:
    • Multi-device and multi-operator collection.
      └── OPPROF_{timestamp}_XXX
          ├── device0                      // ID of the Ascend AI Processor used during running
          └── device1
              ├── OpName0                  // Name of the collected operator
              │   └── 0                    // Sequence in which the operator was scheduled
              │       ├── dump             // Folder for process files; same meaning as in single-operator collection
              │       ├── xxx_yyy.csv      // xxx is the metric type (for example, L2Cache); for the metric types, see Table 1. yyy is the timestamp suffix of the .csv file, for example, L2Cache_20240603022812284.csv
              │       └── visualize_data.bin
              ├── OpName1
              │   └── 0
              │       ├── dump
              │       ├── xxx_yyy.csv
              │       └── visualize_data.bin
              └── OpName2
                  └── 0
                      ├── dump
                      ├── xxx_yyy.csv
                      └── visualize_data.bin
    • Collecting data of multiple operators on a single device
      └── OPPROF_{timestamp}_XXX
          ├── OpName0                      // Name of the collected operator
          │   ├── 0                        // Sequence in which the operator was scheduled
          │   │   ├── dump                 // Folder for process files; same meaning as in single-operator collection
          │   │   ├── xxx_yyy.csv          // xxx is the metric type (for example, L2Cache); for the metric types, see Table 1. yyy is the timestamp suffix of the .csv file, for example, L2Cache_20240603022812284.csv
          │   │   └── visualize_data.bin
          │   └── 1
          │       ├── dump
          │       ├── xxx_yyy.csv
          │       └── visualize_data.bin
          └── OpName1
              └── 0
                  ├── dump
                  ├── xxx_yyy.csv
                  └── visualize_data.bin
    • Collecting data of a single operator on a single device
      OPPROF_{timestamp}_XXX
      ├── dump
      ├── ArithmeticUtilization.csv
      ├── L2Cache.csv
      ├── Memory.csv
      ├── MemoryL0.csv
      ├── MemoryUB.csv
      ├── OpBasicInfo.csv
      ├── PipeUtilization.csv
      ├── ResourceConflictRatio.csv
      └── visualize_data.bin
      
      Table 1 msprof op files

      | File | Description |
      | --- | --- |
      | dump folder | Raw profile data; can be ignored. |
      | ArithmeticUtilization.csv | Time consumption and ratios of Cube and Vector instructions. For details, see ArithmeticUtilization (Time Consumptions and Percentages of Cube and Vector Instructions). |
      | L2Cache.csv | L2 cache hit ratio. For details, see L2Cache (L2 Cache Hit Ratio). |
      | Memory.csv | UB/L1/L2/main memory read/write bandwidth rate. For details, see Memory (Memory Read/Write Bandwidth Rate). |
      | MemoryL0.csv | L0A/L0B/L0C memory read/write bandwidth rate. For details, see MemoryL0 (L0 Read/Write Bandwidth Rate). |
      | MemoryUB.csv | MTE/Vector/Scalar UB read/write bandwidth rate. For details, see MemoryUB (UB Read/Write Bandwidth Rate). |
      | PipeUtilization.csv | Time consumption and ratios of compute units and MTE units. For details, see PipeUtilization (Percentages of Time Taken by Compute Units and MTEs). |
      | ResourceConflictRatio.csv | Percentages of bank group conflicts, bank conflicts, and resource conflicts on the UB across all instructions. For details, see ResourceConflictRatio (Resource Conflict Ratio). |
      | OpBasicInfo.csv | Basic operator information, including operator names, block dim, and time consumption. For details, see OpBasicInfo (Basic Operator Information). |
      | visualize_data.bin | File that displays basic operator information, compute unit load, and Roofline bottleneck analysis. For details, see Computing Memory Heatmap and Roofline Bottleneck Analysis Chart. |

  5. After the visualize_data.bin file is imported into MindStudio Insight, the Computing Memory Heatmap and Roofline Bottleneck Analysis Chart are displayed.
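Before importing results into MindStudio Insight, it can help to sanity-check that the expected metric files landed in the output folder. The sketch below mocks the single-operator layout described above (no msprof invocation is made; the directory name is illustrative) and checks for the key files; in practice, point `out` at your real OPPROF_{timestamp}_XXX folder.

```shell
#!/bin/sh
# Sketch only: mock the single-operator OPPROF layout, then verify the key
# outputs exist. Replace "OPPROF_demo" with the real OPPROF_{timestamp}_XXX
# directory produced by msprof op.
out=OPPROF_demo
mkdir -p "$out/dump"
for f in ArithmeticUtilization L2Cache Memory MemoryL0 MemoryUB \
         OpBasicInfo PipeUtilization ResourceConflictRatio; do
    : > "$out/$f.csv"                 # empty placeholders for the demo
done
: > "$out/visualize_data.bin"

missing=0
for f in L2Cache.csv OpBasicInfo.csv visualize_data.bin; do
    [ -e "$out/$f" ] || { echo "missing: $f"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all expected msprof op outputs present"
```

A missing .csv here usually means the corresponding --aic-metrics option was not enabled for the run.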

msprof op simulator

The operator tuning tool supports profile data collection and automatic parsing in a simulation environment.

  • The simulation function of the msProf tool must run on card 0; if the visible card number is changed, the simulation fails.
  • To profile part of an operator, call TRACE_START and TRACE_STOP in the single core and add -DASCENDC_TRACE_ON to the compilation configuration file (for details, see adding -DASCENDC_TRACE_ON); the system can then generate the pipeline chart. For details about the chart content, see Instruction Pipeline Chart.
  • You need to add -DASCENDC_TRACE_ON to the compilation configuration file. For details, see the following sample project.
    For AddKernelInvocationNeo operator project, add the following code to the ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo/cmake/npu_lib.cmake file:
    ascendc_compile_definitions(
        ...
        -DASCENDC_TRACE_ON
    )
    
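A quick way to confirm the macro made it into the build configuration is to grep the cmake file before rebuilding. The sketch below creates a stand-in npu_lib.cmake (the demo path is illustrative); in practice, run the grep against the real ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo/cmake/npu_lib.cmake file.

```shell
#!/bin/sh
# Sketch: stand-in for the sample project's cmake/npu_lib.cmake (illustrative
# path and contents); run the same grep against the real file.
mkdir -p demo/cmake
cat > demo/cmake/npu_lib.cmake <<'EOF'
ascendc_compile_definitions(
    -DASCENDC_TRACE_ON
)
EOF

if grep -q "ASCENDC_TRACE_ON" demo/cmake/npu_lib.cmake; then
    echo "trace instrumentation enabled"
else
    echo "add -DASCENDC_TRACE_ON and rebuild"
fi
```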
  1. Log in to the operating environment, utilize the msprof op simulator to enable operator simulation-based tuning, and then use the optional simulation parameters and the application to be optimized (app [arguments]) for tuning. For details about the optional simulation parameters, see Table 3. An example command is as follows:
    msprof op simulator --output=$home/projects/output $home/projects/MyApp/out/main //  --output is optional. $home/projects/MyApp/out/main is the application in use.
  2. You can use either of the following methods for operator simulation-based tuning:
    • Executable file-based method. The following uses add_custom_npu as an example.
      msprof op simulator --output=./output_data ./add_custom_npu 
    • Method based on the JSON configuration file of the input operator binary file *.o
      msprof op simulator --config=./add_test.json --output=./output_data
  3. After the command is executed, a folder named OPPROF_{timestamp}_XXX is generated in the specified --output directory. An example of the folder structure is as follows:
    • Collecting data of a single operator
      OPPROF_{timestamp}_XXX
      ├── dump
      └── simulator
          ├── core0.veccore0               // Data files of each core are stored in a core*.veccore* or core*.cubecore* directory.
          │   ├── core0.veccore0_code_exe.csv
          │   ├── core0.veccore0_instr_exe.csv
          │   └── trace.json               // Simulation instruction pipeline chart file of this core.
          ├── core0.veccore1
          │   ├── core0.veccore1_code_exe.csv
          │   ├── core0.veccore1_instr_exe.csv
          │   └── trace.json
          ├── core1.veccore0
          │   ├── core1.veccore0_code_exe.csv
          │   ├── core1.veccore0_instr_exe.csv
          │   └── trace.json
          ├── ...
          ├── visualize_data.bin
          └── trace.json                   // Simulation instruction pipeline chart file for all cores.
      
    • Collecting data of multiple operators
      └── OPPROF_{timestamp}_XXX
          ├── OpName1                      // Name of the collected operator
          │   ├── 0                        // Sequence in which the operator was scheduled
          │   │   ├── dump                 // Folder for process files; same meaning as in single-operator collection.
          │   │   └── simulator            // Same content as the single-operator simulator folder, except that the .csv files carry timestamp suffixes, for example, core*_code_exe_20240429111143146.csv.
          │   ├── 1
          │   │   ├── dump
          │   │   └── simulator
          │   └── dump                     // Folder that stores the process files.
          └── OpName2
              ├── 0
              │   ├── dump
              │   └── simulator
              └── dump
    Table 2 msprof op simulator files

    | File | Description |
    | --- | --- |
    | dump folder | Folder for storing the dump data generated by the original simulation. |
    | simulator folder | Folder for storing the analysis results of the dump data files. |
    | core*_code_exe.csv | Time consumed by each code line, where * indicates cores 0 to n; helps identify the most time-consuming part of the written code. For details, see Code Line Time Consumption Data File. |
    | core*_instr_exe.csv | Detailed information about code instructions, where * indicates cores 0 to n; helps identify the most time-consuming instructions. For details, see Code Instruction Information File. |
    | trace.json | Simulation instruction pipeline chart file, including a subfile for each core and a summary file for all cores. For details, see Instruction Pipeline Chart. |
    | visualize_data.bin | Visualization file of simulation pipelines and simulation hotspot functions. For details, see Instruction Pipeline Chart and Operator Code Hot Spot Map. |

    NOTE:

    The generated visualize_data.bin file, which presents simulation pipeline charts and simulation hotspot functions, can be displayed in MindStudio Insight. For details, see the MindStudio Insight User Guide.

  4. After the trace.json file is imported into the Chrome browser or MindStudio Insight, the Instruction Pipeline Chart is displayed.
  5. Optional: After the visualize_data.bin file is imported into MindStudio Insight, the Instruction Pipeline Chart and Operator Code Hot Spot Map are displayed.
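trace.json opens in the Chrome browser because it follows the Chrome trace event format: a "traceEvents" array whose events carry durations ("dur") in microseconds. The sketch below builds a hand-made mock file (the event names and values are invented, not real msprof output) and sums the durations as a rough total-busy-time figure:

```shell
#!/bin/sh
# Sketch: minimal Chrome-trace-format mock; the pipe names below are
# illustrative only. Point the grep at a real trace.json to get actual totals.
cat > trace_demo.json <<'EOF'
{"traceEvents":[
 {"name":"VEC","ph":"X","ts":0,"dur":120,"pid":0,"tid":0},
 {"name":"MTE2","ph":"X","ts":120,"dur":80,"pid":0,"tid":1},
 {"name":"VEC","ph":"X","ts":200,"dur":60,"pid":0,"tid":0}
]}
EOF
# Sum all event durations (microseconds in the Chrome trace format):
grep -o '"dur":[0-9]*' trace_demo.json | awk -F: '{s += $2} END {print s "us"}'
# prints 260us for this mock
```

For serious analysis, prefer importing the file into MindStudio Insight as described in step 4; this kind of one-liner is only a quick command-line spot check.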