Tool Usage
The msProf tool can be used in two modes: msprof op and msprof op simulator. With msprof op, MindStudio Insight displays the memory heatmap and the Roofline bottleneck analysis chart; with msprof op simulator, it displays the instruction pipeline chart and the operator code hotspot map. These views help identify exceptions in operator memory, code, and instructions, enabling comprehensive operator tuning.
- The msProf tool depends on the msopprof executable file, which is delivered with the CANN package and does not need to be installed separately. Its usage is the same as that of msprof op.
- Only one profile data collection task can run on a device at a time.
- Before using msprof op or msprof op simulator, ensure that the application runs correctly on its own.
msprof op
- Log in to the operating environment and run msprof op with the optional parameters and the program to be tuned (app [arguments]) to perform on-board operator tuning. For details about the optional parameters, see Table 2. An example command is as follows:
msprof op --output=$home/projects/output $home/projects/MyApp/out/main // --output is optional. $home/projects/MyApp/out/main is the application in use.
- Use the msprof op simulator to enable operator simulation tuning, and use the optional parameters and the program to be tuned (app [arguments]) to perform the tuning.
- Perform operator tuning in either of the following ways:
- Executable file-based method. The following uses add_custom_npu as an example.
Example 1:
msprof op ./add_custom_npu
Example 2:
msprof op --aic-metrics=<select_metrics> --output=./output_data ./add_custom_npu
- Method based on a .json configuration file that describes the binary file (*.o) of the input operator.
msprof op --config=./add_test.json --aic-metrics=<select_metrics> --output=./output_data
- After the command is executed, a folder named OPPROF_{timestamp}_XXX is generated in the default path or in the directory specified by --output. When all --aic-metrics options are enabled, the structure is as follows:
- Multi-device and multi-operator collection.
└── OPPROF_{timestamp}_XXX
    ├── device0                    // ID of the Ascend AI Processor used during running
    └── device1
        ├── OpName0                // OpName0 is the name of the collected operator.
        │   └── 0                  // Sequence in which the operator is scheduled.
        │       ├── dump           // Folder for storing the process files; same meaning as in single-operator collection.
        │       ├── xxx_yyy.csv    // xxx is the metric type generated by the operator, for example, L2Cache; yyy is the timestamp suffix, for example, L2Cache_20240603022812284.csv. For details about the metric types, see Table 1.
        │       └── visualize_data.bin
        ├── OpName1
        │   └── 0
        │       ├── dump
        │       ├── xxx_yyy.csv
        │       └── visualize_data.bin
        └── OpName2
            └── 0
                ├── dump
                ├── xxx_yyy.csv
                └── visualize_data.bin
- Collecting data of multiple operators on a single device
└── OPPROF_{timestamp}_XXX
    ├── OpName0                    // OpName0 is the name of the collected operator.
    │   ├── 0                      // Sequence in which the operator is scheduled.
    │   │   ├── dump               // Folder for storing the process files; same meaning as in single-operator collection.
    │   │   ├── xxx_yyy.csv        // xxx is the metric type generated by the operator, for example, L2Cache; yyy is the timestamp suffix, for example, L2Cache_20240603022812284.csv. For details about the metric types, see Table 1.
    │   │   └── visualize_data.bin
    │   └── 1
    │       ├── dump
    │       ├── xxx_yyy.csv
    │       └── visualize_data.bin
    └── OpName1
        └── 0
            ├── dump
            ├── xxx_yyy.csv
            └── visualize_data.bin
- Collecting data of a single operator on a single device
OPPROF_{timestamp}_XXX
├── dump
├── ArithmeticUtilization.csv
├── L2Cache.csv
├── Memory.csv
├── MemoryL0.csv
├── MemoryUB.csv
├── OpBasicInfo.csv
├── PipeUtilization.csv
├── ResourceConflictRatio.csv
└── visualize_data.bin
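The layouts above share a common shape, so they can be inspected programmatically. The following sketch uses only the Python standard library (no msProf dependency; the directory and file names are taken from the listings above) to walk an OPPROF output folder and list the metric .csv files, skipping the raw dump folders:

```python
import os

def list_metric_csvs(opprof_root):
    """Walk an OPPROF_{timestamp}_XXX folder and collect metric .csv files.

    Returns a sorted list of (relative_dir, filename) tuples. The 'dump'
    folders hold raw profile data and can be ignored, so they are skipped.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(opprof_root):
        # Prune raw profile data folders so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d != "dump"]
        for name in filenames:
            if name.endswith(".csv"):
                rel = os.path.relpath(dirpath, opprof_root)
                results.append((rel, name))
    return sorted(results)
```

For multi-operator output, the relative directory in each tuple identifies the operator name and scheduling sequence (for example, OpName0/0).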
Table 1 msprof op files

File
Description
dump folder
Raw profile data, which can be ignored.
ArithmeticUtilization.csv
Time consumptions and ratios of Cube and Vector instructions. For details, see ArithmeticUtilization (Time Consumptions and Percentages of Cube and Vector Instructions).
L2Cache.csv
L2 cache hit ratio. For details, see L2Cache (L2 Cache Hit Ratio).
Memory.csv
UB/L1/L2/main memory read/write bandwidth rate. For details, see Memory (Memory Read/Write Bandwidth Rate).
MemoryL0.csv
L0A/L0B/L0C memory read/write bandwidth rate. For details, see MemoryL0 (L0 Read/Write Bandwidth Rate).
MemoryUB.csv
MTE/Vector/Scalar UB read/write bandwidth rate. For details, see MemoryUB (UB Read/Write Bandwidth Rate).
PipeUtilization.csv
Time consumptions and ratios of compute units and MTE units. For details, see PipeUtilization (Percentages of Time Taken by Compute Units and MTEs).
ResourceConflictRatio.csv
Percentages of bank group conflicts, bank conflicts, and resource conflicts on the UB across all instructions. For details, see ResourceConflictRatio (Resource Conflict Ratio).
OpBasicInfo.csv
Basic operator information, including the operator names, block dim, and time consumptions. For details, see OpBasicInfo (Basic Operator Information).
visualize_data.bin
File that displays basic operator information, compute unit load, and Roofline bottleneck analysis. For details, see Computing Memory Heatmap and Roofline Bottleneck Analysis Chart.
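The .csv files in the table can be post-processed with the standard csv module. The sketch below ranks operators by time consumption from OpBasicInfo.csv; note that the column names 'Op Name' and 'Task Duration(us)' are illustrative assumptions, not confirmed msProf headers, so check the header row of your own file and adjust:

```python
import csv

def top_ops_by_time(opbasicinfo_path, n=5):
    """Return the n most time-consuming operators from OpBasicInfo.csv.

    ASSUMPTION: the column names 'Op Name' and 'Task Duration(us)' are
    placeholders for illustration; verify them against your file's header.
    """
    with open(opbasicinfo_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: float(r["Task Duration(us)"]), reverse=True)
    return [(r["Op Name"], float(r["Task Duration(us)"])) for r in rows[:n]]
```

This kind of ranking is a quick first pass before opening visualize_data.bin in MindStudio Insight.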
- After the visualize_data.bin file is imported into MindStudio Insight, the Computing Memory Heatmap and Roofline Bottleneck Analysis Chart are displayed.
msprof op simulator
The operator tuning tool supports profile data collection and automatic parsing in a simulation environment.
- The simulation function of the msProf tool must run on card 0. If the visible card number is changed, the simulation fails.
- To profile the performance of some operators, call TRACE_START and TRACE_STOP in the single-core operator implementation and add -DASCENDC_TRACE_ON to the compilation configuration file. The system can then generate the pipeline chart. For details about the chart content, see Instruction Pipeline Chart.
- You need to add -DASCENDC_TRACE_ON to the compilation configuration file. For details, see the following sample project.
- Log in to the operating environment and run msprof op simulator with the optional simulation parameters and the application to be tuned (app [arguments]) to perform simulation-based operator tuning. For details about the optional simulation parameters, see Table 3. An example command is as follows:
msprof op simulator --output=$home/projects/output $home/projects/MyApp/out/main // --output is optional. $home/projects/MyApp/out/main is the application in use.
- You can use either of the following methods for operator simulation-based tuning:
- Executable file-based method. The following uses add_custom_npu as an example.
msprof op simulator --output=./output_data ./add_custom_npu
- Method based on a .json configuration file that describes the binary file (*.o) of the input operator.
msprof op simulator --config=./add_test.json --output=./output_data
- After the command is executed, a folder named OPPROF_{timestamp}_XXX is generated in the specified --output directory. An example of the folder structure is as follows:
- Collecting data of a single operator
OPPROF_{timestamp}_XXX
├── dump
└── simulator
    ├── core0.veccore0             // Data files of each core are stored in a core*.veccore* or core*.cubecore* directory.
    │   ├── core0.veccore0_code_exe.csv
    │   ├── core0.veccore0_instr_exe.csv
    │   └── trace.json             // Simulation instruction pipeline chart file of this core.
    ├── core0.veccore1
    │   ├── core0.veccore1_code_exe.csv
    │   ├── core0.veccore1_instr_exe.csv
    │   └── trace.json
    ├── core1.veccore0
    │   ├── core1.veccore0_code_exe.csv
    │   ├── core1.veccore0_instr_exe.csv
    │   └── trace.json
    ├── ...
    ├── visualize_data.bin
    └── trace.json                 // Simulation instruction pipeline chart file covering all cores.
- Collecting data of multiple operators
└── OPPROF_{timestamp}_XXX
    ├── OpName1                    // OpName1 is the name of the operator to be collected.
    │   ├── 0                      // Sequence in which the operator is scheduled.
    │   │   ├── dump               // Folder for storing the process files; same meaning as in single-operator collection.
    │   │   └── simulator          // Same content as the single-operator simulator folder, except that the .csv files carry timestamp suffixes, for example, core*_code_exe_20240429111143146.csv.
    │   ├── 1
    │   │   ├── dump
    │   │   └── simulator
    │   └── dump                   // Folder that stores the process files.
    └── OpName2
        ├── 0
        │   ├── dump
        │   └── simulator
        └── dump
Table 2 msprof op simulator files

File
Description
dump folder
Folder for storing the dump data generated by the original simulation.
simulator folder
Folder for storing the analysis results of the dump data files.
core*_code_exe.csv
Time consumed by code lines. The asterisk (*) indicates cores 0 to n, which helps users promptly identify the most time-consuming part of the written code. For details, see Code Line Time Consumption Data File.
core*_instr_exe.csv
Detailed information about code instructions. The asterisk (*) indicates cores 0 to n, which helps users promptly identify the most time-consuming instructions. For details, see Code Instruction Information File.
trace.json
Simulation instruction pipeline chart file, including the subfile of each core and the summary file of all cores. For details, see Instruction Pipeline Chart.
visualize_data.bin
Visualization file of simulation pipelines and simulation hotspot functions. For details, see Instruction Pipeline Chart and Operator Code Hot Spot Map.
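Because the per-core core*_code_exe.csv files all describe code-line time consumption, they can be aggregated to find the hottest code line across cores. In the sketch below, the column names 'code_line' and 'cycles' are illustrative assumptions rather than confirmed msProf headers; check your own files and adjust:

```python
import csv
import glob
import os
from collections import defaultdict

def hottest_code_lines(simulator_dir, n=3):
    """Sum per-code-line time across all core*_code_exe.csv files.

    ASSUMPTION: 'code_line' and 'cycles' are placeholder column names for
    illustration; verify them against the header of your own files.
    """
    totals = defaultdict(float)
    # Core directories are named core*.veccore* or core*.cubecore*.
    pattern = os.path.join(simulator_dir, "core*", "core*_code_exe.csv")
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row["code_line"]] += float(row["cycles"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A code line that dominates the total across every core is a good starting point for the Operator Code Hot Spot Map view.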
- After the trace.json file is imported into the Chrome browser or MindStudio Insight, the Instruction Pipeline Chart is displayed.
- Optional: After the visualize_data.bin file is imported into MindStudio Insight, the Instruction Pipeline Chart and Operator Code Hot Spot Map are displayed.
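Since trace.json can be opened directly in the Chrome browser, it can reasonably be assumed to follow the Chrome trace event format (a 'traceEvents' list, or a bare list of events, where complete events carry 'name' and a 'dur' field in microseconds). Under that assumption, a short script can summarize how long each pipeline event is busy without opening a viewer:

```python
import json
from collections import defaultdict

def pipeline_time_by_name(trace_path):
    """Total duration (us) per event name in a trace.json file.

    ASSUMPTION: the file uses the Chrome trace event format; events
    without a 'dur' field (metadata, instant events) are skipped.
    """
    with open(trace_path) as f:
        data = json.load(f)
    events = data.get("traceEvents", data) if isinstance(data, dict) else data
    totals = defaultdict(float)
    for ev in events:
        if "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"]
    return dict(totals)
```

This gives a rough per-pipeline busy-time breakdown; for the full visual chart, import the file into Chrome or MindStudio Insight as described above.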