Overview
The msProf performance analysis tool is used to collect and analyze key performance metrics of operators running on Ascend AI Processors. You can efficiently locate software and hardware performance bottlenecks of operators based on the output profile data, thereby enhancing the overall efficiency of operator performance analysis.
Profile data can currently be collected and automatically parsed in multiple running modes (on-board or simulation) and from multiple input formats (executable files or operator binary .o files).
- The msProf tool depends on the msopprof executable file in the CANN package, whose command-line usage is the same as that of msprof op. This file is provided by the CANN package and does not need to be installed separately.
- The msProf tool does not support the detection of multi-threaded operators.
- The simulation function of the msProf tool must run on card 0. If the visible card number is changed, the simulation fails.
Features
Tool Usage describes how to use the operator tuning tool. MindStudio Insight displays the single-operator tuning capabilities such as the computing memory hot spot map, instruction pipeline chart, and operator instruction hot spot map. For details, see Table 1.
- After you press Ctrl+C, operator execution stops, and the tool generates a profile data file from the information collected so far. If you do not need to generate the file, press Ctrl+C again.
- If the --output parameter is not specified, ensure that users in the group and other groups do not have the write permission on the parent directory of the current path.
Commands
- You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
- You are not advised to perform high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to avoid security risks.
- msprof op mode
Log in to the operating environment and run msprof op [options] app [arguments]. For details about the options, see Table 2. An example command is as follows:
msprof op --output=$home/projects/output $home/projects/MyApp/out/main blockdim 1 // --output is optional. $home/projects/MyApp/out/main is the used app. blockdim 1 is the optional parameter of the user app.
Table 2 Options of msprof op
Option
Description
Mandatory (Yes/No)
--application
NOTE: This option is retained for compatibility and will be replaced by the ./app [arguments] format later.
You are advised to run msprof op [options] ./app to collect data. app is the specified executable file. If no path is specified for app, the current path is used by default.
Yes. Select either the specified executable file or --config.
--config
Set this parameter to the *.o binary file generated after operator building. The path can be either absolute or relative. For details, see msProf JSON Configuration File Description.
NOTE: Before operator on-board or simulation tuning, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the NPU executable file ascendc_kernels_bbit and extract the *.o file from it.
- Refer to Compiling and Deploying Operators. The *.o file is automatically generated during operator building.
- Ensure that users in the group and other groups do not have the write permission on the JSON file specified by --config and the parent directory. In addition, ensure that the owner of the parent directory of the JSON file is the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Fuzzy match using the operator name prefix is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE:- This option must be used together with --application. The value contains a maximum of 1024 characters. Only characters in A-Za-z0-9_ are supported.
- If multiple operators need to be collected, use vertical bars (|) to combine them. For example, --kernel-name="add|abs" indicates that operators whose prefixes are add and abs are collected.
- The number of operators to be collected is determined by the value of --launch-count.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 100. The default value is 1.
No
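The prefix matching described for --kernel-name can be modeled with grep. This is only a sketch: the kernel names below are hypothetical, and the pattern mirrors --kernel-name="add|abs".

```shell
# Hypothetical kernel names in launch order; --kernel-name="add|abs"
# selects every name whose prefix is add or abs.
printf '%s\n' add_custom abs_val mul_custom add2 | grep -E '^(add|abs)'
```

Here add_custom, abs_val, and add2 match; how many of them are actually collected is then capped by --launch-count.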
--launch-skip-before-match
This option specifies the number of operators to skip, counted from the first operator scheduled. Data collection begins only after the specified number of operators has been skipped.
NOTE:- The skip count increases regardless of whether a skipped operator matches the operators specified by --kernel-name, and skipped operators are not collected.
- The value of this option is an integer ranging from 0 to 1000.
No
--aic-metrics
This option enables the collection of operator metrics.
- Enables the collection of operator metrics (ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, ResourceConflictRatio, and Default). You can select one or more metrics and separate them with commas (,), for example, --aic-metrics=Memory,MemoryL0.
By default, the following metrics (ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, and ResourceConflictRatio) are collected.
- Roofline: enables the generation of Roofline bottleneck analysis charts and displays them in a visual format on MindStudio Insight. Example: --aic-metrics=Roofline. For details, see Roofline Bottleneck Analysis Chart.
- Occupancy: enables the generation of the inter-core load analysis chart and displays it in a visual format on MindStudio Insight. Example: --aic-metrics=Occupancy. For details, see the inter-core load analysis chart. The time consumption, data throughput, and cache hit ratio of each physical core are compared; if the difference between the maximum and minimum values is greater than 10%, the load is considered unbalanced, and the CLI provides tuning suggestions.
NOTE:
This function is supported only by .
No
--kill
The value can be on or off. The default value is off, indicating that this function is disabled.
If you set --kill to on to enable this function, the application automatically stops after collecting the number of operators specified by --launch-count.
NOTE:- After --kill is set to on, error logs may be generated because the application exits early. Decide whether to use this function accordingly.
- For a multi-threaded process, the configuration of the --kill option takes effect only for subprocesses.
No
--mstx
This option determines whether the operator tuning tool enables the mstx API used in the user code program.
The default value is off, indicating that the mstx API is disabled.
If --mstx is set to on, the operator tuning tool enables the mstx API used in the user code program.
Example:
msprof op --mstx=on ./add_custom
NOTE:Currently, the mstxRangeStartA and mstxRangeEnd APIs are supported to enable the specified range for operator tuning. For details about the parameters, see the mstxRangeStartA and mstxRangeEnd APIs.
No
--mstx-include
This option can be used to enable only the specified mstx APIs when the mstx APIs are enabled in the operator tuning tool.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, --mstx-include enables only the specified mstx APIs. The input of --mstx-include is the message character string transferred when the user calls the mstx function. Multiple character strings are combined using vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" //Enable only the mstx APIs whose message parameters are hello and hi in the mstx function passed by the user.
NOTE:- This option cannot be configured independently and must be used together with --mstx.
- A message can contain only A-Z a-z 0-9_ characters. Use vertical bars (|) to combine the messages.
No
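The message filtering of --mstx-include can likewise be modeled with grep. The messages below are hypothetical, and treating the filter as an exact match on the message string is an assumption:

```shell
# Hypothetical mstx message strings passed in user code.
# --mstx-include="hello|hi" keeps only the messages hello and hi
# (assumption: exact match, so "highway" is not selected).
printf '%s\n' hello world hi highway | grep -Ex 'hello|hi'
```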
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE:Ensure that users in the group and other groups do not have the write permission on the parent directory of the path specified by --output. In addition, ensure that the owner of the parent directory of the directory specified by --output is the current user.
No
--help
This option outputs the help information.
No
- msprof op simulator mode
Log in to the operating environment and run msprof op simulator [options] app [arguments] to perform simulation-based operator tuning. For details about the options, see Table 3. An example command is as follows:
msprof op simulator --output=/home/projects/output /home/projects/MyApp/out/main blockdim 1 // --output is optional. /home/projects/MyApp/out/main is the application in use. blockdim 1 is an optional parameter of the application.
Table 3 Options of msprof op simulator
Option
Description
Mandatory (Yes/No)
--application
NOTE: This option is retained for compatibility and will be replaced by the ./app [arguments] format later.
You are advised to run msprof op simulator [options] ./app to collect data. app is the executable file specified by the user. If no path is specified for app, the current path is used by default.
Yes. Select one of the specified executable file, --config, and --export.
--config
Set this parameter to the *.o binary file generated after operator building. It can be set to an absolute path or a relative path. For details, see msProf JSON Configuration File Description.
NOTE: Before operator on-board or simulation tuning, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the NPU executable file ascendc_kernels_bbit and extract the *.o file from it.
- Refer to Compiling and Deploying Operators. The *.o file is automatically generated during operator building.
- Ensure that users in the group and other groups do not have the write permission on the JSON file specified by --config and the parent directory. In addition, ensure that the owner of the parent directory of the JSON file is the current user.
--export
This option specifies the folder that contains the single-operator simulation result. The simulation result is directly parsed, and the single-core or multi-core instruction pipeline chart of the single-operator is displayed on MindStudio Insight.
NOTE:- The specified folder can store only multi-core data and the operator kernel function file aicore_binary.o. Therefore, you need to manually change the binary file name (*.o) configured in --config to aicore_binary.o.
- If you provide only the dump file, the code line mapping cannot be generated in the instruction pipeline chart. To view the code line, you need to store the operator kernel function file named aicore_binary.o in the dump file.
- Ensure that users in the group and other groups do not have the write permission on the directory specified by --export and all files in the directory specified by --export. In addition, ensure that the owner of the specified directory is the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Fuzzy match using the operator name prefix is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE:- This option must be used together with --application. The value contains a maximum of 1024 characters. Only characters in A-Za-z0-9_ are supported.
- If multiple operators need to be collected, use vertical bars (|) to combine them. For example, --kernel-name="add|abs" indicates that operators whose prefixes are add and abs are collected.
- The number of operators to be collected is determined by the value of --launch-count.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 100. The default value is 1.
No
--aic-metrics
This option enables the collection of operator performance metrics. The following performance metrics are supported. By default, all of them are collected.
- PipeUtilization
- ResourceConflictRatio
NOTE:- PipeUtilization (mandatory): time consumption ratios of compute units and MTE units. Example: --aic-metrics=PipeUtilization.
- If --aic-metrics=PipeUtilization is specified, ResourceConflictRatio is disabled.
No
--mstx
This option determines whether the operator tuning tool enables the mstx APIs used in the user code program.
The default value is off, indicating that the mstx APIs are disabled.
If --mstx is set to on, the operator tuning tool enables the mstx API used in the user code program.
Example:
msprof op simulator --mstx=on ./add_custom
NOTE:Currently, the mstxRangeStartA and mstxRangeEnd APIs are supported to enable the specified range for operator tuning. For details about the parameters, see the mstxRangeStartA and mstxRangeEnd APIs.
No
--mstx-include
This option can be used to enable the specified mstx APIs in the msProf tool.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, --mstx-include enables only the specified mstx APIs. The input of --mstx-include is the message character string transferred when the user calls the mstx function. Multiple character strings must be separated by vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" //Enable only the mstx APIs whose message parameters are hello and hi in the mstx function passed by the user.
NOTE:- This option cannot be configured independently and must be used together with --mstx.
- A message can contain only A-Z a-z 0-9_ characters. Use vertical bars (|) to combine the messages.
No
--soc-version
This option specifies the simulator type in --application and --export modes. For details about the value range, see the simulator types in ${INSTALL_DIR}/tools/simulator.
If this option is not configured, you need to use the LD_LIBRARY_PATH environment variable to set the simulator type:
export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
NOTE: Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
No
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE:Ensure that users in the group and other groups do not have the write permission on the parent directory of the path specified by --output. In addition, ensure that the owner of the parent directory of the directory specified by --output is the current user.
No
--help
This option outputs the help information.
No
Segment-based Tuning Principles of msprof op
- Use the --launch-skip-before-match option to filter the operator tuning range. The filtering principles are as follows:
- If --launch-skip-before-match is configured, the specified number of operators, counted from the first one, are skipped, and only subsequent operators are collected.
- If no range is configured, no filtering is performed.
- On the basis of Step 1, use the --mstx option to filter the operator tuning range. The filtering principles are as follows:
- If --mstx is configured, only the operators within the scope of mstxRangeStartA and mstxRangeEnd APIs are collected.
- If no range is configured, no filtering is performed.
- On the basis of Step 2, use the --kernel-name option to filter the operator tuning range. The filtering principles are as follows:
- If --kernel-name has been configured, only operators within the range specified by --kernel-name are collected.
- If --kernel-name is not configured, only the first operator scheduled during program running is collected.
- On the basis of Step 3, use the --aic-metrics option to select the operator metrics for tuning. The filtering principles are as follows:
- If --aic-metrics has been configured, the specified operator performance metrics are collected.
- If --aic-metrics is not configured, the default operator metrics are collected; the Roofline and Occupancy metrics are not collected.
- Perform Step 1 to Step 4 to obtain the actual number of tuned operators and the collection range of metrics.
- With --kill=on, compare the actual number of tuned operators with the value of --launch-count to determine whether to stop the program automatically.
If the number of tuned operators is less than the value of --launch-count, the program runs to completion. Otherwise, the program stops automatically when the number of tuned operators reaches the value specified by --launch-count.
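The filtering steps above can be sketched as a shell pipeline over a hypothetical launch sequence; the operator names and option values are illustrative only:

```shell
# Hypothetical operators in launch order.
ops='mul_1 add_1 abs_1 add_2 sub_1'
# Step 1: --launch-skip-before-match=1 skips the first launch.
# Step 3: --kernel-name="add|abs" keeps names with prefix add or abs.
# --launch-count=2 caps the number of collected operators.
printf '%s\n' $ops | tail -n +2 | grep -E '^(add|abs)' | head -n 2
```

In this model, add_1 and abs_1 are collected. With --kill=on, the run would stop once these two operators are collected; --mstx (Step 2) would further restrict collection to the range between mstxRangeStartA and mstxRangeEnd.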
Call Scenarios
- Kernel launch.
- For details about kernel launch, see Operator Development Based on Kernel Launch.
- In the kernel launch scenario, configure the prerequisites and then run the following command:
msprof op simulator ./main // main indicates the name of the user operator program, including the program name of the operator to be tuned.
- Optional: If you need to perform simulation-based tuning on an operator that runs on the board without recompilation, perform the following steps:
- Create a soft link named libruntime.so that points to libruntime_camodel.so in any directory.
ln -s /{simulator_path}/lib/libruntime_camodel.so /{so_path}/libruntime.so // For example, if the CANN package is installed in the default path of the root user, simulator_path is /usr/local/Ascend/ascend-toolkit/latest/tools/simulator/ascendxxxyy.
- Add the parent directory of the created soft link to the environment variable LD_LIBRARY_PATH.
export LD_LIBRARY_PATH={so_path}:$LD_LIBRARY_PATH
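The two steps above can be sketched end to end; the scratch directories here are stand-ins for {simulator_path}/lib and {so_path}:

```shell
# Create stand-in directories for {simulator_path}/lib and {so_path}.
sim_lib=$(mktemp -d)
so_path=$(mktemp -d)
touch "$sim_lib/libruntime_camodel.so"   # placeholder for the real library

# Step 1: soft link libruntime.so -> libruntime_camodel.so
ln -s "$sim_lib/libruntime_camodel.so" "$so_path/libruntime.so"

# Step 2: put the link's parent directory on LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$so_path:$LD_LIBRARY_PATH"

readlink "$so_path/libruntime.so"        # prints the link target
```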
- AscendCL single-operator calling: single-operator API execution scenario
- For details about single-operator API execution, see Single-Operator API Calling.
- In the single-operator API execution scenario, configure the prerequisites and then run the following command:
msprof op simulator ./main // main indicates the name of the user operator program, including the program name of the operator to be tuned.
- Third-party framework operator calling: PyTorch framework scenario
- For details about single-operator calling through the PyTorch framework, see "Huawei-Developed Ascend Plugin > OpPlugin Development for Single-Operator Adaptation" in Ascend Extension for PyTorch Suites and Third-Party Libraries.
- When the PyTorch framework is used to call a single-operator, configure the prerequisites and then run the following command:
msprof op simulator python a.py // a.py indicates the name of the user operator program, including the program name of the operator to be tuned.