Overview
The msProf performance analysis tool collects and analyzes key performance metrics of operators running on Ascend AI Processors. Based on the output profile data, you can quickly locate software and hardware performance bottlenecks in operators, which makes operator performance analysis more efficient.
Profile data can currently be collected and parsed automatically in either running mode (onboard or simulation) and from either input type (an executable file or an operator binary .o file).
- The msProf tool depends on the msopprof executable file in the CANN package, which is used in the same way as msprof op. This file is provided by the CANN package and does not need to be installed separately.
- The simulation function of the msProf tool must run on card 0. If the visible card number is changed, the simulation fails.
Features
For details about how to use the operator tuning tool, see Tool Usage.
- After you press Ctrl+C, the operator execution stops, and the tool generates a profile data file from the information collected so far. If you do not need the file, press Ctrl+C again.
- If --output is not specified, ensure that group users and other users do not have write permission on the parent directory of the current path.
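The permission rule above, and the similar rules for the --output, --config, and --export paths described later, can be checked before a run. The following POSIX shell function is a minimal sketch: check_secure_dir is a hypothetical helper, not part of msProf. It also checks directory ownership, which the --output and --config notes require, and it assumes a find implementation that supports -prune, -perm, and a numeric -user value (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of msProf): verify that a directory
# is not writable by group or other users and is owned by the current user.
check_secure_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    # -prune keeps find from descending, so only "$dir" itself is tested.
    if [ -n "$(find "$dir" -prune -perm -020)" ]; then
        echo "group-writable: $dir"
        return 1
    fi
    if [ -n "$(find "$dir" -prune -perm -002)" ]; then
        echo "world-writable: $dir"
        return 1
    fi
    # A numeric -user argument is interpreted as a UID.
    if [ -z "$(find "$dir" -prune -user "$(id -u)")" ]; then
        echo "not owned by current user: $dir"
        return 1
    fi
    echo "ok"
}
```

For example, check_secure_dir "$(dirname "$PWD")" checks the parent of the current path before msprof op is run without --output.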
Commands
- Restrict the permissions on executable files and applications to avoid privilege escalation risks.
- Do not perform high-risk operations (such as deleting files or directories, changing passwords, or running privilege escalation commands), to avoid security risks.
- msprof op mode
Log in to the operating environment and run msprof op [options] app [arguments]. For details about the options, see Table 1. An example command is as follows:
msprof op --output=$HOME/projects/output $HOME/projects/MyApp/out/main blockdim 1 // --output is optional. $HOME/projects/MyApp/out/main is the application to be tuned, and blockdim 1 is an optional argument of that application.
Table 1 Options of msprof op
Option
Description
Mandatory (Yes/No)
--application
NOTE: --application is currently kept for compatibility with the ./app [arguments] form, which will replace it later.
You are advised to use msprof op [msprof op options] ./app to collect data. app is the specified executable file. If no path is specified for app, the current path is used by default.
Yes. Specify either the executable file or --config.
--config
Set this parameter to the *.o binary file generated after operator building. The path can be either absolute or relative. For details, see msProf JSON Configuration File Description.
NOTE: Before onboard or simulation tuning of an operator, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the NPU executable file ascendc_kernels_bbit and extract the *.o file from it.
- Build the operator as described in Compiling and Deploying Operators. The *.o file is generated automatically during the build.
- Ensure that group users and other users do not have write permission on the JSON file specified by --config or on its parent directory, and that the parent directory of the JSON file is owned by the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Prefix-based fuzzy matching on the operator name is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE: - This option must be used together with --application. The value can contain at most 1024 characters, and operator names can contain only the characters A-Z, a-z, 0-9, and _.
- To collect multiple operators, combine their names with vertical bars (|). For example, --kernel-name="add|abs" collects operators whose names start with add or abs.
- The number of operators to be collected is determined by the value of --launch-count.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 100. The default value is 1.
No
--launch-skip-before-match
This option specifies how many operators, counted from the first operator launched, are skipped without data collection. Data collection starts from the operator after the last skipped one.
NOTE: - Every skipped operator counts toward this value, whether or not it matches --kernel-name, and a skipped operator is not collected.
- The value of this option is an integer ranging from 0 to 1000.
No
--aic-metrics
This option specifies the operator performance metrics to collect.
- Supported values: ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, ResourceConflictRatio, and Default. You can select one or more metrics and separate them with commas (,), for example, --aic-metrics=Memory,MemoryL0.
By default, ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, and ResourceConflictRatio are collected.
No
--kill
The value can be on or off. The default value is off, indicating that this function is disabled.
If you set --kill to on to enable this function, the application automatically stops after collecting the number of operators specified by --launch-count.
NOTE: - When --kill is set to on, error logs may be generated because the application exits early. Decide whether to use this function based on your needs.
- For a multi-threaded process, the configuration of the --kill option takes effect only for subprocesses.
No
--mstx
This option specifies whether the operator tuning tool enables the mstx APIs called in the user program.
The default value is off, indicating that the mstx APIs are disabled.
If --mstx is set to on, the operator tuning tool enables the mstx APIs called in the user program.
Example:
msprof op --mstx=on ./add_custom
NOTE: Currently, the mstxRangeStartA and mstxRangeEnd APIs are supported; they mark the range of code within which operators are tuned. For details about their parameters, see the mstxRangeStartA and mstxRangeEnd API descriptions.
No
--mstx-include
When mstx is enabled in the operator tuning tool, this option enables only the specified mstx APIs.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, only the specified mstx APIs are enabled. The value of --mstx-include is the message string passed when the user program calls an mstx function. Combine multiple strings with vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" // Enable only the mstx calls whose message argument is hello or hi.
NOTE: - This option cannot be configured independently and must be used together with --mstx.
- A message can contain only the characters A-Z, a-z, 0-9, and _. Combine multiple messages with vertical bars (|).
No
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE: Ensure that group users and other users do not have write permission on the parent directory of the path specified by --output, and that this parent directory is owned by the current user.
No
--help
This option outputs the help information.
No
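Several options in Table 1 have documented value constraints that can be checked in a wrapper script before msprof op is started. The functions below are a minimal POSIX shell sketch; check_name_list, check_int_range, and check_aic_metrics are hypothetical helpers, not part of msProf, which performs its own validation. They only mirror the rules stated in the table: names made of A-Z, a-z, 0-9, and _ joined by |, at most 1024 characters for --kernel-name; --launch-count from 1 to 100; --launch-skip-before-match from 0 to 1000; and --aic-metrics as a comma-separated list of the listed metric names.

```shell
#!/bin/sh
# Hypothetical pre-flight checks (not part of msProf) that mirror the value
# constraints documented in Table 1. msprof op performs its own validation.

# --kernel-name / --mstx-include: names of A-Z, a-z, 0-9, and _ joined by '|'.
# The second argument is a length limit (1024 is documented for --kernel-name).
check_name_list() {
    value=$1
    max_len=${2:-1024}
    if [ "${#value}" -gt "$max_len" ]; then
        echo "too long"
        return 1
    fi
    # Kept in a variable so the '|' inside the bracket is not parsed
    # as a case-pattern separator.
    invalid_chars='*[!A-Za-z0-9_|]*'
    # Reject an empty value, an empty name at either end or in the middle,
    # or any character outside the allowed set.
    case $value in
        ''|'|'*|*'|'|*'||'*|$invalid_chars)
            echo "invalid"
            return 1 ;;
    esac
    echo "ok"
}

# --launch-count (1-100) and --launch-skip-before-match (0-1000).
check_int_range() {
    value=$1 min=$2 max=$3
    case $value in
        ''|*[!0-9]*)
            echo "not an integer"
            return 1 ;;
    esac
    # Reject very long digit strings before they reach integer comparison.
    if [ "${#value}" -gt 9 ] || [ "$value" -lt "$min" ] || [ "$value" -gt "$max" ]; then
        echo "out of range"
        return 1
    fi
    echo "ok"
}

# --aic-metrics: comma-separated metric names from Table 1.
check_aic_metrics() {
    if [ -z "$1" ]; then
        echo "empty"
        return 1
    fi
    old_ifs=$IFS
    IFS=,
    set -f    # keep a stray '*' from expanding to file names
    for metric in $1; do
        case $metric in
            ArithmeticUtilization|L2Cache|Memory|MemoryL0|MemoryUB|PipeUtilization|ResourceConflictRatio|Default) ;;
            *)
                IFS=$old_ifs
                set +f
                echo "unknown metric: $metric"
                return 1 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    echo "ok"
}
```

For example, check_int_range "$COUNT" 1 100 would guard a --launch-count value held in a hypothetical COUNT variable, and check_name_list "add|abs" prints ok.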
- msprof op simulator mode
Log in to the operating environment and run msprof op simulator [options] app [arguments] to perform simulation-based operator tuning. For details about the options, see Table 2. An example command is as follows:
msprof op simulator --soc-version=Ascendxxxyy --output=/home/projects/output /home/projects/MyApp/out/main blockdim 1 // xxxyy indicates the processor type. --output is optional and specifies the output path. /home/projects/MyApp/out/main is the application to be tuned, and blockdim 1 is an optional argument of that application.
Table 2 Options of msprof op simulator
Option
Description
Mandatory (Yes/No)
--application
NOTE: --application is currently kept for compatibility with the ./app [arguments] form, which will replace it later.
You are advised to use msprof op simulator --soc-version=Ascendxxxyy [msprof op simulator options] ./app to collect data. app is the executable file specified by the user. If no path is specified for app, the current path is used by default. xxxyy indicates the processor type.
Yes. Specify one of the executable file, --config, or --export.
--config
Set this parameter to the *.o binary file generated after operator building. The path can be either absolute or relative. For details, see msProf JSON Configuration File Description.
NOTE: Before onboard or simulation tuning of an operator, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the NPU executable file ascendc_kernels_bbit and extract the *.o file from it.
- Build the operator as described in Compiling and Deploying Operators. The *.o file is generated automatically during the build.
- Ensure that group users and other users do not have write permission on the JSON file specified by --config or on its parent directory, and that the parent directory of the JSON file is owned by the current user.
- In --config mode, set the simulator type through the LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH // xxxyy indicates the processor type.
--export
Specifies the folder that contains existing single-operator simulation results. The tool parses these results directly.
NOTE: - The specified folder can contain only the multi-core data and the operator kernel function file aicore_binary.o. Therefore, manually rename the binary file (*.o) configured in --config to aicore_binary.o.
- If you provide only the dump files, code line mapping cannot be generated in the instruction pipeline chart. To view code lines, also place the operator kernel function file aicore_binary.o in the folder with the dump files.
- Ensure that group users and other users do not have write permission on the directory specified by --export or on any file in it, and that the directory is owned by the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Prefix-based fuzzy matching on the operator name is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE: - This option must be used together with --application. The value can contain at most 1024 characters, and operator names can contain only the characters A-Z, a-z, 0-9, and _.
- To collect multiple operators, combine their names with vertical bars (|). For example, --kernel-name="add|abs" collects operators whose names start with add or abs.
- The number of operators to be collected is determined by the value of --launch-count.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 100. The default value is 1.
No
--aic-metrics
This option specifies the operator performance metrics to collect. The following metrics are supported, and all of them are collected by default:
- PipeUtilization
- ResourceConflictRatio
NOTE: - PipeUtilization (mandatory): time consumption ratios of the compute units and MTE units. Example: --aic-metrics=PipeUtilization.
- With --aic-metrics=PipeUtilization, ResourceConflictRatio is not collected.
- ResourceConflictRatio: resource conflict ratio. The SET_FLAG and WAIT_FLAG instructions can be displayed and apply only to .
No
--mstx
This option specifies whether the operator tuning tool enables the mstx APIs called in the user program.
The default value is off, indicating that the mstx APIs are disabled.
If --mstx is set to on, the operator tuning tool enables the mstx APIs called in the user program.
Example:
msprof op simulator --soc-version=Ascendxxxyy --mstx=on ./add_custom // xxxyy indicates the processor type.
NOTE: Currently, the mstxRangeStartA and mstxRangeEnd APIs are supported; they mark the range of code within which operators are tuned. For details about their parameters, see the mstxRangeStartA and mstxRangeEnd API descriptions.
No
--mstx-include
When mstx is enabled in the msProf tool, this option enables only the specified mstx APIs.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, only the specified mstx APIs are enabled. The value of --mstx-include is the message string passed when the user program calls an mstx function. Separate multiple strings with vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" // Enable only the mstx calls whose message argument is hello or hi.
NOTE: - This option cannot be configured independently and must be used together with --mstx.
- A message can contain only the characters A-Z, a-z, 0-9, and _. Combine multiple messages with vertical bars (|).
No
--soc-version
You can specify the simulator type by using --soc-version or setting the LD_LIBRARY_PATH environment variable. Either of them must be specified. The details are as follows:
- --soc-version: specifies the simulator type in --application and --export modes. For the valid values, see the simulator type directories under ${INSTALL_DIR}/tools/simulator.
- LD_LIBRARY_PATH: set this environment variable to specify the simulator type in --config mode or when --soc-version is not used:
export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
NOTE: Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
No
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE: Ensure that group users and other users do not have write permission on the parent directory of the path specified by --output, and that this parent directory is owned by the current user.
No
--help
This option outputs the help information.
No
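When --soc-version is not used, the LD_LIBRARY_PATH setting described in Table 2 can be wrapped in a small function. This is a minimal POSIX shell sketch; set_simulator_lib_path is a hypothetical helper, not part of msProf, and it assumes INSTALL_DIR is set to the CANN component directory and that the SoC name is passed in the same form as the Ascendxxxyy placeholder. Unlike the one-line export shown above, it avoids leaving a trailing colon when LD_LIBRARY_PATH was previously empty.

```shell
#!/bin/sh
# Hypothetical helper (not part of msProf): put the simulator library
# directory for one SoC first on LD_LIBRARY_PATH, as needed in --config mode
# or when --soc-version is not used. INSTALL_DIR must be set by the caller.
set_simulator_lib_path() {
    soc=$1
    if [ -z "${INSTALL_DIR:-}" ]; then
        echo "INSTALL_DIR is not set" >&2
        return 1
    fi
    sim_lib="$INSTALL_DIR/tools/simulator/$soc/lib"
    if [ ! -d "$sim_lib" ]; then
        echo "simulator directory not found: $sim_lib" >&2
        return 1
    fi
    # ${VAR:+...} avoids a trailing ':' when LD_LIBRARY_PATH was empty;
    # glibc treats an empty entry as the current working directory.
    LD_LIBRARY_PATH="$sim_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

For example, after INSTALL_DIR=/usr/local/Ascend/ascend-toolkit/latest, calling set_simulator_lib_path Ascendxxxyy (with xxxyy replaced by the processor type) prepares the environment for msprof op simulator in --config mode.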
Segment-based Tuning Principles of msprof op
- Step 1: Use --launch-skip-before-match to filter the operator tuning range:
  - If --launch-skip-before-match is configured, the specified number of operators, counted from the first one, are skipped, and only the operators after them are collected.
  - If it is not configured, no filtering is performed.
- Step 2: Based on the result of Step 1, use --mstx to filter the operator tuning range:
  - If --mstx is configured, only the operators within the range marked by the mstxRangeStartA and mstxRangeEnd APIs are collected.
  - If it is not configured, no filtering is performed.
- Step 3: Based on the result of Step 2, use --kernel-name to filter the operator tuning range:
  - If --kernel-name is configured, only the operators that match --kernel-name are collected.
  - If it is not configured, only the first operator scheduled during program running is collected.
- Step 4: Based on the result of Step 3, use --aic-metrics to select the metrics to collect:
  - If --aic-metrics is configured, only the specified performance metrics are collected.
  - If it is not configured, the default metrics are collected.
- Step 5: Steps 1 to 4 determine the actual number of operators to tune and the metrics to collect.
- Step 6: If --kill is set to on, the actual number of operators to tune is compared with --launch-count to determine whether to stop the program automatically:
  - If the number is less than or equal to --launch-count, the program is not stopped early.
  - Otherwise, the program stops automatically once the number of tuned operators reaches --launch-count.
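The selection order above can be modeled in a few lines of shell to reason about which launches a given option combination will pick up. This is a hypothetical sketch, not msProf's implementation: select_operators is an illustrative name, Step 2 (--mstx) is omitted by assuming every launch falls inside the marked range, --aic-metrics is ignored because it selects metrics rather than operators, and --launch-count is treated as a cap on the number of matched operators.

```shell
#!/bin/sh
# Hypothetical model of the documented operator selection order; it does not
# call msprof. Step 2 (--mstx) is omitted: all launches are assumed in range.
# Usage: select_operators SKIP KERNEL_NAME LAUNCH_COUNT OP_NAME...
#   KERNEL_NAME is '' when --kernel-name is not set.
select_operators() {
    skip=$1 pattern=$2 limit=$3
    shift 3
    index=0
    selected=0
    for op in "$@"; do
        index=$((index + 1))
        # Step 1: the first SKIP launches are never collected, matching or not.
        if [ "$index" -le "$skip" ]; then
            continue
        fi
        # Step 3: keep names that start with any '|'-separated prefix.
        if [ -n "$pattern" ]; then
            matched=0
            old_ifs=$IFS
            IFS='|'
            for prefix in $pattern; do
                case $op in
                    "$prefix"*) matched=1 ;;
                esac
            done
            IFS=$old_ifs
            if [ "$matched" -eq 0 ]; then
                continue
            fi
        fi
        printf '%s\n' "$op"
        selected=$((selected + 1))
        # Without --kernel-name only the first remaining operator is kept;
        # with it, collection stops once LAUNCH_COUNT operators are selected.
        if [ -z "$pattern" ] || [ "$selected" -ge "$limit" ]; then
            break
        fi
    done
}
```

For example, select_operators 2 add 5 add_a mul add_b prints only add_b: add_a matches the kernel name but is still skipped, because skipped launches count toward --launch-skip-before-match whether or not they match.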
Call Scenarios
- Kernel launch.
- For details about kernel launch, see Operator Development Based on Kernel Launch.
- In the kernel launch scenario, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy ./main // main is the user program that contains the operator to be tuned, and xxxyy indicates the processor type.
- Optional: To perform simulation-based tuning on an operator that already runs on the board, without recompiling it, perform the following steps:
- Create a soft link named libruntime.so, in any directory, that points to libruntime_camodel.so.
ln -s /{simulator_path}/lib/libruntime_camodel.so /{so_path}/libruntime.so // For example, if the CANN package is installed by the root user in the default path, simulator_path is /usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascendxxxyy.
- Add the parent directory of the soft link to the LD_LIBRARY_PATH environment variable.
export LD_LIBRARY_PATH={so_path}:$LD_LIBRARY_PATH
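The two steps for reusing an onboard build in simulation can be combined into one function. This is a minimal POSIX shell sketch; use_camodel_runtime is a hypothetical helper, not part of msProf or CANN, and simulator_path and so_path stand for the same placeholders as in the ln -s command. It requires an absolute simulator path, because a relative symlink target would be resolved against so_path rather than the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of msProf): link libruntime.so to the
# simulator's libruntime_camodel.so and put the link's directory first on
# LD_LIBRARY_PATH.
# Usage: use_camodel_runtime SIMULATOR_PATH SO_PATH
use_camodel_runtime() {
    simulator_path=$1
    so_path=$2
    # A relative symlink target would be resolved against SO_PATH, not the
    # current directory, so require an absolute simulator path.
    case $simulator_path in
        /*) ;;
        *)
            echo "simulator path must be absolute: $simulator_path" >&2
            return 1 ;;
    esac
    target="$simulator_path/lib/libruntime_camodel.so"
    if [ ! -f "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    mkdir -p "$so_path" || return 1
    # -f replaces a link left over from an earlier run.
    ln -sf "$target" "$so_path/libruntime.so" || return 1
    LD_LIBRARY_PATH="$so_path${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

After the function succeeds, the onboard-built program can be started under msprof op simulator in the same shell so that it loads the simulator runtime through the link.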
- AscendCL single-operator calling: single-operator API execution scenario
- For details about single-operator API execution, see Single-Operator API Calling.
- In the single-operator API execution scenario, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy ./main // main indicates the name of the operator program to be tuned, and xxxyy indicates the processor type.
- Third-party framework operator calling: PyTorch framework scenario
- For details about single-operator calling through the PyTorch framework, see .
- When the PyTorch framework is used to call a single operator, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy python a.py // a.py is the user script that contains the operator to be tuned, and xxxyy indicates the processor type.