Overview
The msProf performance analysis tool is used to collect and analyze key performance metrics of operators running on Ascend AI Processors. You can efficiently locate software and hardware performance bottlenecks of operators based on the output profile data, thereby enhancing the overall efficiency of operator performance analysis.
Profile data can currently be collected and automatically parsed based on various running modes (onboard or simulation) and file formats (executable files or operator binary .o files).
The msProf tool depends on the msopprof executable file in the CANN package. The API usage in this file is the same as that in msprof op. This file is provided by the CANN package and does not need to be installed separately.
Features
For details about how to use the operator tuning tool, see Tool Usage. MindStudio Insight displays the computing memory heatmap, Roofline bottleneck analysis chart, cache heatmap, communication and computing pipeline chart (MC2 fused operator), instruction pipeline chart, operator code hot spot map, memory channel throughput waveform, and profile data file, which are single-operator tuning capabilities. For details, see Table 1.
- After you press Ctrl+C, the operator execution stops, and the tool generates profile data files based on the information collected so far. If you do not need the files, press Ctrl+C again.
- If the --output parameter is not specified, ensure that users in the group and other groups do not have the write permission on the parent directory of the current path.
Commands
- You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
- You are not advised to perform high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to avoid security risks.
- msprof op mode
Log in to the operating environment and run msprof op [optional parameters] app [arguments]. For details about the optional parameters, see Table 2. An example command is as follows:
msprof op --output=$HOME/projects/output $HOME/projects/MyApp/out/main blockdim 1 // --output is optional. $HOME/projects/MyApp/out/main is the application in use. blockdim 1 is an optional parameter of the application.
Table 2 Options of msprof op
Option
Description
Mandatory (Yes/No)
--application
NOTE:The --application option is still accepted for compatibility and will be replaced by the ./app [arguments] form later.
You are advised to launch the application with msprof op [msprof op parameters] ./app. app is the specified executable file. If no path is specified for app, the current path is used by default.
NOTE:When using ./app, add the msprof op parameters before ./app to ensure that the related functions take effect.
Yes. Select either the specified executable file or --config.
--config
This option sets the *.o binary file obtained by operator compilation. It can be set to an absolute path or a relative path. For details, see JSON Configuration File Description.
NOTE:Before operator tuning, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the executable file on the NPU and extract the *.o file from the executable file. For details, see "Kernel Launch".
- The *.o file is automatically generated during operator compilation. For details, see Compiling and Deploying Operators.
- Ensure that users in the group and other groups do not have the write permission on the JSON file specified by --config and the parent directory. In addition, ensure that the owner of the parent directory of the JSON file is the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Fuzzy match using the operator name prefix is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE:- This option must be used together with --application. The value contains a maximum of 1024 characters. Only one or more characters in A-Za-z0-9_ are supported.
- If multiple operators need to be collected, use vertical bars (|) to combine them. For example, --kernel-name="add|abs" indicates that operators whose prefixes are add and abs are collected.
- The number of operators to be collected is determined by the value of --launch-count.
- The wildcard (*) can be used to match strings of any length.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 5000. The default value is 1.
No
--launch-skip-before-match
This option specifies how many operators to skip, counting from the first operator launched. Data collection starts only with the operators after the skipped ones.
NOTE:- Every launched operator counts toward this value, whether or not it matches the operators specified in --kernel-name, and the skipped operators are not collected.
- The value of this option is an integer ranging from 0 to 1000.
No
--aic-metrics
This option enables the collection of operator metrics.
- Enables the collection of operator metrics (ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, ResourceConflictRatio, and Default). You can select one or more metrics and separate them with commas (,), for example, --aic-metrics=Memory,MemoryL0.
- The default value is Default, indicating that the following metrics (ArithmeticUtilization, L2Cache, Memory, MemoryL0, MemoryUB, PipeUtilization, and ResourceConflictRatio) are collected. Example: --aic-metrics=Default.
- Enables the collection of metrics within a specified code segment on the operator kernel (KernelScale).
KernelScale can be used to tune a specified code segment on the operator kernel. You need to configure --aic-metrics=KernelScale first, and then select one or more operator metrics. Use commas (,) to separate multiple metrics, for example, --aic-metrics=KernelScale,Memory,MemoryL0.
By default, all operator metrics are collected, for example, --aic-metrics=KernelScale.
NOTE:- When specifying the code segment range, place the start and stop calls immediately before and after the corresponding code segment on the operator kernel. For details, see MetricsProfStart and MetricsProfStop APIs.
- This function is supported only by Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products.
- Roofline: enables the generation of Roofline bottleneck analysis charts and displays them in a visual format on MindStudio Insight. Example: --aic-metrics=Roofline. For details, see Roofline Bottleneck Analysis Chart.
NOTE:Roofline is bound with Default. Enabling Roofline simultaneously enables Roofline and Default modes.
- TimelineDetail: enables the generation of the instruction pipeline chart and operator code hot spot map for visualization, for example, --aic-metrics=TimelineDetail. For details, see Instruction Pipeline Chart and Operator Code Hot Spot Map.
NOTE:- To enable this function, see Configurations of msprof op simulator.
- This function is supported only by Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products.
- This function applies only to Third-party framework operator calling: PyTorch framework scenario where single-operator APIs are used to call operators internally.
- This function does not support the collection of level-2 pointer operators, Triton operators, and MC2 fused operators. It cannot be enabled together with --replay-mode=application/range.
- To generate a CSV file or display the Computing Memory Heatmap, enable Default when starting the operator. The following is an example:
msprof op --aic-metrics=TimelineDetail,Default
- Occupancy: enables the generation of the inter-core load analysis chart and displays the chart in a visual format on MindStudio Insight. Example: --aic-metrics=Occupancy. For details, see inter-core load analysis chart. The time consumption, data throughput, and cache hit ratio of each physical core are compared. If the difference between the maximum value and the minimum value is greater than 10%, the load is unbalanced, and the CLI will provide tuning suggestions.
NOTE:This function is supported only by Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products.
- MemoryDetail: for example, --aic-metrics=MemoryDetail.
- After this function is enabled, the L2 cache-related functions (the L2 cache-L0A/L0B connection in the Compute Workload Analysis, and L2 cache hit ratio and GM-related data transfer volume in the Cache Heatmap and Operator Code Hot Spot Map) are enabled.
- When dynamic instrumentation is enabled, the active bandwidth of MTE1 and MTE2 in the Cube unit on the AI Core is displayed in the Memory Workload Analysis. If the instrumentation fails, the corresponding fields in the memory workload analysis diagram are displayed as NA, and aic_mte1_active_bw(GB/s) and aic_mte2_active_bw(GB/s) are not displayed in PipeUtilization (Percentages of Time Taken by Compute Units and MTEs).
NOTE:- This function cannot be enabled together with --replay-mode=range.
- This function is supported only by Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products.
- MemoryDetail is bound with Default. Enabling MemoryDetail simultaneously enables MemoryDetail and Default modes.
- BasicInfo: enables basic information collection. Only basic operator information is saved to the drive, for example, --aic-metrics=BasicInfo. For details about the saved content, see OpBasicInfo (Basic Operator Information).
- Source: enables the operator code hot spot map, for example, --aic-metrics=Source. For details, see Operator Code Hot Spot Map.
NOTE:- This function is supported only by Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products.
- To view the code call stack, add the -g compilation option when compiling the operator. For details, see Adding -g Compilation Option.
- This function cannot be enabled together with --replay-mode=range.
No
--kill
The value can be on or off. The default value is off, indicating that the function is disabled.
If you set --kill to on to enable this function, the application automatically stops after collecting the number of operators specified by --launch-count.
NOTE:- After --kill is set to on, error logs may be generated because the application ends in advance. You can determine whether to use this function.
- For a multi-threaded process, the configuration of the --kill option takes effect only for subprocesses.
- Using this option prevents the last executed MC2 fused operator from properly obtaining the API calling pipeline. For details, see Communication and Computing Pipeline Chart.
- You are advised not to enable this function together with --replay-mode=range. Otherwise, collected operator data may be missing.
No
--mstx
This option determines whether the operator tuning tool enables the mstx API used in the user code program.
The default value is off, indicating that the mstx API is disabled.
If --mstx is set to on, the operator tuning tool enables the mstx API used in the user code program.
Example:
msprof op --mstx=on ./add_custom
NOTE:- Currently, the mstxRangeStartA and mstxRangeEnd APIs are supported to mark the range in which operator tuning is enabled. For details about the parameters, see the mstxRangeStartA and mstxRangeEnd APIs in MindStudio mstx API Reference.
- When used together with --replay-mode=range, the mstxRangeStartA and mstxRangeEnd APIs must be called in pairs, and cross-nesting of pairs is not supported. The operators contained in each pair of mstx APIs form a replay range. The streams of the operators in a replay range cannot be changed. In addition, the number of operators that can be collected is limited by the number of operator block dims in OpBasicInfo (Basic Operator Information). It is recommended that the number be less than or equal to 50.
No
--mstx-include
This option can be used to enable only the specified mstx APIs when the mstx APIs are enabled in the operator tuning tool.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, --mstx-include enables only the specified mstx APIs. The input of --mstx-include is the message character string transferred when the user calls the mstx function. Multiple character strings are combined using vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" //Enable only the mstx APIs whose message parameters are hello and hi in the mstx function passed by the user.
NOTE:- This option cannot be configured independently and must be used together with --mstx.
- A message can contain only A-Z a-z 0-9_ characters. Use vertical bars (|) to combine the messages.
No
--replay-mode
This option specifies the replay mode of operator data collection. The value can be kernel, application, or range. The default value is kernel.
- If the value is set to application, the application is replayed multiple times.
NOTE:In application mode, enabling only some aic-metrics may lead to missing data in the visualize_data.bin file. To view complete visualize_data.bin data, you are advised to add Default to --aic-metrics.
- If the value is set to kernel, the kernel function of a single operator within the specified collection range is replayed multiple times.
- If the value is set to range, multiple operators within the specified range are replayed multiple times as a whole. Multiple ranges can be specified, and ranges are independent of each other.
NOTE:- In the multi-device multi-operator scenario, this option cannot be set to application.
- Range-level replay must be used together with --mstx=on and applies only to Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products.
- Range-level replay does not support collection of MC2 and LCCL fused operators and cannot be enabled together with --kill=on, --aic-metrics=MemoryDetail, --aic-metrics=TimelineDetail, and --aic-metrics=Source.
No
--warm-up
When msprof op collects data for some operators, the task is too short to reach the minimum task time required for the processor to raise its frequency. As a result, the operator runs at a reduced frequency, which affects the measured results. In this case, you can use --warm-up to specify the number of warm-up runs, which raises the running frequency of the Ascend AI Processor in advance and makes the onboard data more accurate.
NOTE:- The default value is 5. The value range is [0,500].
- This option does not take effect for the MC2 operator.
No
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE:Ensure that users in the group and other groups do not have the write permission on the parent directory of the path specified by --output. In addition, ensure that the owner of the parent directory of the directory specified by --output is the current user.
No
--dump
This option specifies whether to generate the dump file of the simulator.
The value can be on or off. The default value is off, indicating that the simulator dump file is not generated.
NOTE:- This option is valid only when --aic-metrics=TimelineDetail is used. It takes effect only for Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products. It does not take effect for Atlas inference products.
- This option applies only to the single-process scenario and does not support the scenario where two operators run at the same time.
No
--core-id
This option applies to the scenario where operators are evenly distributed. You can use the --core-id option to specify the IDs of some logical cores and parse the simulation data of these cores.
The value range of the core ID is [0,49].
NOTE:- To parse the simulation data of multiple cores, use vertical bars (|) to combine the core IDs. For example, --core-id="0|31" parses the simulation data of the cores whose IDs are 0 and 31.
- This option is valid only when --aic-metrics=TimelineDetail is used. It takes effect only for Instruction Pipeline Chart and Operator Code Hot Spot Map and applies only to Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products.
No
-h, --help
This option outputs the help information.
No
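The integer ranges listed in Table 2 (--launch-count from 1 to 5000, --launch-skip-before-match from 0 to 1000, and --warm-up from 0 to 500) can be checked before a long collection run is started. The following is a minimal sketch and not part of msProf: in_range and build_msprof_op_cmd are hypothetical helper names, ./add_custom stands for your own application, and the function only prints the command line instead of running it.

```shell
# Hypothetical helper: succeed only if $1 is a non-negative integer in [$2, $3].
in_range() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Hypothetical helper: validate three integer options against the ranges in
# Table 2, then print (not execute) the resulting msprof op command line.
# Arguments: launch-count, launch-skip-before-match, warm-up, application.
build_msprof_op_cmd() {
  in_range "$1" 1 5000 || { echo "invalid --launch-count: $1" >&2; return 1; }
  in_range "$2" 0 1000 || { echo "invalid --launch-skip-before-match: $2" >&2; return 1; }
  in_range "$3" 0 500  || { echo "invalid --warm-up: $3" >&2; return 1; }
  echo "msprof op --launch-count=$1 --launch-skip-before-match=$2 --warm-up=$3 $4"
}

build_msprof_op_cmd 10 2 5 ./add_custom
```

Printing the command first makes it easy to review the options before the application is launched under the profiler.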
- msprof op simulator mode
Log in to the operating environment and run msprof op simulator to enable operator simulation tuning, followed by the optional simulation parameters and the application to be tuned (with its own arguments, for example, blockdim 1). For details about the optional simulation parameters, see Table 3. An example command is as follows:
msprof op simulator --soc-version=Ascendxxxyy --output=/home/projects/output /home/projects/MyApp/out/main blockdim 1 // --output is an optional parameter, /home/projects/MyApp/out/main indicates the used app, blockdim 1 is an optional parameter of the user application, and xxxyy indicates the processor type.
Table 3 Options of msprof op simulator
Option
Description
Mandatory (Yes/No)
--application
NOTE:The --application option is still accepted for compatibility and will be replaced by the ./app [arguments] form later.
You are advised to launch the application with msprof op simulator --soc-version=Ascendxxxyy [msprof op simulator parameters] ./app. app indicates the specified executable file. If no app path is specified, the current path is used by default. xxxyy indicates the processor type.
NOTE:When using ./app, add the msprof op simulator parameters before ./app to ensure that the related functions take effect.
Yes. Select one of the specified executable file, --config, and --export.
--config
This option sets the binary file *.o obtained by operator compilation. It can be set to an absolute path or a relative path. For details, see JSON Configuration File Description.
NOTE:Before operator tuning, you can obtain the operator binary *.o file in either of the following ways:
- Obtain the executable file on the NPU and extract the *.o file from the executable file. For details, see "Kernel Launch".
- The *.o file is automatically generated during operator compilation. For details, see Compiling and Deploying Operators.
- Ensure that users in the group and other groups do not have the write permission on the JSON file specified by --config and the parent directory. In addition, ensure that the owner of the parent directory of the JSON file is the current user.
- You need to use the LD_LIBRARY_PATH environment variable to set the simulator type.
export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH // xxxyy indicates the processor type.
--export
This option specifies the folder that contains the single-operator simulation result. The simulation result is directly parsed, and the single-core or multi-core instruction pipeline chart of the single-operator is displayed on MindStudio Insight.
NOTE:- The specified folder can store only multi-core data and the operator kernel function file aicore_binary.o. Therefore, you need to manually change the binary file name (*.o) configured in --config to aicore_binary.o.
- If you provide only the dump file, the code line mapping cannot be generated in the instruction pipeline chart. To view the code line, you need to store the operator kernel function file named aicore_binary.o in the dump file.
- Ensure that users in the group and other groups do not have the write permission on the directory specified by --export and all files in the directory specified by --export. In addition, ensure that the owner of the specified directory is the current user.
--kernel-name
This option specifies the name of the operator whose data is to be collected. Fuzzy match using the operator name prefix is supported. If this option is not specified, only data of the first operator scheduled during program running is collected.
NOTE:- This option must be used together with --application. The value contains a maximum of 1024 characters. Only one or more characters in A-Za-z0-9_ are supported.
- If multiple operators need to be collected, use vertical bars (|) to combine them. For example, --kernel-name="add|abs" indicates that operators whose prefixes are add and abs are collected.
- The number of operators to be collected is determined by the value of --launch-count.
- The wildcard (*) can be used to match strings of any length.
No
--launch-count
This option sets the maximum number of operators that can be collected. The value is an integer ranging from 1 to 5000. The default value is 1.
No
--aic-metrics
This option enables the collection of operator performance metrics. The following performance metrics can be collected.
- PipeUtilization (collected by default)
NOTE:- PipeUtilization: indicates the computing and transfer instruction pipeline.
- When --aic-metrics=PipeUtilization is configured, ResourceConflictRatio is disabled. That is, only the instruction pipeline is displayed, and the details of synchronization event instructions are not included.
- ResourceConflictRatio (collected by default)
NOTE:- ResourceConflictRatio: displays details about synchronization event instructions.
- For the Atlas A3 training products/Atlas A3 inference products and Atlas A2 training products/Atlas A2 inference products, the SET_FLAG/WAIT_FLAG instructions are displayed.
- For the Atlas inference products, the set_event/wait_event instructions are displayed.
- PMSampling: enables and visualizes the memory channel throughput waveform, for example, --aic-metrics=PMSampling. For details, see Memory Channel Throughput Waveform.
NOTE:- --core-id does not take effect for the PMSampling parameter. PMSampling parses all cores.
- This function is disabled by default.
No
--core-id
This option applies to the scenario where operators are evenly distributed. You can use the --core-id option to specify the IDs of some logical cores and parse the simulation data of these cores.
The value range of the core ID is [0,49].
NOTE:- To parse the simulation data of multiple cores, use vertical bars (|) to combine the data. For example, --core-id="0|31" indicates to parse simulation data of cores whose IDs are 0 and 31.
- --core-id does not take effect for the PMSampling parameter. PMSampling parses all cores.
No
--timeout
This option is intended for operators that process a large amount of data with repeated calculation and therefore take a long time to run. For such operators, part of the pipeline chart is often enough to provide the necessary information. You can set the --timeout option to shorten the operator running duration and still obtain the necessary pipeline information. The behavior is as follows:
- When the simulation duration reaches the value of --timeout, msProf terminates the simulation and starts parsing. Only part of the simulation data is analyzed. In addition, msProf displays the following information:
[INFO] The timeout has reached and the application will be forcibly killed.
- If the timeout value is not reached when the process ends normally, the simulation program ends normally and the parsing process starts.
The value is an integer ranging from 1 to 2880, in minutes. Example:
msprof op simulator --soc-version=Ascendxxxyy --timeout=1 ./add_custom // xxxyy indicates the processor type.
No
--mstx
This option determines whether the operator tuning tool enables the mstx APIs used in the user code program.
The default value is off, indicating that the mstx APIs are disabled.
If --mstx is set to on, the operator tuning tool enables the mstx API used in the user code program.
Example:
msprof op simulator --soc-version=Ascendxxxyy --mstx=on ./add_custom // xxxyy indicates the processor type.
No
--mstx-include
This option can be used to enable the specified mstx APIs in the msProf tool.
If this option is not configured, all mstx APIs used in user code are enabled by default.
If this option is configured, --mstx-include enables only the specified mstx APIs. The input of --mstx-include is the message character string transferred when the user calls the mstx function. Multiple character strings must be separated by vertical bars (|).
Example:
--mstx=on --mstx-include="hello|hi" //Enable only the mstx APIs whose message parameters are hello and hi in the mstx function passed by the user.
NOTE:- This option cannot be configured independently and must be used together with --mstx.
- A message can contain only A-Z a-z 0-9_ characters. Use vertical bars (|) to combine the messages.
No
--soc-version
You can use --soc-version or the LD_LIBRARY_PATH environment variable to specify the simulator type. One of the two must be set. The details are as follows:
- --soc-version: specifies the simulator type in --application and --export modes. For details about the value range, see the simulator types in the ${INSTALL_DIR}/tools/simulator directory.
- LD_LIBRARY_PATH environment variable: specifies the simulator type in --config mode or when --soc-version is not used.
export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
NOTE:Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
No
--output
This option specifies the path for storing the collected profile data. By default, the profile data is stored in the current directory.
NOTE:Ensure that users in the group and other groups do not have the write permission on the parent directory of the path specified by --output. In addition, ensure that the owner of the parent directory of the directory specified by --output is the current user.
No
--dump
This option specifies whether to generate the dump file of the simulator.
The value can be on or off. The default value is off, indicating that the simulator dump file is not generated.
NOTE:- This option takes effect only for Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products. This option does not take effect for Atlas inference products. The dump files are saved to drives as usual.
- This option applies only to the single-process scenario and does not support the scenario where two operators run at the same time.
No
-h, --help
This option outputs the help information.
No
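The LD_LIBRARY_PATH setting described for --config and --soc-version in Table 3 can be wrapped in a small helper so that the simulator directory is built the same way each time. This is a minimal sketch under stated assumptions: simulator_lib_dir and prepend_simulator_lib are hypothetical names, the directory layout is the one shown in the export command above, and Ascendxxxyy remains a placeholder for your processor type.

```shell
# Hypothetical helper: print the simulator library directory for a CANN
# installation directory ($1) and a processor type ($2).
simulator_lib_dir() {
  # ${1%/} drops one trailing slash so the path has no double slash.
  echo "${1%/}/tools/simulator/$2/lib"
}

# Hypothetical helper: put that directory first in LD_LIBRARY_PATH.
# The ${VAR:+...} form appends ":old_value" only when the variable is
# non-empty, so no trailing colon is left behind.
prepend_simulator_lib() {
  LD_LIBRARY_PATH="$(simulator_lib_dir "$1" "$2")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}

# Show the directory that would be added for a default root installation.
simulator_lib_dir /usr/local/Ascend/cann Ascendxxxyy
```

Avoiding a trailing colon matters because the dynamic linker treats an empty LD_LIBRARY_PATH entry as the current directory.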
Segment-based Tuning Principles of msprof op
1. Use the --launch-skip-before-match option to filter the operator tuning range. The filtering principles are as follows:
- If --launch-skip-before-match is configured, operators from the first one up to the specified number are skipped, and only operators after the specified number are collected.
- If --launch-skip-before-match is not configured, no filtering is performed.
2. On the basis of Step 1, use the --mstx option to filter the operator tuning range. The filtering principles are as follows:
- If --mstx is configured, only the operators within the scope of the mstxRangeStartA and mstxRangeEnd APIs are collected.
- If --mstx is not configured, no filtering is performed.
3. On the basis of Step 2, use the --kernel-name option to filter the operator tuning range. The filtering principles are as follows:
- If --kernel-name has been configured, only operators within the range specified by --kernel-name are collected.
- If --kernel-name is not configured, only the first operator scheduled during program running is collected.
4. On the basis of Step 3, use the --aic-metrics option to select the operator metrics for tuning. The filtering principles are as follows:
- If --aic-metrics has been configured, the selected operator performance metrics are collected.
- If --aic-metrics is not configured, operator performance metrics in the Default section are collected by default. Performance metrics in the KernelScale, TimelineDetail, Roofline, and Occupancy sections cannot be collected.
5. Step 1 to Step 4 determine the actual number of tuned operators and the collection range of metrics.
6. With --kill=on, the actual number of tuned operators is compared with the value of --launch-count to determine whether to automatically stop the program.
If the number of tuned operators is less than or equal to the value of --launch-count, the program continues to run. Otherwise, the program automatically stops when the number of tuned operators reaches the value specified by --launch-count.
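The filtering order described above can be illustrated with a small model. This is a sketch of the documented order only, not of how msProf is implemented: select_ops is a hypothetical function, the operator names are invented for the example, the --mstx range step and the * wildcard are left out, and --kernel-name is assumed to be configured.

```shell
# Hypothetical model of the filtering order.
# Arguments: skip count, kernel-name prefixes joined by "|", launch count,
# followed by the operator names in launch order.
select_ops() (
  set -f                       # no filename globbing while splitting prefixes
  skip=$1; prefixes=$2; launch_count=$3
  shift 3
  index=0; collected=0
  for name in "$@"; do
    index=$((index + 1))
    # --launch-skip-before-match: the first <skip> launches are never
    # collected, whether or not they match a prefix.
    [ "$index" -le "$skip" ] && continue
    # --kernel-name: keep operators whose name starts with one of the prefixes.
    matched=0
    old_ifs=$IFS; IFS='|'
    for p in $prefixes; do
      case $name in "$p"*) matched=1 ;; esac
    done
    IFS=$old_ifs
    [ "$matched" -eq 1 ] || continue
    echo "$name"
    collected=$((collected + 1))
    # --launch-count: stop once the maximum number of operators is collected.
    [ "$collected" -ge "$launch_count" ] && break
  done
  exit 0
)

select_ops 2 "add|abs" 2 add_a mul_b add_c abs_d add_e
```

In this example the first two launches are skipped even though add_a matches a prefix, so the collected operators are add_c and abs_d, and add_e is dropped because the launch count of 2 has been reached.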
Call Scenarios
- Kernel launch operator development: kernel launch
- For details about kernel launch, see "Kernel Launch Operator Development".
- In the kernel launch scenario, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy ./main // main indicates the name of the user operator program, including the program name of the operator to be tuned. xxxyy indicates the type of the processor used by the user.
- Optional: If you need to perform simulation tuning on an operator that runs on the board without recompilation, perform the following steps:
- Create a soft link named libruntime.so that points to libruntime_camodel.so in any directory.
ln -s /{simulator_path}/lib/libruntime_camodel.so /{so_path}/libruntime.so //For example, if the CANN package is installed in the default path of the root user, simulator_path is /usr/local/Ascend/cann/tools/simulator/Ascendxxxyy.
- Add the parent directory of the created soft link to the environment variable LD_LIBRARY_PATH.
export LD_LIBRARY_PATH={so_path}:$LD_LIBRARY_PATH
- Project-based operator development: single-operator API calling
- For details about single-operator API call, see "Single-Operator API Calling".
- In the single-operator API execution scenario, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy ./main // main indicates the name of the user operator program, including the program name of the operator to be tuned. xxxyy indicates the type of the processor used by the user.
- AI framework operator adaptation: PyTorch framework
- When the msProf tool is used for simulated tuning of the operators in the PyTorch script on Atlas Inference Series Product, only the Kernels-based operator package calling mode is supported. You need to install the binary kernels operator package by referring to "Installing CANN", modify the script entry file, for example, main.py, and add the torch_npu.npu.set_compile_mode(jit_compile=False) line under import torch_npu to ensure that the operators in the kernels operator package are used.
import torch
import torch_npu
torch_npu.npu.set_compile_mode(jit_compile=False)
......
- For details about single-operator execution in the PyTorch framework, see Adapting OpPlugin to a Single Operator.
- When the PyTorch framework is used to call a single-operator, configure the prerequisites and then run the following command:
msprof op simulator --soc-version=Ascendxxxyy python a.py // a.py indicates the name of the user operator program, including the program name of the operator to be tuned. xxxyy indicates the type of the processor used by the user.
- Triton operator development: Triton operator calling