Profile Data Collection
Prerequisites
- Currently, MindStudio IDE does not support data collection in cluster scenarios. You can use the Import Result function to import the parent directory of PROF_XXX to display the collected cluster profile data.
For details about profile data collection in cluster scenarios, see "Appendixes" > "Performance Analysis in Cluster Training Scenarios" in the Profiling Instructions.
- To collect data using the training project, add the configuration information of the PROFILING_OPTIONS field to the environment variable script file env_*.sh of the training project. The following is an example:
export PROFILING_MODE=true
export PROFILING_OPTIONS='{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"MemoryL0"}'
The path specified by output stores the profile data that Profiling collects on the server. This data is then copied to the path specified by Project Location, where a .json result file is generated for MindStudio IDE to display.
The PROFILING_OPTIONS field configures the profiling items. Select the options you need. For details about the options for adding profiling configurations to the training project script, see "Other Collection Modes" > "Collecting Data Using TensorFlow Framework Interfaces" > "Profiling Options" in the Profiling Instructions.
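Because the PROFILING_OPTIONS value is a JSON string embedded in a shell script, quoting mistakes are easy to make. The following is a minimal sketch, not part of MindStudio IDE, that generates the two export lines from a Python dictionary. The option values are copied from the example above; the helper name render_exports is hypothetical.

```python
import json
import shlex

# Option values copied from the example above; adjust them for your own project.
profiling_options = {
    "output": "/tmp/profiling",
    "training_trace": "on",
    "task_trace": "on",
    "fp_point": "",
    "bp_point": "",
    "aic_metrics": "MemoryL0",
}


def render_exports(options):
    """Return the two export lines to paste into an env_*.sh script."""
    # Compact separators keep the JSON on one line without extra spaces.
    payload = json.dumps(options, separators=(",", ":"))
    return [
        "export PROFILING_MODE=true",
        # shlex.quote wraps the JSON in single quotes so the shell keeps it intact.
        "export PROFILING_OPTIONS=" + shlex.quote(payload),
    ]


export_lines = render_exports(profiling_options)
for line in export_lines:
    print(line)
```

Generating the string with json.dumps ensures that the value is well-formed JSON before it reaches the training script.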
Procedure
- In the navigation bar on the left of the welcome page, click Projects, and select and open a built project.
- Choose from the menu bar. The system analysis project page is displayed.
Figure 1 System analysis project page
- On the system analysis project page, click New Project on the welcome page or the icon in the upper left corner. The profiling configuration window is displayed, as shown in Figure 2.
Set Project Name and Project Location under Project Properties. Click Next.
Table 1 Project Properties parameters
Parameter
Description
Project Name
Profiling project name customized by the user.
After the configuration, a folder named after the project name is automatically created in the directory specified by Project Location. The collected raw profile data directory PROF_XXX and the data parsing result .json file are stored in this folder.
NOTE: The parsing result .json file is named in the following format: report_{timestamp}_{device_id}_{model_id}_{iter_id}.json, where {device_id} is the device ID, {model_id} is the model ID, and {iter_id} is the iteration ID.
Project Location
Profile data output path.
After profile data collection is complete, a file directory named after the project name is generated in the path.
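When post-processing several result files outside the IDE, it can help to split the report file name into its fields. The following is a minimal sketch based only on the name format given in the NOTE of Table 1. It assumes that no field contains an underscore, which this document does not state, and the sample file name is hypothetical.

```python
import re

# Pattern derived from the documented name format
# report_{timestamp}_{device_id}_{model_id}_{iter_id}.json.
# Assumption (ours, not the documentation's): no field contains an underscore.
REPORT_NAME = re.compile(
    r"^report_(?P<timestamp>[^_]+)_(?P<device_id>[^_]+)"
    r"_(?P<model_id>[^_]+)_(?P<iter_id>[^_]+)\.json$"
)


def parse_report_name(filename):
    """Return the four name fields as a dict, or None if the name does not match."""
    match = REPORT_NAME.match(filename)
    return match.groupdict() if match else None


# Hypothetical file name, for illustration only.
fields = parse_report_name("report_20230101120000_0_1_1.json")
print(fields)
```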
- Access the Executable Properties configuration page, as shown in the following figures.
Figure 3 Executable Properties
Table 2 Executable Properties parameters
Parameter
Description
Project Path
Path of the target project for profiling. This parameter is mandatory.
If the specified target project is a training project, you can click Start to directly start the Profiling tool.
Executable File
Executable file of the target project for profiling. This parameter is mandatory.
Set this parameter to an executable file in the Project Path subdirectory. It can be a binary file (such as the main file), a Python script file (such as the train.py file), or a shell script file (such as the npu_set_env_1p.sh file).
Due to the restrictions of the msprof tool, the requirements for specifying a Python script file are as follows:
- Paths in the Python script of the pyACL project must be absolute paths.
- Asynchronous APIs (whose names end with async) cannot be called.
The shell script file is provided by the user and does not need to be saved in the Project Path.
Command Arguments
Application execution parameters. Configure this as required and separate arguments with spaces. By default, this parameter is left empty.
Environment Variables
Environment variable configuration. You can manually configure the environment variables or click the icon to configure them in the dialog box displayed. This parameter is optional.
CANN Version
CANN package version. This parameter is mandatory.
It is specified during project creation in MindStudio IDE. If the version is not specified, click Change to specify the installation path of the CANN package.
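The restriction on asynchronous APIs in Python scripts can be checked before a profiling run. The following is a rough sketch, assuming that scanning for call names ending with "async" is a sufficient approximation of the rule in Table 2. It is not an msprof or MindStudio feature, and the API names in the sample are placeholders rather than real pyACL functions.

```python
import ast


def find_async_calls(source):
    """Return sorted (line, name) pairs for calls whose function name ends with 'async'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both module.attr(...) calls and plain name(...) calls.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name.endswith("async"):
            hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical script content; the API names are placeholders.
sample = (
    "import acl\n"
    "acl.do_work(1)\n"
    "acl.do_work_async(1, 2)\n"
)
async_calls = find_async_calls(sample)
print(async_calls)
```

The script is only parsed, never executed, so the check can run on a machine where the imported modules are not installed.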
- Click Next to obtain the profiling configuration. A dialog box is displayed, as shown in Figure 4.
- The Profiling Options page is displayed. You can configure Task-based or Sample-based in AI Core Profiling. See Figure 5 and Figure 6.
Table 3 Profiling Options parameters
Parameter
Description
AI Core Profiling
Mode
- Task-based: AI Core profiling switch that collects profile data task by task. The default metric is Pipeline Utilization.
- Sample-based: AI Core profiling switch that collects profile data at a fixed interval (AI Core-Sampling Interval). The default metric is Pipeline Utilization.
Metrics
When Mode is set to Task-based:
- Pipeline Utilization: percentage of time taken by the compute units and MTEs
- Arithmetic Utilization: percentage of time taken by the cube and vector instructions
- UB/L1/L2/Main Memory Bandwidth: memory read/write bandwidth rate of UB/L1/L2/main memory
- L0A/L0B/L0C Memory Bandwidth: memory read/write bandwidth rate of L0A/L0B/L0C
- UB Memory Bandwidth: UB read/write bandwidth rate of MTE/Vector/Scalar
When Mode is set to Sample-based:
- Pipeline Utilization: percentage of time taken by the compute units and MTEs
- Arithmetic Utilization: percentage of time taken by the cube and vector instructions
- UB/L1/L2/Main Memory Bandwidth: memory read/write bandwidth rate of UB/L1/L2/main memory
- L0A/L0B/L0C Memory Bandwidth: memory read/write bandwidth rate of L0A/L0B/L0C
- UB Memory Bandwidth: UB read/write bandwidth rate of MTE/Vector/Scalar
L2Cache
L2 sampling switch in Task-based profiling. This parameter is optional and is disabled by default.
Frequency(Hz)
Sampling frequency (Hz) in Sample-based profiling. Defaults to 100. Must be in the range [1, 100].
MsprofTX
MsprofTX
Switch that controls whether user programs and upper-layer framework programs output profile data through MsprofTX. This parameter is optional and is disabled by default.
API Trace
AscendCL API
AscendCL profiling switch. It traces AscendCL API calls. This parameter is enabled by default.
Runtime API
Runtime profiling switch. It traces Runtime API calls. This parameter is optional and is disabled by default.
Graph Engine(GE)
Graph Engine profiling switch. It traces the scheduling information of Graph Engine. This switch is enabled by default and cannot be disabled.
AICPU Operators
AI CPU profiling switch, which is used to collect enhanced AI CPU profile data. This parameter is optional and is disabled by default.
HCCL
HCCL
HCCL profiling switch. This parameter is optional and is disabled by default.
After the profiling is complete, only data of the first iteration of the model (ID) with the largest number of iterations is exported by default.
Device System Profiling
CPU & Memory Usage Profiling
Profiling switch for system CPU usage and system memory. This parameter is optional and is disabled by default.
You can change the sampling frequency (Hz). The value must be in the range [1, 10] and defaults to 10.
Host System Profiling
Application Based System Profiling
CPU
Samples the host CPU usage. This parameter is optional and is disabled by default.
Memory
Samples the host memory usage. This parameter is optional and is disabled by default.
Disk
Samples the host disk usage. This parameter is optional and is disabled by default.
NOTE: The third-party open-source tool iotop must be installed to collect data on disk calls. For details, see Before You Start.
Network
Samples the host network usage. This parameter is optional and is disabled by default.
Syscall & PThreadcall
Samples host-side syscall and pthreadcall. This parameter is optional and is disabled by default.
System CPU & Memory Usage
CPU
Samples the CPU usage of the host system and all processes. This parameter is optional and is disabled by default.
Memory
Samples the memory usage of the host system and all processes. This parameter is optional and is disabled by default.
Frequency(Hz)
CPU and memory usage sampling frequency (Hz). Defaults to 50. Must be in the range [1, 50].
Table 3 lists the configuration options for full collection. The options actually supported by a given processor are those shown in the GUI.
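Table 3 gives three sampling frequency ranges, each with its own default. The following is a minimal sketch that collects them in one place so that a value can be checked before it is entered. The ranges and defaults are taken from Table 3; the dictionary keys and the function name are our own labels, not MindStudio identifiers.

```python
# (minimum, maximum, default) in Hz, taken from Table 3.
# The keys are illustrative labels, not MindStudio identifiers.
FREQUENCY_LIMITS = {
    "ai_core_sample_based": (1, 100, 100),
    "device_cpu_memory": (1, 10, 10),
    "host_cpu_memory": (1, 50, 50),
}


def check_frequency(kind, value=None):
    """Return a valid sampling frequency for the given kind, or raise ValueError."""
    low, high, default = FREQUENCY_LIMITS[kind]
    if value is None:
        # No value supplied: fall back to the documented default.
        return default
    if not low <= value <= high:
        raise ValueError(f"{kind}: {value} Hz is outside the range [{low}, {high}]")
    return value


print(check_frequency("device_cpu_memory"))
print(check_frequency("host_cpu_memory", 25))
```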
- After the preceding configurations are complete, click Start in the lower right corner of the window to start profile data collection.
The performance analysis results will be automatically displayed in the MindStudio IDE window after the execution is complete.



