Profile Data Collection

Prerequisites

  • Currently, MindStudio IDE does not support data collection in cluster scenarios. You can use the Import Result function to import the parent directory of PROF_XXX to display the collected cluster profile data.

    For details about profile data collection in cluster scenarios, see "Appendixes" > "Performance Analysis in Cluster Training Scenarios" in the Profiling Instructions.
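
Before using Import Result, it can help to confirm which directory is the parent of the PROF_XXX directories. The following is a minimal sketch, assuming that the raw data directories have names starting with "PROF_" (inferred from the PROF_XXX placeholder). The function name find_import_candidates is hypothetical and is not part of MindStudio IDE.

```python
from pathlib import Path


def find_import_candidates(root):
    """Return the sorted directories that directly contain a PROF_* directory.

    Assumption: raw profile data directories are named "PROF_" plus a suffix.
    """
    parents = {p.parent for p in Path(root).rglob("PROF_*") if p.is_dir()}
    return sorted(parents)


if __name__ == "__main__":
    # List candidate directories under the current working directory.
    for candidate in find_import_candidates("."):
        print(candidate)
```

Each directory returned is a candidate to select in Import Result. Files that merely start with "PROF_" are ignored.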

  • To collect data using the training project, add the configuration information of the PROFILING_OPTIONS field to the environment variable script file env_*.sh of the training project. The following is an example:
    export PROFILING_MODE=true
    export PROFILING_OPTIONS='{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"MemoryL0"}'

    Profiling writes the collected profile data to the path specified by output on the server. The data is then copied to the path specified by Project Location, and a .json result file is generated for display in MindStudio IDE.

    The PROFILING_OPTIONS field configures the profiling items. Enable only the options you need. For details about the options that can be added to the training project script, see "Other Collection Modes" > "Collecting Data Using TensorFlow Framework Interfaces" > "Profiling Options" in the Profiling Instructions.
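
Hand-editing the JSON value of PROFILING_OPTIONS is easy to get wrong (a missing quote or comma makes the value invalid). The following is a minimal sketch that generates the two export lines from a dictionary. It uses only the keys shown in the example above. The function name build_profiling_exports is hypothetical, and the sketch does not check whether an aic_metrics value is supported, so refer to the Profiling Instructions for valid values.

```python
import json
import shlex


def build_profiling_exports(output_dir, aic_metrics="MemoryL0"):
    """Return the export lines to add to env_*.sh, as a list of strings."""
    # Only the keys from the documented example are used here.
    options = {
        "output": output_dir,
        "training_trace": "on",
        "task_trace": "on",
        "fp_point": "",
        "bp_point": "",
        "aic_metrics": aic_metrics,
    }
    # Compact separators keep the value free of spaces, as in the example.
    payload = json.dumps(options, separators=(",", ":"))
    return [
        "export PROFILING_MODE=true",
        # shlex.quote wraps the JSON in single quotes for the shell.
        f"export PROFILING_OPTIONS={shlex.quote(payload)}",
    ]


if __name__ == "__main__":
    for line in build_profiling_exports("/tmp/profiling"):
        print(line)
```

Generating the value with json.dumps guarantees well-formed JSON, and shlex.quote keeps the double quotes intact when the script is sourced.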

Procedure

  1. In the navigation bar on the left of the welcome page, click Projects, and select and open a built project.
  2. Choose Ascend > System Profiler from the menu bar. The system analysis project page is displayed.
    Figure 1 System analysis project page
  3. On the system analysis project page, click New Project on the welcome page or the icon in the upper left corner. The profiling configuration window is displayed, as shown in Figure 2.
    Set Project Name and Project Location under Project Properties. Click Next.
    Table 1 Project Properties parameters

    Parameter
        Description

    Project Name
        Profiling project name customized by the user.

        After the configuration, a folder named after the project name is automatically created in the directory specified by Project Location. The collected raw profile data directory PROF_XXX and the data parsing result .json file are stored in this folder.

        NOTE:
        The parsing result .json file is named in the following format: report_{timestamp}_{device_id}_{model_id}_{iter_id}.json, in which {device_id} indicates the device ID, {model_id} indicates the model ID, and {iter_id} indicates the ID of an iteration.

    Project Location
        Profile data output path.

        After profile data collection is complete, a file directory named after the project name is generated in the path.

    Figure 2 Configuring project properties
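
When several result files accumulate in the project folder, the naming format report_{timestamp}_{device_id}_{model_id}_{iter_id}.json can be used to tell them apart. The following is a minimal sketch of a parser for that format. The exact form of {timestamp} is not specified here, so the sketch splits from the right and treats everything before the last three fields as the timestamp. The names ReportName and parse_report_name are hypothetical, and the file name in the usage example is invented for illustration.

```python
from typing import NamedTuple, Optional


class ReportName(NamedTuple):
    timestamp: str
    device_id: str
    model_id: str
    iter_id: str


def parse_report_name(filename: str) -> Optional[ReportName]:
    """Split a result file name into its fields, or return None if it does not match."""
    prefix, suffix = "report_", ".json"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        return None
    stem = filename[len(prefix):-len(suffix)]
    # Split from the right so the last three fields are the IDs.
    parts = stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        return None
    return ReportName(*parts)


if __name__ == "__main__":
    # Invented file name, used only to demonstrate the call.
    print(parse_report_name("report_1650000000_0_1_1.json"))
```

The fields are kept as strings because their numeric format is not stated in this section.
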
  4. Access the Executable Properties configuration page, as shown in Figure 3.
    Figure 3 Executable Properties
    Table 2 Executable Properties parameters

    Parameter
        Description

    Project Path
        Path of the target project for profiling. This parameter is mandatory.

        If the specified target project is a training project, you can click Start to directly start the Profiling tool.

    Executable File
        Executable file of the target project for profiling. This parameter is mandatory.

        Set this parameter to an executable file in a subdirectory of Project Path. It can be a binary file (such as main), a Python script (such as train.py), or a shell script (such as npu_set_env_1p.sh).

        Due to restrictions of the msprof tool, a Python script must meet the following requirements:

        • Paths in the Python script of a pyACL project must be absolute paths.
        • Asynchronous APIs (whose names end with async) cannot be called.

        A shell script is provided by the user and does not need to be saved in Project Path.

    Command Arguments
        Application execution parameters. Configure them as required and separate arguments with spaces. By default, this parameter is left empty.

    Environment Variables
        Environment variable configuration. You can configure the environment variables manually or click the icon to configure them in the dialog box that is displayed. This parameter is optional.

    CANN Version
        CANN package version. This parameter is mandatory.

        It is specified during project creation in MindStudio IDE. If the version is not specified, click Change to specify the installation path of the CANN package.
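
The location rule for Executable File (binary and Python files under Project Path, user-provided shell scripts anywhere) can be checked before the paths are entered in the dialog. The following is a minimal sketch of such a pre-check and does not reproduce how MindStudio IDE validates the field. The function name is hypothetical, and treating the .sh suffix as the marker of a shell script is an assumption.

```python
from pathlib import Path

# Assumption: shell scripts are recognized by their file suffix.
SHELL_SUFFIXES = {".sh"}


def is_valid_executable_location(project_path: str, executable: str) -> bool:
    """Return True if the executable's location satisfies the rule described above."""
    exe = Path(executable).resolve()
    if exe.suffix in SHELL_SUFFIXES:
        # User-provided shell scripts do not need to be saved in Project Path.
        return True
    root = Path(project_path).resolve()
    try:
        # relative_to raises ValueError when exe is not located under root.
        exe.relative_to(root)
    except ValueError:
        return False
    # The project directory itself is not an executable file.
    return exe != root


if __name__ == "__main__":
    print(is_valid_executable_location("/work/proj", "/work/proj/src/train.py"))
```

The check is purely path-based: it does not verify that the file exists or that it is executable.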

  5. Click Next to obtain the profiling configuration. A dialog box is displayed, as shown in Figure 4.
    Figure 4 Obtaining profiling configuration
  6. The Profiling Options page is displayed. Under AI Core Profiling, set Mode to Task-based (see Figure 5) or Sample-based (see Figure 6).
    Figure 5 Task-based scenario
    Figure 6 Sample-based scenario
    Table 3 Profiling Options parameters

    Group
        Parameter
            Description

    AI Core Profiling
        Mode
            • Task-based: AI Core profiling switch. It collects profile data task by task. The default metric is Pipeline Utilization.
            • Sample-based: AI Core profiling switch. It collects profile data at a fixed interval (AI Core-Sampling Interval). The default metric is Pipeline Utilization.

        Metrics
            When Mode is set to Task-based:

            • Pipeline Utilization: percentage of time taken by the compute units and MTEs
            • Arithmetic Utilization: percentage of time taken by the cube and vector instructions
            • UB/L1/L2/Main Memory Bandwidth: memory read/write bandwidth rate of UB/L1/L2/main memory
            • L0A/L0B/L0C Memory Bandwidth: memory read/write bandwidth rate of L0A/L0B/L0C
            • UB Memory Bandwidth: UB read/write bandwidth rate of MTE/Vector/Scalar

            When Mode is set to Sample-based:

            • Pipeline Utilization: percentage of time taken by the compute units and MTEs
            • Arithmetic Utilization: percentage of time taken by the cube and vector instructions
            • UB/L1/L2/Main Memory Bandwidth: memory read/write bandwidth rate of UB/L1/L2/main memory
            • L0A/L0B/L0C Memory Bandwidth: memory read/write bandwidth rate of L0A/L0B/L0C
            • UB Memory Bandwidth: UB read/write bandwidth rate of MTE/Vector/Scalar

        L2Cache
            L2 sampling switch in Task-based profiling. This parameter is optional and is disabled by default.

        Frequency(Hz)
            Sampling frequency (Hz) in Sample-based profiling. The value must be in the range [1, 100] and defaults to 100.

    MsprofTX
        MsprofTX
            Switch that controls whether user programs and upper-layer framework programs output profile data through MsprofTX. This parameter is optional and is disabled by default.

    API Trace
        AscendCL API
            AscendCL profiling switch. It traces AscendCL API calls. This parameter is enabled by default.

        Runtime API
            Runtime profiling switch. It traces Runtime API calls. This parameter is optional and is disabled by default.

        Graph Engine(GE)
            Graph Engine profiling switch. It traces the scheduling information of Graph Engine. This switch is enabled by default and cannot be disabled.

        AICPU Operators
            AI CPU profiling switch, which is used to collect enhanced AI CPU profile data. This parameter is optional and is disabled by default.

    HCCL
        HCCL
            HCCL profiling switch. This parameter is optional and is disabled by default.

            After profiling is complete, only the data of the first iteration of the model (ID) with the largest number of iterations is exported by default.

    Device System Profiling
        CPU & Memory Usage Profiling
            Profiling switch for system CPU usage and system memory usage. This parameter is optional and is disabled by default.

            You can change the sampling frequency (Hz). The value must be in the range [1, 10] and defaults to 10.

    Host System Profiling
        Application Based System Profiling > CPU
            Samples the host CPU usage. This parameter is optional and is disabled by default.

        Application Based System Profiling > Memory
            Samples the host memory usage. This parameter is optional and is disabled by default.

        Application Based System Profiling > Disk
            Samples the host disk usage. This parameter is optional and is disabled by default.

            NOTE:
            The third-party open-source tool iotop must be installed to collect disk usage data. For details, see Before You Start.

        Application Based System Profiling > Network
            Samples the host network usage. This parameter is optional and is disabled by default.

        Application Based System Profiling > Syscall & PThreadcall
            Samples host-side syscall and pthreadcall. This parameter is optional and is disabled by default.

        System CPU & Memory Usage > CPU
            Samples the CPU usage of the host system and all processes. This parameter is optional and is disabled by default.

        System CPU & Memory Usage > Memory
            Samples the memory usage of the host system and all processes. This parameter is optional and is disabled by default.

        System CPU & Memory Usage > Frequency(Hz)
            CPU and memory usage sampling frequency (Hz). The value must be in the range [1, 50] and defaults to 50.

    Table 3 lists all configuration options available for full collection. The options actually supported by a given processor are those shown in the GUI.
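
Table 3 defines three sampling frequencies with different ranges and defaults. The following is a minimal sketch that collects them in one place and validates a value before it is entered. The dictionary keys and the function name resolve_frequency are hypothetical labels, and the sketch assumes that frequencies are whole numbers, which this section does not state explicitly.

```python
# (minimum, maximum, default) in Hz, taken from Table 3.
# The key names are hypothetical labels, not MindStudio identifiers.
FREQUENCY_LIMITS = {
    "ai_core_sample_based": (1, 100, 100),
    "device_system": (1, 10, 10),
    "host_system": (1, 50, 50),
}


def resolve_frequency(kind, value=None):
    """Return value if it lies in the documented range, or the default if value is None."""
    low, high, default = FREQUENCY_LIMITS[kind]
    if value is None:
        return default
    # Assumption: only whole-number frequencies are accepted.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{kind} frequency must be an integer, got {value!r}")
    if not low <= value <= high:
        raise ValueError(f"{kind} frequency must be in [{low}, {high}], got {value}")
    return value


if __name__ == "__main__":
    print(resolve_frequency("device_system"))
    print(resolve_frequency("host_system", 25))
```

Keeping the limits in a single table makes it easy to adjust them if the GUI of a given processor shows different ranges.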

  7. After the preceding configurations are complete, click Start in the lower right corner of the window to start profile data collection.

    The performance analysis results will be automatically displayed in the MindStudio IDE window after the execution is complete.