Getting Started with Performance Analysis in Offline Inference Scenarios

In offline inference scenarios, you are advised to use the msprof command-line tool to collect and parse profile data, and then analyze performance bottlenecks based on the generated result files.

Prerequisites

  • The CANN Toolkit package and the ops operator package have been installed.

    For details, see the CANN Software Installation Guide.

  • The application has been debugged, and its executable binary files or scripts are ready.

The following procedures cover the Ascend EP and Ascend RC modes, which are determined by the PCIe working mode of the Ascend AI Processor. For details about Ascend EP and Ascend RC, see Ascend Product Modes.

Collecting, Parsing, and Exporting Profile Data (Ascend EP)

  1. Log in to the operating environment where the CANN Toolkit package and ops operator package are installed, and run the following command to collect, parse, and export profile data:
    msprof --output={path} {User application}

    Command example:

    msprof --output=/home/HwHiAiUser/profiling_output /home/HwHiAiUser/HIAI_PROJECTS/MyAppname/out/main
    Table 1 Options

    • --output (Optional)
      Directory for storing the collected profile data. Defaults to the AI task file directory.
      The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".
    • Passing the user application: <app> [app arguments] (Required)
      Append the AI task execution command to the end of the msprof command to pass the user application or execution script.
      Format: msprof [msprof arguments] <app> [app arguments]
        • Example 1 (passing the Python execution script and script parameters using msprof): msprof --output=/home/projects/output python3 /home/projects/MyApp/out/sample_run.py parameter1 parameter2
        • Example 2 (passing the main binary executable application using msprof): msprof --output=/home/projects/output main
        • Example 3 (passing the main binary executable application using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/main
        • Example 4 (passing the main binary executable application and application parameters using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/main parameter1 parameter2
        • Example 5 (passing the shell execution script and script parameters using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/sample_run.sh parameter1 parameter2

    The preceding commands are the most basic profiling commands. For other profiling requirements, see Profile Data Collecting and Parsing.

    After the command is executed, a PROF_XXX directory is generated in the directory specified by --output. It stores the automatically parsed profile data. The structure below shows only the profile data.

    ├── host            // Raw data; you can ignore this directory.
    │   ...
    │   └── data
    ├── device_{id}     // Raw data; you can ignore this directory.
    │   ...
    │   └── data
    ...
    ├── msprof_*.db
    └── mindstudio_profiler_output
        ├── msprof_{timestamp}.json
        ├── step_trace_{timestamp}.json
        ├── xx_*.csv
        ...
        └── README.txt
    
  2. Access the mindstudio_profiler_output directory to view corresponding profile data files.

    For details about the files collected by default, see Table 2.

    Table 2 Profile data files collected by msprof by default

    • msprof_*.db: .db file that aggregates all profile data. This file is exported by default only for the Atlas A3 Training Series Product and Atlas A2 Training Series Product/Atlas 800I A2 Inference Product.
    • msprof_*.json: timeline report.
    • step_trace_*.json: iteration trace data, which records the time required for each iteration. This file does not exist in single-operator scenarios.
    • op_summary_*.csv: AI Core and AI CPU operator data.
    • op_statistic_*.csv: call count and time consumption of AI Core and AI CPU operators.
    • step_trace_*.csv: iteration trace data. This file does not exist in single-operator scenarios.
    • task_time_*.csv: Task Scheduler data.
    • fusion_op_*.csv: operator fusion summary in a model. This file does not exist in single-operator scenarios.
    • api_statistic_*.csv: time spent executing APIs at the CANN layer.

    Note: The asterisk (*) indicates the timestamp.

    • You are advised to use MindStudio Insight to analyze the .db file. For details, see MindStudio Insight User Guide.
    • To view a timeline .json file, enter chrome://tracing in the address bar of Google Chrome and drag the file into the blank area of the page. Navigate with the keyboard shortcuts w (zoom in), s (zoom out), a (move left), and d (move right). The file shows the runtime timeline of the current AI task, such as the API call timeline, as shown in Figure 1.
      Figure 1 Viewing a .json file
    • Summary .csv files can be opened directly. They contain the software and hardware data of the running AI task, such as the time each operator takes on the AI processor software and hardware. Sort by a field to find the information you need quickly, as shown in Figure 2.
      Figure 2 Viewing a .csv file
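The collection step above can also be scripted. The following is a minimal Python sketch, not part of msprof, that checks the --output path against the forbidden characters listed in the option table, builds the msprof command line, and then picks the most recently modified PROF_* directory. The function names are hypothetical, and the sketch assumes the msprof executable is on PATH (in Ascend RC mode, pass "./msprof" and run from the /var directory instead).

```python
import subprocess
from pathlib import Path
from typing import Optional

# Characters that msprof rejects in the --output path (see the option table).
# Escaped forms such as "\\n" are covered because the backslash is rejected too.
FORBIDDEN_CHARS = set("\n\f\r\b\t\v\x7f\"'\\%><|&$;`")


def check_output_path(path: str) -> None:
    """Raise ValueError if path contains any forbidden character."""
    bad = sorted(set(path) & FORBIDDEN_CHARS)
    if bad:
        raise ValueError(f"--output path contains forbidden characters: {bad!r}")


def build_msprof_command(output: str, app_argv: list[str],
                         msprof: str = "msprof") -> list[str]:
    """Build argv for: msprof --output={path} <app> [app arguments]."""
    check_output_path(output)
    return [msprof, f"--output={output}", *app_argv]


def latest_prof_dir(output: str) -> Optional[Path]:
    """Return the most recently modified PROF_* directory under output, if any."""
    dirs = [p for p in Path(output).glob("PROF_*") if p.is_dir()]
    return max(dirs, key=lambda p: p.stat().st_mtime, default=None)


def profile(output: str, app_argv: list[str],
            msprof: str = "msprof") -> Optional[Path]:
    """Run msprof on the application, then return the newest PROF_* directory."""
    # Assumption: msprof is on PATH; in Ascend RC mode pass msprof="./msprof"
    # and run this script from the /var directory where the tool is located.
    subprocess.run(build_msprof_command(output, app_argv, msprof), check=True)
    return latest_prof_dir(output)
```

For example, profile("/home/projects/output", ["python3", "/home/projects/MyApp/out/sample_run.py", "parameter1", "parameter2"]) corresponds to Example 1 in the option table. Passing the command as a list means no shell quoting is involved, so the only path check needed is the character check.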

Collecting, Parsing, and Exporting Profile Data (Ascend RC)

  1. Log in to the operating environment and go to the /var directory where the msprof tool is located.
  2. Run the following command to collect profile data:
    ./msprof --output={path} {User application}

    Command example:

    ./msprof --output=/home/HwHiAiUser/profiling_output /home/HwHiAiUser/HIAI_PROJECTS/MyAppname/out/main
    Table 3 Options

    • --output (Optional)
      Directory for storing the collected profile data. Defaults to the AI task file directory.
      The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".
    • Passing the user application: <app> [app arguments] (Required)
      Append the AI task execution command to the end of the msprof command to pass the user application or execution script.
      Format: msprof [msprof arguments] <app> [app arguments]
        • Example 1 (passing the Python execution script and script parameters using msprof): msprof --output=/home/projects/output python3 /home/projects/MyApp/out/sample_run.py parameter1 parameter2
        • Example 2 (passing the main binary executable application using msprof): msprof --output=/home/projects/output main
        • Example 3 (passing the main binary executable application using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/main
        • Example 4 (passing the main binary executable application and application parameters using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/main parameter1 parameter2
        • Example 5 (passing the shell execution script and script parameters using msprof): msprof --output=/home/projects/output /home/projects/MyApp/out/sample_run.sh parameter1 parameter2

    The preceding commands are the most basic profiling commands. For other profiling requirements, see Profile Data Collecting and Parsing.

    After the command is executed, the PROF_XXX directory is generated in the directory specified by --output. The directory structure is as follows:

    ├── device_{id}
        ├── data
        └── ...
    └── host
        ├── data
        └── ...
    
  3. Upload the PROF_XXX directory to the development environment where the CANN Toolkit package and ops operator package are installed, and run the following command to parse and export the data:
    msprof --export=on --output=<dir>

    • --export (Required)
      Parses and exports profile data. The value can be on or off (default).
      To export the data of a specific model (model ID) or iteration (iteration ID), run msprof --export again after the msprof profiling command has been executed, specifying the --model-id and --iteration-id options.
      PROF_XXX data that has not been parsed is parsed automatically and then exported.
      Example: msprof --export=on --output=/home/HwHiAiUser
    • --output (Required)
      Directory where the profile data is stored. The value must be the PROF_XXX directory or its parent directory, for example, /home/HwHiAiUser/profiler_data/PROF_XXX.
      The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".

    For more parsing commands, see Offline Parsing.

    Data files are added to the PROF_XXX directory. The directory structure is as follows:
    ├── device_{id}
    │   └── data
    ├── host
    │   ├── data
    │   └── ...
    └── mindstudio_profiler_output
        ├── xx_*.csv
        ├── xx_*.json
        └── ...
    
  4. Access the mindstudio_profiler_output directory to view the corresponding profile data files.

    For details about the files collected by default, see Table 4.

    Table 4 Profile data files collected by msprof by default

    • msprof_*.json: timeline report.
    • step_trace_*.json: iteration trace data, which records the time required for each iteration. This file does not exist in single-operator scenarios.
    • op_summary_*.csv: AI Core and AI CPU operator data.
    • op_statistic_*.csv: call count and time consumption of AI Core and AI CPU operators.
    • step_trace_*.csv: iteration trace data. This file does not exist in single-operator scenarios.
    • task_time_*.csv: Task Scheduler data.
    • fusion_op_*.csv: operator fusion summary in a model. This file does not exist in single-operator scenarios.
    • api_statistic_*.csv: time spent executing APIs at the CANN layer.

    Note: The asterisk (*) indicates the timestamp.

    • To view a timeline .json file, enter chrome://tracing in the address bar of Google Chrome and drag the file into the blank area of the page. Navigate with the keyboard shortcuts w (zoom in), s (zoom out), a (move left), and d (move right). The file shows the runtime timeline of the current AI task, such as the API call timeline, as shown in Figure 3.
      Figure 3 Viewing a .json file
    • Summary .csv files can be opened directly. They contain the software and hardware data of the running AI task, such as the time each operator takes on the AI processor software and hardware. Sort by a field to find the information you need quickly, as shown in Figure 4.
      Figure 4 Viewing a .csv file
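After the PROF_XXX directory has been uploaded to the development environment, a small script can tell whether it still needs the export step and which report types it already contains. This is a minimal Python sketch with hypothetical function names. It assumes that the timestamp written as * in Table 4 is a run of digits placed directly before the file extension, and it skips any file that does not follow that pattern (such as README.txt).

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: exported files are named <kind>_<digits>.<ext>, where the digit
# run is the timestamp that Table 4 writes as "*".
_NAME = re.compile(r"^(?P<kind>.+)_(?P<ts>\d+)\.(?P<ext>csv|json)$")


def needs_export(prof_dir: str) -> bool:
    """True if PROF_XXX has no mindstudio_profiler_output directory yet."""
    return not (Path(prof_dir) / "mindstudio_profiler_output").is_dir()


def export_command(prof_dir: str, msprof: str = "msprof") -> list[str]:
    """Build argv for the parse/export step: msprof --export=on --output=<dir>."""
    return [msprof, "--export=on", f"--output={prof_dir}"]


def inventory(prof_dir: str) -> dict[str, list[str]]:
    """Group exported files by kind, e.g. "op_summary.csv" -> [file names]."""
    out_dir = Path(prof_dir) / "mindstudio_profiler_output"
    groups = defaultdict(list)
    for f in sorted(out_dir.iterdir()):
        m = _NAME.match(f.name)
        if m:  # files without a <kind>_<digits>.<ext> name are skipped
            groups[f"{m['kind']}.{m['ext']}"].append(f.name)
    return dict(groups)
```

A typical flow is to call needs_export() first, run export_command() through subprocess.run() if it returns True, and then use inventory() to confirm that files such as op_summary and op_statistic were produced.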

Performance Analysis

As shown above, msprof generates many profile data files, and they can be analyzed in several ways. The following describes some important files and how to analyze them.

  • View the msprof_*.json file to examine the overall runtime timeline of an AI task and identify possible bottlenecks.
    Figure 5 Example of the msprof_*.json file
    • Area 1: data at the CANN layer, including the time consumption data of components (such as Runtime) and nodes (operators).
    • Area 2: bottom-layer NPU data, including the time consumption data and iteration trace data of each task stream under Ascend Hardware and other Ascend AI Processor system data.
    • Area 3: details about each operator and API in a timeline (displayed when you click a timeline color block).

    From this view, you can roughly identify the APIs, operators, and task streams that take a long time. Then follow the arrows to trace the delivery relationships, find the underlying tasks that take the most time during inference, check the time-consuming APIs and operators in area 3, and use the .csv files for quantitative analysis to locate the performance bottlenecks.

  • Analyze the op_statistic_*.csv file to obtain the total duration and total number of calls of each operator type, check whether any operator type has a long execution time, and determine whether those operators can be optimized.
    Figure 6 Example of the op_statistic_*.csv file

    Sort the operators by Total Time to find the operator types that take the longest.

  • Analyze the op_summary_*.csv file to obtain the basic information and time consumption of each operator, find the operators with high time consumption, and check whether they can be optimized.
    Figure 7 Example of the op_summary_*.csv file

    The Task Duration field specifies the operator time consumption. You can sort operators by Task Duration to find time-consuming operators, or sort them by Task Type to view the time-consuming operators running on different cores (such as AI Cores and AI CPUs).
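The sorting described above can also be done in a script, which is convenient when comparing several runs. This minimal Python sketch uses hypothetical helper names and makes two assumptions: msprof_*.json follows the Chrome Trace Event format (the format chrome://tracing loads), in which complete events carry "ph": "X" and a "dur" field; and .csv column headers may carry a unit suffix, so columns are matched by prefix (for example, "Total Time" matches a header such as "Total Time(us)").

```python
import csv
import json
from collections import Counter
from pathlib import Path


def top_trace_events(json_path: str, top: int = 10) -> list[tuple[str, float]]:
    """Sum the durations of complete ("ph": "X") trace events by name."""
    data = json.loads(Path(json_path).read_text())
    # Chrome trace files are either a bare event list or {"traceEvents": [...]}.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = Counter()
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "?")] += float(ev["dur"])
    return totals.most_common(top)


def _column(header: list[str], prefix: str) -> str:
    """Find the first header that starts with prefix (units may follow)."""
    for name in header:
        if name.strip().startswith(prefix):
            return name
    raise KeyError(f"no column starting with {prefix!r} in {header}")


def rank_csv(csv_path: str, by: str, top: int = 10) -> list[dict[str, str]]:
    """Return rows of a summary .csv sorted by column `by`, largest first."""
    scored = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        col = _column(reader.fieldnames or [], by)
        for row in reader:
            try:
                scored.append((float(row[col]), row))
            except (TypeError, ValueError):
                continue  # skip non-numeric cells such as "N/A"
    scored.sort(key=lambda item: item[0], reverse=True)
    return [row for _, row in scored[:top]]


def total_by(csv_path: str, group: str, value: str) -> dict[str, float]:
    """Sum column `value` for each distinct value of column `group`."""
    totals: dict[str, float] = {}
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        gcol = _column(reader.fieldnames or [], group)
        vcol = _column(reader.fieldnames or [], value)
        for row in reader:
            try:
                v = float(row[vcol])
            except (TypeError, ValueError):
                continue
            totals[row[gcol]] = totals.get(row[gcol], 0.0) + v
    return totals
```

For example, rank_csv(op_statistic_path, "Total Time") lists operator types by total time, rank_csv(op_summary_path, "Task Duration") lists individual operators, and total_by(op_summary_path, "Task Type", "Task Duration") splits the time across core types such as AI Core and AI CPU. Check the actual header names in your files before relying on these prefixes.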