Profile Data Parsing and Export

For the following products, profile data cannot be directly parsed on devices. Instead, you need to copy the collected PROF_XXX directory to an environment where the Ascend-CANN-Toolkit is installed for data parsing and export.
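The copy step can be sketched as follows. This is a minimal illustration, not a prescribed procedure: `pack_prof_dir` is a hypothetical helper, and the archive name (the directory name plus `.tar.gz`) is an assumption made here for convenience. It only bundles one collected PROF_XXX directory into a single archive so that it can be moved to the machine where the toolkit is installed.

```shell
# Hypothetical helper (not part of msprof): pack one PROF_XXX directory
# into <name>.tar.gz in the current working directory.
pack_prof_dir() {
    prof_dir=${1%/}                      # drop a trailing slash, if any
    if [ ! -d "$prof_dir" ]; then
        echo "error: $prof_dir is not a directory" >&2
        return 1
    fi
    name=$(basename "$prof_dir")
    # -C switches to the parent directory so that the archive contains
    # the PROF_XXX directory itself rather than its absolute path.
    tar -czf "$name.tar.gz" -C "$(dirname "$prof_dir")" "$name"
    echo "$name.tar.gz"                  # print the archive name
}
```

The resulting archive can then be transferred with any file copy tool available in your environment, unpacked on the target machine with `tar -xzf`, and the unpacked directory (or its parent) passed to --output.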

Prerequisites

  • Ensure that operations in Before You Start have been completed.
  • The profile data has been collected. (In CANN 6.3.RC2, CANN 6.2.RC2, and later versions, the structure of the collected raw profile data has been optimized; therefore, only the raw profile data collected by these versions can be parsed and exported.)

Procedure

Run the data exporting command.

Example:

msprof --export=on --output=<dir> [--type=<type>] [--iteration-id=<number>] [--model-id=<number>] [--summary-format=<csv/json>] [--clear=on]
Table 1 Command-line options

Option

Description

Required/Optional

--export

Profile data parsing and export, either on or off (default).

To export data of a specific model (model ID) or iteration (iteration ID), run msprof --export again after the msprof profiling command has finished, and set the --model-id and --iteration-id options.

PROF_XXX data that has not been parsed yet is parsed automatically and then exported.

Example: msprof --export=on --output=/home/HwHiAiUser

Required

--output

Directory that stores the profile data. The value must be a PROF_XXX directory or the parent directory of a PROF_XXX directory, for example, /home/HwHiAiUser/profiler_data/PROF_XXX.

The following special characters are not allowed in the path: "\n", "\\n", "\f", "\\f", "\r", "\\r", "\b", "\\b", "\t", "\\t", "\v", "\\v", "\u007F", "\\u007F", "\"", "\\\"", "'", "\'", "\\", "\\\\", "%", "\\%", ">", "\\>", "<", "\\<", "|", "\\|", "&", "\\&", "$", "\\$", ";", "\\;", "`", "\\`".

Required

--type

Format of the result files generated when the profile data collected by the msprof command is parsed. The available formats are:

  • text: parsed into timeline and summary files in .json and .csv formats. For details, see Profile Data File References. CANN 7.0.RC1, 7.0.0, and later versions support the use of this parameter to parse profile data.
  • db: parsed into a .db file (msprof_timestamp.db) that summarizes all profile data and is displayed by the MindStudio Insight tool. The amount of information in this format differs from that produced by the text format. You are advised to use text. Parsing into .db format is currently not supported in MindSpore scenarios. When --type=db is specified, only the --output option is supported.
  • The Atlas 200/300/500 Inference Product does not support system data parsing.

The default value is text.

Optional

--iteration-id

Iteration ID. The value must be a positive integer. This option and --model-id must be configured at the same time.

  • If --model-id is set to other values, this option specifies the iteration ID for graph-based statistics collection. (The iteration ID increases by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer.)

Optional

--model-id

Model ID. The value must be a positive integer. This option and --iteration-id must be configured at the same time.

Optional

--summary-format

Export format of a summary data file. The values are as follows:

  • json: The parsed summary data file is in JSON format.
  • csv (default): The parsed summary data file is in CSV format.

Supported only when --type=text.

Optional

--python-path

Path of the Python interpreter used for parsing. The Python version must be 3.7.5 or later.

Optional

--clear

Data simplification mode. After this option is enabled, the sqlite directory in PROF_XXX/device_{id} is deleted after profile data is exported, so as to save storage space. The value can be on or off (default).

Optional

Note 1: For the Atlas 200/300/500 Inference Product, if --iteration-id and --model-id are not configured, profile data of the model (model ID) with the largest number of iterations is exported by default. For other processors, all profile data is exported by default.

Note 2: In single-operator scenarios and scenarios where only Collecting Ascend AI Processor System Data is involved, --iteration-id and --model-id are not supported.
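The constraints in Table 1 can be checked before msprof is invoked. The following is a minimal sketch under stated assumptions: `is_positive_int` and `build_export_cmd` are hypothetical helpers, not part of msprof, and the function only prints the command line instead of running it. It rejects the printable characters that the --output description forbids, rejects all control characters (a superset of the listed "\n", "\f", "\r", "\b", "\t", "\v", and "\u007F"), and enforces that --model-id and --iteration-id are set together as positive integers.

```shell
# Hypothetical helpers (not part of msprof) that validate options and
# print the export command without running it.

# Succeeds only for a positive integer without leading zeros.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*|0*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage: build_export_cmd <output_dir> [<model_id> <iteration_id>]
build_export_cmd() {
    output=$1
    model_id=${2:-}
    iteration_id=${3:-}

    # Characters that Table 1 does not allow in the --output path.
    case "$output" in
        *'\'*|*'"'*|*"'"*|*'%'*|*'<'*|*'>'*|*'|'*|*'&'*|*'$'*|*';'*|*'`'*)
            echo "error: forbidden character in --output path" >&2
            return 1 ;;
        *[[:cntrl:]]*)
            echo "error: control character in --output path" >&2
            return 1 ;;
    esac

    cmd="msprof --export=on --output=$output"

    # --model-id and --iteration-id must be configured at the same time.
    if [ -n "$model_id" ] || [ -n "$iteration_id" ]; then
        if [ -z "$model_id" ] || [ -z "$iteration_id" ]; then
            echo "error: set --model-id and --iteration-id together" >&2
            return 1
        fi
        if ! is_positive_int "$model_id" || ! is_positive_int "$iteration_id"; then
            echo "error: IDs must be positive integers" >&2
            return 1
        fi
        cmd="$cmd --model-id=$model_id --iteration-id=$iteration_id"
    fi

    printf '%s\n' "$cmd"
}
```

For example, `build_export_cmd /home/HwHiAiUser/profiler_data 1 2` prints a command that sets both IDs, while passing only one of them is reported as an error. Printing rather than executing keeps the sketch safe to try on a machine without the toolkit.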

After the preceding command is executed, the mindstudio_profiler_output directory is generated in the PROF_XXX directory under --output. When --type=db is specified, a .db file (msprof_timestamp.db) that summarizes all profile data is generated in the PROF_XXX directory instead, and the mindstudio_profiler_output directory is not generated.

The structure of the generated profile data directory is as follows:

  • Single-process collection
    └── PROF_XXX
        ├── device_0
        │   └── data
        ├── device_1
        │   └── data
        ├── host
        │   └── data
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
    
  • Multi-process collection
    └── PROF_XXX1
        ├── device_0
        │   └── data
        ├── host
        │   └── data
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
    └── PROF_XXX2
        ├── device_1
        │   └── data
        ├── host
        │   └── data
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
    
  • The .json files in the mindstudio_profiler_output directory are timeline information files, which collect the running durations of operators and tasks and display the collected data in color blocks. The .csv files are summary information files, which summarize the running durations in tables. For details about profile data, see Profile Data File References.
  • In multi-device scenarios, single-process collection generates only one PROF_XXX directory, whereas multi-process collection generates multiple PROF_XXX directories. In addition, device directories (device_{id}) are generated in each PROF_XXX directory. The number of device directories in each PROF_XXX directory depends on the actual user operations and does not affect profile data analysis.
  • The files in the mindstudio_profiler_output directory are generated based on the actual collected profile data. If the necessary profile data file is absent, the corresponding timeline and summary data will be unavailable.
  • If an msprof collection process is forcibly interrupted, the tool still saves the raw profile data that was collected. You can run msprof --export to parse and export that data.
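Based on the directory layout described above, a quick way to see which collected directories have already been exported is to look for the export artifacts. The sketch below is illustrative only: `report_export_status` is a hypothetical helper, the `PROF_*` and `msprof_*.db` glob patterns are assumptions derived from the PROF_XXX and msprof_timestamp.db names in this section, and the status strings are made up for this example.

```shell
# Hypothetical helper (not part of msprof): for each PROF_* directory
# under a parent directory, report which export result is present.
report_export_status() {
    for prof in "$1"/PROF_*; do
        [ -d "$prof" ] || continue       # skip when the glob matched nothing
        name=${prof##*/}
        status="not exported"
        if [ -d "$prof/mindstudio_profiler_output" ]; then
            # Generated by a text export.
            status="text export present"
        else
            # Generated by --type=db; the file name pattern is an assumption.
            for db in "$prof"/msprof_*.db; do
                if [ -f "$db" ]; then
                    status="db export present"
                fi
            done
        fi
        echo "$name: $status"
    done
}
```

Directories reported as "not exported" are candidates for msprof --export, including raw data left behind by an interrupted collection.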