Dynamically Collecting Profile Data

During dynamic profile data collection, you can start or stop the collection process at any time.

Instructions

Profile data can be dynamically collected in launch or attach mode.

  • Launch mode: The msprof command starts the AI task itself when dynamic collection begins and then enters the interactive mode for profile data collection. You can run commands to start or stop collection at any time.
  • Attach mode: You start the AI task first, then start msprof dynamic collection against the running task and enter the interactive mode for profile data collection. You can run commands to start or stop collection at any time.

If the user process is set to run in the background, launch mode does not work because the interactive interface cannot be accessed. In this case, you are advised to use the attach mode to dynamically collect profile data.

Availability

Atlas Training Series Product

Prerequisites

  • Before using this mode to collect profile data, ensure that operations in Before You Start have been completed.
  • Before dynamically collecting profile data, ensure that your AI task can run properly in the operating environment.
  • In attach mode, you need to set the required environment variable before starting a training job.
    export PROFILING_MODE=dynamic
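
    The attach-mode preparation can be sketched as a small helper. This is a minimal sketch under stated assumptions, not part of msprof: the function name start_job_for_attach and the log file argument are hypothetical, and the variable is set only for the started job instead of being exported into the whole shell. The printed PID belongs to the process started directly; if that process is a wrapper script, the application PID may differ, so obtain it as described in Application PID Obtaining.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by msprof): start a job in the background
# with PROFILING_MODE=dynamic set for that job only, and print its PID.
# Usage: start_job_for_attach <log_file> <command> [args...]
start_job_for_attach() {
    local log_file=$1
    shift
    # Redirect output so the background job does not hold the caller's stdout.
    PROFILING_MODE=dynamic "$@" >"$log_file" 2>&1 &
    echo "$!"
}
```

    For example, pid=$(start_job_for_attach train.log python3 train.py) starts a hypothetical train.py and stores the PID of the started process.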

Restrictions

  • In the same AI task, only one user is allowed to enter the interactive mode at a time.
  • If the user does not start data collection within 30 minutes after entering the interactive mode, the system automatically exits the interactive mode. The user can run the corresponding command to enter the interactive mode again.
  • In multi-device (including cluster) scenarios, you are advised to use the attach mode to dynamically collect profile data. Profile data of each device can only be collected independently.
  • This function cannot be configured together with --delay and --duration.
  • In launch mode, the environment variables PROFILING_MODE and PROFILING_OPTIONS cannot be set in the passed user application.
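
The last two restrictions can be checked before msprof is invoked. The following is a minimal sketch, assuming a hypothetical wrapper function check_dynamic_launch that is not part of msprof. It inspects only the options that precede the user application, and it approximates the environment variable restriction by checking the calling shell, whose variables the launched application would inherit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not provided by msprof).
# Pass the same arguments that you intend to pass to msprof.
check_dynamic_launch() {
    local arg dynamic=0 attach=0 conflict=0
    for arg in "$@"; do
        case $arg in
            --dynamic=on) dynamic=1 ;;
            --pid=*) attach=1 ;;
            --delay|--delay=*|--duration|--duration=*) conflict=1 ;;
            --*) ;;      # any other msprof option
            *) break ;;  # first non-option argument: the user application
        esac
    done
    [ "$dynamic" -eq 1 ] || return 0
    if [ "$conflict" -eq 1 ]; then
        echo "error: --dynamic=on cannot be combined with --delay or --duration" >&2
        return 1
    fi
    # Launch mode is assumed when no --pid option is given.
    if [ "$attach" -eq 0 ] &&
       { [ -n "${PROFILING_MODE:-}" ] || [ -n "${PROFILING_OPTIONS:-}" ]; }; then
        echo "error: unset PROFILING_MODE and PROFILING_OPTIONS for launch mode" >&2
        return 1
    fi
    return 0
}
```

A non-zero return status indicates that the planned command line violates one of the restrictions checked here. The other restrictions in this list are not covered.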

Command Example

Use either of the following modes:

  • Launch mode:
    msprof --dynamic=on --output=/home/projects/output --model-execution=on --runtime-api=on --aicpu=on /home/projects/MyApp/out/main
    > start
        
    ...
    > stop
       
    ...
    > quit
       
    ...

    In launch mode, dynamic collection requires a user application to be passed on the command line.

  • Attach mode:
    msprof --dynamic=on --pid=<pid> --output=/home/projects/output --model-execution=on --runtime-api=on --aicpu=on
    > start
        
    ...
    > stop
       
    ...
    > quit
       
    ...

To select the items to collect dynamically, add the collection options listed in Profile Data Collection to the command line.

The collected profile data is saved in the directory specified by the --output option. The data result depends on the collection options specified in the command line.

Options

Table 1 Options

| Option | Description | Required/Optional |
|--------|-------------|-------------------|
| --dynamic | Switch that controls dynamic profile data collection, either on or off (default). | Required |
| --pid | PID of the application to profile. For details about how to obtain the PID, see Application PID Obtaining. | Required in attach mode |
| start | Starts collection. | Optional |
| stop | Stops collection. Each time a start and stop command pair is executed, a PROF_XXX directory for storing the data files is generated in the path specified by the --output option. | Optional |
| quit | Stops collection and exits the interactive mode. The AI task continues to run normally. You can run the msprof command to enter the interactive mode again. | Optional |

  • The start and stop commands can be executed at most 100 times in total. If the combined count exceeds 100, the server terminates the connection, so at most 50 profile samples (start/stop pairs) can be collected per connection. The count is reset when the server is reconnected.
  • When the start and stop commands are executed repeatedly, a stop command may end the profiling process and thereby terminate data reporting by CANN. An ERROR log is printed in this case, which is expected.

Application PID Obtaining

  • In single-device scenarios:

    Go to the home directory as the running user and run the following command to query the PID of the application:

    ll ~/dynamic_profiling_socket_*

    Output similar to the following is displayed (in this example, the running user is root). The numeric suffix of each socket file name is the PID of an application. Find the most recently started application. In the following output, 130065 is the PID of the application.

    srw-------. 1 root root 0 Feb  8 11:24 /root/dynamic_profiling_socket_112549
    srw-------. 1 root root 0 Feb  8 13:38 /root/dynamic_profiling_socket_128848
    srw-------. 1 root root 0 Feb  8 13:39 /root/dynamic_profiling_socket_130065
    
  • In multi-device scenarios:

    You are advised to start the AI tasks on each device one at a time and obtain each PID by using the method for single-device scenarios. If an AI task is started on multiple devices at the same time, use any one of the PIDs for data collection. Currently, you cannot specify multiple PIDs for collection at the same time.
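
The single-device lookup above can be scripted. The following is a minimal sketch, assuming that the newest dynamic_profiling_socket_* entry (by modification time) belongs to the application you want to profile. The function name latest_profiling_pid is hypothetical and not part of msprof.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by msprof): print the PID encoded in the
# newest dynamic_profiling_socket_* entry in a directory (default: $HOME).
latest_profiling_pid() {
    local dir=${1:-$HOME} newest
    # ls -t lists the most recently modified entry first.
    newest=$(ls -t "$dir"/dynamic_profiling_socket_* 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    # Keep only the text after the last underscore, which is the PID.
    echo "${newest##*_}"
}
```

With the listing shown above, latest_profiling_pid /root prints 130065. The result can then be passed to the --pid option in attach mode. The function returns a non-zero status when no socket file exists.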

Data Parsing

You are advised to perform operations in Profile Data Parsing and Export (msprof Command) to parse and export profile data in the PROF_XXX directory.
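
Because each start and stop pair produces its own PROF_XXX directory, it can help to enumerate those directories before parsing. The following is a minimal sketch, assuming a hypothetical helper named list_prof_dirs that is not part of msprof. It only lists the directories; parse each one as described in the referenced section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by msprof): print every PROF_* directory
# directly under the path that was given to the --output option.
list_prof_dirs() {
    local out_dir=$1 d
    for d in "$out_dir"/PROF_*/; do
        if [ -d "$d" ]; then
            echo "${d%/}"   # strip the trailing slash added by the glob
        fi
    done
}
```

For example, list_prof_dirs /home/projects/output prints one directory per line in the shell's glob order, and prints nothing if no collection has completed yet.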