Collection Operations

Profiling with the TensorFlow framework is supported only in training and online inference scenarios. Profile data can be collected globally, or locally by calling the npu_bridge.profiler.profiler class; in local mode, profile data sampling is enabled only through calls to the profiler class. The following describes the global profiling mode.

Prerequisites

  • Training scenario:
    • TensorFlow 1.x: Prepare a model trained on TensorFlow 1.x and a matched dataset, and port the model to the Ascend AI Processor. For details, see "Manual Porting" or "Automated Porting" in the TensorFlow 1.15 Model Porting Guide.
    • TensorFlow 2.x: Prepare a model trained on TensorFlow 2.x and a matched dataset, and port the model to the Ascend AI Processor. For details, see "Manual Porting" in the TensorFlow 2.6.5 Model Porting Guide.
  • Online inference scenario: Download a pre-trained model and prepare the online inference script.

Restrictions

Online inference supports profiling only in sess.run mode. For details about the sess.run mode, see Collection of Raw Profile Data (TensorFlow 1.x Training/Online Inference).

Collection of Raw Profile Data (TensorFlow 1.x Training/Online Inference)

Add the following profiling configurations to the training script (for example, train_*.py) or online inference script and perform training or online inference.

For details about training/online inference operations in the TensorFlow framework, see the TensorFlow 1.15 Model Porting Guide.

The two modes for the TensorFlow framework are described as follows:
  • In Estimator mode, use profiling_config under NPURunConfig to enable profiling. The sample code is as follows:
    import tensorflow as tf
    from npu_bridge.estimator.npu.npu_config import NPURunConfig
    from npu_bridge.estimator.npu.npu_config import ProfilingConfig
    # profiling_options is a JSON string; see Profiling Options for the available keys.
    profiling_options = '{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}'
    profiling_config = ProfilingConfig(enable_profiling=True, profiling_options=profiling_options)
    session_config = tf.ConfigProto()
    config = NPURunConfig(profiling_config=profiling_config, session_config=session_config)
    
  • In sess.run mode, use the session configuration options profiling_mode and profiling_options to enable profiling. The sample code is as follows:
    import tensorflow as tf
    from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
    config = tf.ConfigProto()
    custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    custom_op.parameter_map["use_off_line"].b = True
    custom_op.parameter_map["profiling_mode"].b = True  # Enable profiling.
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
    config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.
    with tf.Session(config=config) as sess:
        # train_op is a placeholder: pass the tensors or operations of your own graph.
        sess.run(train_op)
    

For details about profiling_options, see Profiling Options.

  • If enable_profiling is set to True in Estimator mode, or profiling_mode is set to True in sess.run mode, and profiling_options is not set, then training_trace, task_trace, hccl, aicpu, and aic_metrics (PipeUtilization) are collected by default, and the data is saved to the directory where the current AI job is located. If profiling is enabled in the same way and any option in profiling_options is set, the remaining options take the default values described in Profiling Options.
  • When fp_point and bp_point are configured, the corresponding data may not be found, whether you specify operators explicitly or leave both options empty to use the automatic search algorithm. In that case, the values of FP_BP, Grad_refresh Bound, and Data_aug Bound are null in the parsed iteration trace data.
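
Hand-written JSON strings such as the profiling_options values in the samples above are easy to break with a stray quote or comma. The following is a minimal sketch of building the string from a dictionary instead, using only the Python standard library. The helper name build_profiling_options is hypothetical and is not part of npu_bridge; the keys are the ones shown in the samples above.

```python
import json


def build_profiling_options(output, training_trace="on", task_trace="on",
                            fp_point="", bp_point="", aic_metrics=None):
    """Return a profiling_options JSON string (hypothetical helper, not an npu_bridge API)."""
    options = {
        "output": output,
        "training_trace": training_trace,
        "task_trace": task_trace,
        "fp_point": fp_point,  # Empty string: rely on the automatic search algorithm.
        "bp_point": bp_point,
    }
    if aic_metrics is not None:
        options["aic_metrics"] = aic_metrics
    # Compact separators reproduce the spacing used in the samples above.
    return json.dumps(options, separators=(",", ":"))


profiling_options = build_profiling_options("/tmp/profiling", aic_metrics="PipeUtilization")
print(profiling_options)
```

The resulting string can be passed as profiling_options to ProfilingConfig in Estimator mode, or wrapped in tf.compat.as_bytes() in sess.run mode, as in the samples above.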

Collection of Raw Profile Data (TensorFlow 2.x Training/Online Inference)

Add the following profiling configurations to the training script (for example, train_*.py) or online inference script and perform training or online inference. For details about training/online inference in the TensorFlow 2.x framework, see the TensorFlow 2.6.5 Model Porting Guide.

A code example is as follows:
import npu_device
...
# Enable profiling and set the profiling options (a JSON string).
npu_device.global_options().profiling_config.enable_profiling = True
profiling_options = ('{"output":"/home/profiling",'
                     '"training_trace":"on",'
                     '"task_trace":"on",'
                     '"fp_point":"",'
                     '"bp_point":""}')
npu_device.global_options().profiling_config.profiling_options = profiling_options
...
npu_device.open().as_default()

For details about profiling_options, see Profiling Options.
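
Because profiling_options is passed as a plain string, a malformed value may only surface once the job is running. The following is a minimal sketch of a pre-flight check that parses the string before it is handed to the framework. The function name check_profiling_options is hypothetical, and the assumption that training_trace and task_trace accept only "on" or "off" should be confirmed against Profiling Options.

```python
import json

SWITCH_KEYS = ("training_trace", "task_trace")  # Switches shown in the sample above.


def check_profiling_options(options_str):
    """Parse a profiling_options string and return it as a dict; raise ValueError on problems."""
    try:
        options = json.loads(options_str)
    except json.JSONDecodeError as err:
        raise ValueError(f"profiling_options is not valid JSON: {err}") from err
    if not isinstance(options, dict):
        raise ValueError("profiling_options must be a JSON object")
    for key in SWITCH_KEYS:
        # Assumption: switch options take the value "on" or "off".
        if key in options and options[key] not in ("on", "off"):
            raise ValueError(f'{key} must be "on" or "off", got {options[key]!r}')
    return options


checked = check_profiling_options(
    '{"output":"/home/profiling","training_trace":"on","task_trace":"on",'
    '"fp_point":"","bp_point":""}'
)
print(sorted(checked))
```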

Data Collection Description

After the profiling options are set and the raw profile data has been collected, parse the raw data and export the results as visualized profile data files. These files are saved in the PROF_XXX/mindstudio_profiler_output directory. For details, see Profile Data Parsing and Export (msprof Command).
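
As an illustration of locating the exported files, the following minimal sketch searches a profiling output path for PROF_XXX/mindstudio_profiler_output directories. It assumes that the PROF_XXX directories sit directly under the path configured in the output option; the function name find_profiler_output_dirs is hypothetical.

```python
from pathlib import Path


def find_profiler_output_dirs(output_root):
    """Return the mindstudio_profiler_output directories found under PROF_* folders.

    Assumption: PROF_XXX directories sit directly under the configured output path.
    """
    root = Path(output_root)
    return sorted(p for p in root.glob("PROF_*/mindstudio_profiler_output") if p.is_dir())


# "/tmp/profiling" is the output path used in the TensorFlow 1.x samples.
for directory in find_profiler_output_dirs("/tmp/profiling"):
    print(directory)
```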

The generated profile data is shown in Table 1.

Table 1 Introduction to profile data files

Argument                            Profile Data File
----------------------------------  --------------------------------------------------
Automatically generated by default  msprof (Timeline Report)
                                    op_summary_*.csv
                                    op_statistic_*.csv
                                    fusion_op_*.csv
                                    step_trace (iteration trace data)
task_trace, task_time               The CANN level in msprof_*.json and the api_statistic_*.csv file
                                    The Ascend Hardware level in msprof_*.json and the task_time_*.csv file
                                    The HCCL level in msprof_*.json and the hccl_statistic_*.csv file
                                    step_trace_*.json
runtime_api                         The CANN_Runtime level in msprof_*.json and the api_statistic_*.csv file
hccl                                The HCCL level in msprof_*.json and the hccl_statistic_*.csv file
                                    api_statistic_*.csv
aicpu                               aicpu_*.csv
                                    dp_*.csv
aic_metrics                         op_summary_*.csv
l2                                  l2_cache_*.csv
msproftx                            msproftx data
sys_hardware_mem_freq               On-chip memory read/write rate file
                                    The LLC level in msprof_*.json and the llc_read_write_*.csv file
                                    The NPU MEM level in msprof_*.json and the npu_mem_*.csv file
                                    npu_module_mem_*.csv
llc_profiling                       -
sys_io_sampling_freq                The NIC level in msprof_*.json and the nic_*.csv file
                                    The RoCE level in msprof_*.json and the roce_*.csv file
sys_interconnection_freq            The PCIe level in msprof_*.json and the pcie_*.csv file
                                    The HCCS level in msprof_*.json and the hccs_*.csv file
dvpp_freq                           dvpp_*.csv
host_sys                            The CPU Usage level in msprof_*.json and the host_cpu_usage_*.csv file
                                    The Memory Usage level in msprof_*.json and the host_mem_usage_*.csv file
host_sys_usage                      System CPU usage on the host
                                    CPU usage of processes on the host
                                    System memory usage on the host
                                    Memory usage of processes on the host
host_sys_usage_freq                 -
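
To check which of the files listed in Table 1 were actually produced, the file names in an output directory can be matched against the patterns from the table. The following is a minimal sketch under stated assumptions: the pattern dictionary covers only a subset of Table 1, and the names TABLE_1_PATTERNS and group_profile_files are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path

# File-name patterns copied from Table 1 (a subset; extend as needed).
TABLE_1_PATTERNS = {
    "default": ["op_summary_*.csv", "op_statistic_*.csv", "fusion_op_*.csv"],
    "task_trace": ["api_statistic_*.csv", "task_time_*.csv",
                   "hccl_statistic_*.csv", "step_trace_*.json"],
    "aicpu": ["aicpu_*.csv", "dp_*.csv"],
    "l2": ["l2_cache_*.csv"],
    "dvpp_freq": ["dvpp_*.csv"],
}


def group_profile_files(output_dir, patterns=TABLE_1_PATTERNS):
    """Map each argument to the sorted file names in output_dir that match its patterns."""
    names = sorted(p.name for p in Path(output_dir).iterdir() if p.is_file())
    return {
        argument: [name for name in names if any(fnmatch(name, g) for g in globs)]
        for argument, globs in patterns.items()
    }


# PROF_XXX is the placeholder used in this document; substitute the real directory name.
example_dir = Path("PROF_XXX/mindstudio_profiler_output")
if example_dir.is_dir():
    for argument, files in group_profile_files(example_dir).items():
        print(argument, files)
```

An empty list for an argument indicates that no matching file was found, for example because the corresponding option was not enabled.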