Collection Operations
Profiling with the TensorFlow framework can be performed only in training and online inference scenarios. Data can be profiled globally or locally through the npu_bridge.profiler.profiler class; that is, profile data sampling is enabled only by calling this class. The following describes the global profiling mode.
Prerequisites
- Training scenario:
- TensorFlow 1.x: Prepare a model trained on TensorFlow 1.x and a matched dataset, and port the model to the Ascend AI Processor. For details, see "Manual Porting" or "Automated Porting" in the TensorFlow 1.15 Model Porting Guide.
- TensorFlow 2.x: Prepare a model trained on TensorFlow 2.x and a matched dataset, and port the model to the Ascend AI Processor. For details, see "Manual Porting" in the TensorFlow 2.6.5 Model Porting Guide.
- Online inference scenario: Download a pre-trained model and prepare the online inference script.
Restrictions
Online inference supports only profiling in sess.run mode. For details about the sess.run mode, see Collection of Raw Profile Data (TensorFlow 1.x Training/Online Inference).
Collection of Raw Profile Data (TensorFlow 1.x Training/Online Inference)
Add the following profiling configurations to the training script (for example, train_*.py) or online inference script and perform training or online inference.
For details about training/online inference operations in the TensorFlow framework, see the TensorFlow 1.15 Model Porting Guide.
- In Estimator mode, use profiling_config under NPURunConfig to enable profiling. The sample code is as follows:
```python
import tensorflow as tf
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator.npu.npu_config import ProfilingConfig

profiling_options = '{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}'
profiling_config = ProfilingConfig(enable_profiling=True, profiling_options=profiling_options)
session_config = tf.ConfigProto()
config = NPURunConfig(profiling_config=profiling_config, session_config=session_config)
```
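Because profiling_options is a JSON string embedded in Python source, quoting mistakes are easy to make. A minimal sketch of one way to avoid them is to build the string with json.dumps; the helper name build_profiling_options is illustrative and not part of the Ascend API:

```python
import json

# Hypothetical helper (not part of the Ascend API): assemble the
# profiling_options JSON string from a dict instead of hand-writing it,
# so the quoting is always valid JSON.
def build_profiling_options(output="/tmp/profiling", **overrides):
    options = {
        "output": output,
        "training_trace": "on",
        "task_trace": "on",
        "fp_point": "",
        "bp_point": "",
        "aic_metrics": "PipeUtilization",
    }
    options.update(overrides)  # override any default option
    return json.dumps(options)

profiling_options = build_profiling_options()
# The resulting string can then be passed to
# ProfilingConfig(enable_profiling=True, profiling_options=profiling_options).
```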
- In sess.run mode, use the session configuration options profiling_mode and profiling_options to enable profiling. The sample code is as follows:
```python
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["profiling_mode"].b = True
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/tmp/profiling","training_trace":"on","task_trace":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.
with tf.Session(config=config) as sess:
    sess.run(...)  # Run your training or inference ops here.
```
For details about profiling_options, see Profiling Options.
- If enable_profiling is set to true (Estimator mode) or profiling_mode is set to true (sess.run mode) but profiling_options is not set, training_trace, task_trace, hccl, aicpu, and aic_metrics (PipeUtilization) are collected by default, and the profile data is saved to the directory of the current AI job. If profiling is enabled and any option of profiling_options is set, the default values of the remaining options are described in Profiling Options.
- When configuring fp_point and bp_point, no data may be found, regardless of whether you specify particular operators or rely on the automatic search algorithm (fp_point and bp_point left empty). In that case, the FP_BP, Grad_refresh Bound, and Data_aug Bound values are null in the parsed iteration trace data.
Collection of Raw Profile Data (TensorFlow 2.x Training/Online Inference)
Add the following profiling configurations to the training script (for example, train_*.py) or online inference script and perform training or online inference. For details about training/online inference in the TensorFlow 2.x framework, see the TensorFlow 2.6.5 Model Porting Guide.
```python
import npu_device
...
# profiling
npu_device.global_options().profiling_config.enable_profiling = True
profiling_options = '{"output":"/home/profiling", \
                     "training_trace":"on", \
                     "task_trace":"on", \
                     "fp_point":"", \
                     "bp_point":""}'
npu_device.global_options().profiling_config.profiling_options = profiling_options
...
npu_device.open().as_default()
```
For details about profiling_options, see Profiling Options.
Data Collection Description
After profiling_options is set and training or inference is complete, parse the raw data and export the results as visualized profile data files, which are saved in the PROF_XXX/mindstudio_profiler_output directory. For details, see Profile Data Parsing and Export (msprof Command).
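As a minimal sketch of post-processing, the snippet below collects the exported result files under each PROF_XXX/mindstudio_profiler_output directory. The base path and the function name find_profiler_outputs are illustrative assumptions, not part of any Ascend tool:

```python
from pathlib import Path

# Sketch: gather all exported result files (e.g. msprof_*.json,
# *_*.csv) under every PROF_* result directory. Adjust base_dir to
# the "output" path you set in profiling_options.
def find_profiler_outputs(base_dir):
    base = Path(base_dir)
    return sorted(
        p
        for prof in base.glob("PROF_*")
        for p in (prof / "mindstudio_profiler_output").glob("*")
        if p.is_file()
    )
```

This is convenient when several training runs have accumulated multiple PROF_XXX directories and you want to feed all exports into one analysis script.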
The generated profile data is shown in Table 1.
| Argument | Profile Data File |
|---|---|
| Automatically generated by default | |
| task_trace, task_time | The CANN level in msprof_*.json and the api_statistic_*.csv file<br>The Ascend Hardware level in msprof_*.json and the task_time_*.csv file<br>The HCCL level in msprof_*.json and the hccl_statistic_*.csv file |
| runtime_api | The CANN_Runtime level in msprof_*.json and the api_statistic_*.csv file |
| hccl | The HCCL level in msprof_*.json and the hccl_statistic_*.csv file |
| aicpu | |
| aic_metrics | |
| l2 | |
| msproftx | |
| sys_hardware_mem_freq | On-chip memory read/write rate file<br>The LLC level in msprof_*.json and the llc_read_write_*.csv file<br>The NPU MEM level in msprof_*.json and the npu_mem_*.csv file |
| llc_profiling | - |
| sys_io_sampling_freq | |
| sys_interconnection_freq | |
| dvpp_freq | |
| host_sys | The CPU Usage level in msprof_*.json and the host_cpu_usage_*.csv file<br>The Memory Usage level in msprof_*.json and the host_mem_usage_*.csv file |
| host_sys_usage | CPU usage of processes on the host |
| host_sys_usage_freq | - |
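Once the CSV exports exist, they can be inspected with ordinary tooling. The sketch below totals task durations per task type from a task_time_*.csv file; the column names "Task Type" and "Duration(us)" are assumptions for illustration, so check the actual header row of your export before relying on them:

```python
import csv

# Sketch, not the documented schema: sum task durations per task type
# from a task_time_*.csv export. The column names used here
# ("Task Type", "Duration(us)") are assumptions; verify them against
# the header of your own export file.
def total_duration_by_task_type(csv_path):
    totals = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            task = row["Task Type"]
            totals[task] = totals.get(task, 0.0) + float(row["Duration(us)"])
    return totals
```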