Using TensorFlow APIs for Data Profiling
Overview
For TensorFlow training or online inference, you can enable profiling by calling TensorFlow APIs in your training script. Two profiling modes are available:
- Global profiling: Profiles all operations executed by the graphs. This mode generates a large volume of data.
You can either modify the training script and configure profiling_mode (as elaborated in this section), or set the environment variable PROFILING_MODE (see Profiling with Environment Variables). If both are used, profiling_mode takes precedence over PROFILING_MODE.
- Local profiling: Profiles only the specified subgraphs or steps. Use a with statement to call the profiler class, and place the operations to be profiled inside the scope of that statement.
This section describes how to enable global profiling. For more information, see TensorFlow 1.15 Model Porting Guide and TensorFlow 2.6.5 Model Porting Guide.
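The precedence rule described above for profiling_mode and PROFILING_MODE can be pictured with a small sketch. The helper resolve_profiling_mode is hypothetical and is not part of any TensorFlow or NPU API. It also assumes that the environment variable holds the string "true" or "false", which this section does not state.

```python
import os
from typing import Mapping, Optional


def resolve_profiling_mode(script_value: Optional[bool],
                           env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: mirror the documented precedence rule."""
    # A value set in the training script (profiling_mode) takes precedence.
    if script_value is not None:
        return script_value
    # Otherwise fall back to PROFILING_MODE. The "true"/"false" string
    # format is an assumption made only for this illustration.
    return env.get("PROFILING_MODE", "false").strip().lower() == "true"
```

With this sketch, a script that sets profiling_mode to False stays unprofiled even when PROFILING_MODE requests profiling.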
Prerequisites
Before enabling profiling, ensure that the training or online inference script can be executed properly.
Procedure
- Configure profiling in the training script. The samples use the TensorFlow 1.15 manual porting script as an example.
- In Estimator mode, you can enable task_trace to profile task trace data. The sample is as follows:
    import tensorflow as tf
    from npu_bridge.estimator.npu.npu_config import NPURunConfig
    from npu_bridge.estimator.npu.npu_config import ProfilingConfig
    from npu_bridge.npu_init import *

    # enable_profiling: profiling enable.
    # output: path for storing profile data. Create the specified directory in the
    #   training environment (container or host) in advance. The running user
    #   configured during installation must have the read and write permissions on
    #   this path. It can be either an absolute path or a relative path.
    # task_trace: task trace collection enable.
    profiling_options = '{"output":"/home/HwHiAiUser/output","task_trace":"on"}'
    profiling_config = ProfilingConfig(enable_profiling=True, profiling_options=profiling_options)
    session_config = tf.ConfigProto()
    config = NPURunConfig(profiling_config=profiling_config, session_config=session_config)
If the task trace data does not reveal the problem, also enable training_trace to profile iteration traces. The sample is as follows:
    import tensorflow as tf
    from npu_bridge.estimator.npu.npu_config import NPURunConfig
    from npu_bridge.estimator.npu.npu_config import ProfilingConfig
    from npu_bridge.npu_init import *

    # enable_profiling: profiling enable
    # output: path for storing profile data
    # task_trace: task trace collection enable
    # training_trace: iteration trace collection enable
    # fp_point: start point of the forward propagated operator in iteration traces,
    #   recording the start timestamp of forward propagation.
    # bp_point: end point of the backward propagated operator in iteration traces,
    #   recording the end timestamp of backward propagation.
    #   fp_point and bp_point are used to compute the time used by forward and
    #   backward propagation.
    profiling_options = '{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}'
    profiling_config = ProfilingConfig(enable_profiling=True, profiling_options=profiling_options)
    session_config = tf.ConfigProto(allow_soft_placement=True)
    config = NPURunConfig(profiling_config=profiling_config, session_config=session_config)
- In sess.run mode, you can enable task_trace to profile task trace data. The sample is as follows:
    import tensorflow as tf
    from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
    from npu_bridge.npu_init import *

    config = tf.ConfigProto()
    custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    custom_op.parameter_map["use_off_line"].b = True
    # profiling_mode: profiling enable.
    custom_op.parameter_map["profiling_mode"].b = True
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
    config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
    config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

    with tf.Session(config=config) as sess:
        sess.run(train_op)  # train_op is a placeholder for the fetches defined in your script
If the task trace data does not reveal the problem, also enable training_trace to profile iteration traces. The sample is as follows:
    import tensorflow as tf
    from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
    from npu_bridge.npu_init import *

    config = tf.ConfigProto()
    custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    custom_op.parameter_map["use_off_line"].b = True
    # profiling_mode: profiling enable.
    custom_op.parameter_map["profiling_mode"].b = True
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
    config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
    config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

    with tf.Session(config=config) as sess:
        sess.run(train_op)  # train_op is a placeholder for the fetches defined in your script
- For details about the profiling configuration, see Profiling Options.
- If profiling is enabled (profiling_mode set to true in sess.run mode, or enable_profiling set to true in Estimator mode) but profiling_options is not configured, data for training_trace, task_trace, hccl, aicpu, and aic_metrics (PipeUtilization) is profiled by default and saved in the current AI task directory.
- When fp_point and bp_point are configured, the corresponding data may not be found, regardless of whether you specify an operator or leave both parameters empty to use the automatic search algorithm. In that case, the values of FP_BP, Grad_refresh Bound, and Data_aug Bound are empty in the parsed iteration trace data.
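Because profiling_options is passed as a JSON string, a malformed quote or comma is easy to introduce when writing it by hand. The following is a minimal sketch that builds the string with json.dumps instead. The helper build_profiling_options and its parameters are hypothetical and not part of the npu_bridge API; only the option keys and the "on" value come from the samples in this section.

```python
import json


def build_profiling_options(output, task_trace=True, training_trace=False,
                            fp_point="", bp_point=""):
    """Hypothetical helper: serialize profiling options to a JSON string."""
    options = {"output": output}
    if task_trace:
        options["task_trace"] = "on"
    if training_trace:
        options["training_trace"] = "on"
        # Empty strings leave the choice to the automatic search algorithm.
        options["fp_point"] = fp_point
        options["bp_point"] = bp_point
    return json.dumps(options)
```

The returned string can be passed as profiling_options to ProfilingConfig in Estimator mode, or wrapped in tf.compat.as_bytes for the parameter_map entry in sess.run mode.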
- Re-execute the training script.
After the training is complete, the PROF_XXX folder is generated in the directory specified by the output parameter to store the raw profile data.
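To locate the generated folders from a script, a sketch such as the following may help. The function find_profile_dirs is hypothetical, and the "PROF_" prefix is inferred from the PROF_XXX naming used here; the exact folder naming scheme is not specified in this section.

```python
from pathlib import Path


def find_profile_dirs(output_dir):
    """Return the PROF_* subdirectories of output_dir, oldest first."""
    root = Path(output_dir)
    # Keep directories only; the "PROF_" prefix is an assumption based on
    # the PROF_XXX placeholder in the text.
    dirs = [p for p in root.glob("PROF_*") if p.is_dir()]
    # Sort by modification time, using the name to break ties.
    return sorted(dirs, key=lambda p: (p.stat().st_mtime, p.name))
```

The last element of the returned list is the most recently modified folder, which is usually the one produced by the latest run.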
- Run the msprof command to parse the profile data. For details, see Offline Parsing.
msprof --export=on --output=/home/HwHiAiUser/profiling_output/PROF_XXX
After the parsing is complete, you can find the mindstudio_profiler_output directory generated in the PROF_XXX folder.
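If parsing is part of an automated workflow, the command above can be wrapped in Python. This is a sketch under stated assumptions: the function names are hypothetical, msprof is assumed to be on PATH, and only the --export and --output flags shown in this section are used.

```python
import subprocess
from pathlib import Path


def build_msprof_export_cmd(prof_dir):
    """Build the msprof export command for one PROF_XXX folder."""
    return ["msprof", "--export=on", f"--output={prof_dir}"]


def export_profile(prof_dir):
    """Run msprof and return the parsed-output directory if it was created."""
    # Raises CalledProcessError if msprof exits with a non-zero status.
    subprocess.run(build_msprof_export_cmd(prof_dir), check=True)
    parsed = Path(prof_dir) / "mindstudio_profiler_output"
    return parsed if parsed.is_dir() else None
```

Passing the arguments as a list avoids shell quoting issues when the output path contains spaces.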
Each enabled profiling option produces its own result files. For details, see Profiling Results.