In sess.run Mode

Automated porting

  1. Check whether init_resource exists in the ported script.
    • If it exists, modify it as shown in the following example, and then go to the next step.
      import tensorflow as tf
      from npu_bridge.npu_init import *  # Added by the porting tool; provides the NPU helper functions used below.

      if __name__ == '__main__':
      
        session_config = tf.ConfigProto(allow_soft_placement=True)
        custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        # Enable profiling.
        custom_op.parameter_map["profiling_mode"].b = True
        # Collect only task trace data.
        custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
        # Collect task trace data and iteration trace data. You can collect only the task trace data first. If the problem cannot be analyzed, collect the iteration trace data.
        # custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
      
        (npu_sess, npu_shutdown) = init_resource(config=session_config)
        tf.app.run()
        # Release NPU resources and close the session after training finishes.
        shutdown_resource(npu_sess, npu_shutdown)
        close_session(npu_sess)
      

      Note: The config argument of init_resource accepts only the parameters supported by initialize_system. Configure all other features in the config_proto argument of npu_config_proto.

      • profiling_mode: whether to enable profiling.
      • output: path for storing profile data. Create the directory in the training environment (container or host) in advance, and make sure the running user configured during installation has read and write permissions on it. Both absolute and relative paths are supported.
      • task_trace: whether to collect task trace data.
      • training_trace: whether to collect iteration trace data. If it is set to on, fp_point and bp_point must also be configured.
      • aicpu: whether to collect details about AI CPU operators, such as operator execution time and data copy time.
      • fp_point: operator at which forward propagation starts in iteration traces, used to record the start timestamp of forward propagation. Leave it empty to let the system determine it, or determine it manually by referring to "How Do I Determine fp_point and bp_point?"
      • bp_point: operator at which backward propagation ends in iteration traces, used to record the end timestamp of backward propagation. Leave it empty to let the system determine it, or determine it manually by referring to "How Do I Determine fp_point and bp_point?"
      • aic_metrics: AI Core hardware metrics to collect. PipeUtilization records the percentage of time taken by the compute units and the memory transfer engines (MTEs).
      • For details about profiling configuration, see Profiling.
    • If it does not exist, go to the next step.
  2. Search the ported script for the npu_config_proto function, locate the run configuration passed to it (session_config in the following example), and set the profiling parameters in that configuration to enable task_trace data collection.
    session_config = tf.ConfigProto(allow_soft_placement=True)
    custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    # Enable profiling.
    custom_op.parameter_map["profiling_mode"].b = True
    # Collect only task trace data.
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
    # Collect task trace data and iteration trace data. You can collect only the task trace data first. If the problem cannot be analyzed, collect the iteration trace data.
    # custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
    # Pass the run configuration through npu_config_proto before creating the session.
    config = npu_config_proto(config_proto=session_config)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # Remaining initialization from the original training script.
        interaction_table.init.run()
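
The profiling_options value is a JSON string, and writing it by hand makes quoting mistakes easy. The following standard-library sketch builds the string with json.dumps from the keys listed in step 1. build_profiling_options is a hypothetical helper, not part of the NPU APIs. It always enables task_trace to match the examples in this section, and its result would be passed to tf.compat.as_bytes exactly like the literal strings above.

```python
import json


def build_profiling_options(output, training_trace=False, aicpu=False,
                            fp_point="", bp_point="", aic_metrics=None):
    """Return a profiling_options JSON string (hypothetical helper).

    Task trace collection is always switched on. When training_trace is
    enabled, fp_point and bp_point are always emitted; empty strings leave
    it to the system to determine those operators.
    """
    options = {"output": output, "task_trace": "on"}
    if training_trace:
        options["training_trace"] = "on"
        options["fp_point"] = fp_point
        options["bp_point"] = bp_point
    if aicpu:
        options["aicpu"] = "on"
    if aic_metrics is not None:
        options["aic_metrics"] = aic_metrics
    # Compact separators reproduce the spacing of the hand-written strings.
    return json.dumps(options, separators=(",", ":"))


# Usage (inside the training script):
# custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes(
#     build_profiling_options("/home/HwHiAiUser/output"))
```

Calling build_profiling_options("/home/HwHiAiUser/output") yields the same task-trace-only string used in the examples. Adding training_trace=True, aicpu=True, and aic_metrics="PipeUtilization" yields the same keys and values as the commented-out iteration-trace string, although the key order may differ. Key order has no effect on the parsed JSON.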
    

Manual porting

Start by enabling task_trace to collect task trace data.

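import tensorflow as tf
from npu_bridge.npu_init import *  # NPU adapter for TensorFlow.
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True  # Run training on the Ascend AI Processor.
# Enable profiling and collect only task trace data.
custom_op.parameter_map["profiling_mode"].b = True
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
# Turn off TensorFlow's remapping and memory optimization passes.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
  sess.run(train_op)  # train_op is a placeholder for your own training op.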
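
Profile data is written to the output path set in profiling_options. That directory must exist before training starts, and the running user needs read and write permissions on it. The following standard-library sketch performs that preparation step. prepare_profiling_output is a hypothetical helper. Resolving a relative path against the current working directory is an assumption of this sketch and is not documented profiler behavior.

```python
import os


def prepare_profiling_output(path):
    """Create the profiling output directory and check read/write access.

    Hypothetical helper. A relative path is resolved against the current
    working directory (an assumption), and the absolute path is returned so
    it can be placed in the "output" key of profiling_options.
    """
    abs_path = os.path.abspath(path)
    os.makedirs(abs_path, exist_ok=True)  # No error if it already exists.
    if not os.access(abs_path, os.R_OK | os.W_OK):
        raise PermissionError(
            f"{abs_path} is not readable and writable by the current user")
    return abs_path
```

For example, running prepare_profiling_output("/home/HwHiAiUser/output") as the training user before creating the session confirms that the directory exists and is writable.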
(Optional) If the problem cannot be located from the task trace data, also enable training_trace to collect iteration traces.
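import tensorflow as tf
from npu_bridge.npu_init import *  # NPU adapter for TensorFlow.
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True  # Run training on the Ascend AI Processor.
# Enable profiling. Collect task traces, iteration traces, and AI CPU data, plus the PipeUtilization AI Core metrics.
# Empty fp_point and bp_point values let the system determine them.
custom_op.parameter_map["profiling_mode"].b = True
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
# Turn off TensorFlow's remapping and memory optimization passes.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
  sess.run(train_op)  # train_op is a placeholder for your own training op.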

Note: Collecting iteration traces requires fp_point (the operator at which forward propagation starts) and bp_point (the operator at which backward propagation ends). Leave them empty to let the system determine them, or determine them manually by referring to "How Do I Determine fp_point and bp_point?"
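
Before starting a run that collects iteration traces, the profiling_options string can be checked against this requirement. In the following sketch, check_iteration_trace_options is a hypothetical helper. It treats "configured" as meaning that the fp_point and bp_point keys are present. Empty values pass the check because they leave the operators for the system to determine.

```python
import json


def check_iteration_trace_options(profiling_options):
    """Return a list of problems in a profiling_options JSON string.

    Hypothetical helper. When training_trace is "on", the fp_point and
    bp_point keys must be present. Empty values are accepted. An empty
    list means no problem was found.
    """
    options = json.loads(profiling_options)
    problems = []
    if options.get("training_trace") == "on":
        for key in ("fp_point", "bp_point"):
            if key not in options:
                problems.append(f"{key} is required when training_trace is on")
    return problems
```

Both example strings in this section pass the check. A string that sets "training_trace":"on" but omits fp_point and bp_point returns one message for each missing key.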

For details about related APIs, see Profiling.