In Keras Mode
- Check whether init_resource exists in the ported script.
- If it exists, modify it by referring to the following example. After the modification is complete, go to the next step.
```python
if __name__ == '__main__':
    session_config = tf.ConfigProto(allow_soft_placement=True)
    custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = "NpuOptimizer"
    # Enable profiling.
    custom_op.parameter_map["profiling_mode"].b = True
    # Collect only task trace data.
    custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
    # Collect task trace data and iteration trace data. You can collect only the task trace data first. If the problem cannot be analyzed, collect the iteration trace data.
    # custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
    (npu_sess, npu_shutdown) = init_resource(config=session_config)
    tf.app.run()
    shutdown_resource(npu_sess, npu_shutdown)
    close_session(npu_sess)
```
Note that only the parameters supported by initialize_system can be configured in the config argument of init_resource. Configure the other parameters in the config argument of set_keras_session_npu_config.
- profiling_mode: whether to enable profiling.
- output: path for storing profiling data. Create the specified directory in the training environment (container or host) in advance, and make sure the running user configured during installation has read and write permissions on it. The path can be either absolute or relative.
- task_trace: whether to enable task trace collection.
- training_trace: whether to enable iteration trace collection. If it is set to on, both fp_point and bp_point must be configured.
- aicpu: whether to collect details about AI CPU operators, such as operator execution time and data copy time.
- fp_point: start operator of forward propagation in an iteration trace, used to record the start timestamp of forward propagation. You can leave it empty so that the system obtains the value automatically, or determine it manually by referring to How Do I Determine fp_point and bp_point?.
- bp_point: end operator of backward propagation in an iteration trace, used to record the end timestamp of backward propagation. You can leave it empty so that the system obtains the value automatically, or determine it manually by referring to How Do I Determine fp_point and bp_point?.
- aic_metrics: AI Core hardware metrics to collect. The value PipeUtilization indicates the percentage of time taken by the compute units and MTEs.
- For details about profiling configuration, see Profiling.
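The profiling_options value is a JSON string, so assembling it with json.dumps avoids quoting mistakes when editing the options by hand. A minimal sketch, reusing the path and option values from the examples in this section:

```python
import json

# Options mirroring the commented-out full-collection line above:
# task trace plus iteration trace, with fp_point/bp_point left empty
# so the system determines the operators automatically.
profiling_options = json.dumps({
    "output": "/home/HwHiAiUser/output",  # create this directory in advance
    "task_trace": "on",
    "training_trace": "on",
    "aicpu": "on",
    "fp_point": "",
    "bp_point": "",
    "aic_metrics": "PipeUtilization",
})

# The resulting string can then be assigned as in the examples:
# custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes(profiling_options)
```

Remember that the output directory must exist in the training environment before the job starts, with read and write permissions for the running user.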
- If it does not exist, go to the next step.
- Find set_keras_session_npu_config in the script and configure profiling parameters.
```python
import tensorflow as tf
import tensorflow.python.keras as keras
from tensorflow.python.keras import backend as K
from npu_bridge.npu_init import *

config_proto = tf.ConfigProto(allow_soft_placement=True)
custom_op = config_proto.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = 'NpuOptimizer'
# Enable profiling.
custom_op.parameter_map["profiling_mode"].b = True
# Collect only task trace data.
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on"}')
# Collect task trace data and iteration trace data. You can collect only the task trace data first. If the problem cannot be analyzed, collect the iteration trace data.
# custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/home/HwHiAiUser/output","task_trace":"on","training_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization"}')
npu_keras_sess = set_keras_session_npu_config(config=config_proto)
# Preprocess data...
# Construct a model...
# Compile the model...
# Train the model...
```
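As noted above, when training_trace is set to on, both fp_point and bp_point must be present (possibly as empty strings). A small sanity check run before launching training can catch a malformed options string early. The helper below is hypothetical, not part of npu_bridge, and only illustrates the constraints described in this section:

```python
import json

def check_profiling_options(options_str):
    """Validate a profiling_options JSON string before passing it to NpuOptimizer.

    Returns a list of problems; an empty list means the string looks usable.
    Illustrative only; not an official npu_bridge API.
    """
    problems = []
    try:
        opts = json.loads(options_str)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not opts.get("output"):
        problems.append("output path is missing")
    if opts.get("training_trace") == "on":
        # Iteration traces require both points to be configured; an empty
        # string lets the system determine the operator automatically.
        for key in ("fp_point", "bp_point"):
            if key not in opts:
                problems.append("%s must be set when training_trace is on" % key)
    return problems

# The full options string from the examples above passes the check:
ok = check_profiling_options(
    '{"output":"/home/HwHiAiUser/output","task_trace":"on",'
    '"training_trace":"on","aicpu":"on","fp_point":"","bp_point":"",'
    '"aic_metrics":"PipeUtilization"}')
# Enabling training_trace without fp_point/bp_point is flagged:
bad = check_profiling_options('{"output":"/tmp/out","training_trace":"on"}')
```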