Enabling Iteration Offload in Estimator Mode

Automated porting

  1. Search the ported script for npu_run_config_init and locate the run configuration object passed to it (run_config in this example). Set iterations_per_loop in session_config, then pass session_config to the function that creates the run configuration.
    session_config = tf.ConfigProto(allow_soft_placement=True)
    custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
    custom_op.name = 'NpuOptimizer'
    custom_op.parameter_map["enable_data_pre_proc"].b = True # The GetNext operator offload is a prerequisite for iteration offload.
    custom_op.parameter_map["iterations_per_loop"].i = 10
    
    run_config = tf.estimator.RunConfig(
        train_distribute=distribution_strategy,
        session_config=session_config,       # Add the session_config configuration to the run configuration parameter.
        save_checkpoints_secs=60*60*24)
    
    classifier = tf.estimator.Estimator(
        model_fn=model_function, model_dir=flags_obj.model_dir, config=npu_run_config_init(run_config=run_config))
    
  2. Add SetIterationsVarHook to the training hooks, using the same iteration count as iterations_per_loop (10 in this example).
    train_hooks = hooks_helper.get_train_hooks(
        flags_obj.hooks,
        model_dir=flags_obj.model_dir,
        batch_size=flags_obj.batch_size)
    train_hooks.append(SetIterationsVarHook(10))
    
  3. Add IterationOp to train_op.
    train_op = opt.apply_gradients(grad_var_list, global_step=global_step)
    train_op = tf.group(train_op, name="IterationOp")   # Name the op that applies the gradient update "IterationOp".
    

Manual porting

In Estimator mode, configure iterations_per_loop in NPURunConfig as follows.

from npu_bridge.npu_init import *

session_config = tf.ConfigProto(allow_soft_placement=True)
config = NPURunConfig(session_config=session_config, iterations_per_loop=10)

Iteration offload also requires GetNext operator offload. In Estimator mode, GetNext operator offload is enabled by default (enable_data_pre_proc defaults to True), so retain the default setting.

Checking Whether iterations_per_loop Takes Effect

After enabling iteration offload, check the INFO log on the host for the keyword "Insert op success" to confirm that iterations_per_loop has taken effect.

Run the following command to set the host log level to INFO. By default, INFO logs are written to $HOME/ascend/log/run/plog/.

export ASCEND_GLOBAL_LOG_LEVEL=1
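
The keyword check described above can be scripted. The following is a minimal sketch, not part of the Ascend toolchain: the helper name find_keyword is hypothetical, and it assumes that the log files under the plog directory use a .log extension and are readable as text. Adjust the pattern if your installation names its logs differently.

```python
import os
from pathlib import Path

# Keyword that the text above says to look for in the host INFO log.
KEYWORD = "Insert op success"


def find_keyword(log_dir, keyword=KEYWORD):
    """Return (file, line_number, line) for every log line containing keyword.

    Assumption: log files end in ".log". Returns an empty list when the
    directory does not exist or nothing matches.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*.log")):
        # errors="replace" keeps the scan going if a file has stray bytes.
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if keyword in line:
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Default INFO log location given in the text above.
    default_dir = os.path.join(os.path.expanduser("~"), "ascend", "log", "run", "plog")
    matches = find_keyword(default_dir)
    if matches:
        for file_name, lineno, line in matches:
            print(f"{file_name}:{lineno}: {line}")
    else:
        print(f'"{KEYWORD}" not found under {default_dir}')
```

Run the script after a training job has finished with ASCEND_GLOBAL_LOG_LEVEL=1 set. An empty result suggests that either the INFO log level was not in effect or iterations_per_loop was not applied.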