Enabling Iteration Offload in Estimator Mode

Automated porting

Search the ported script for npu_run_config_init and locate the run configuration object passed to it (run_config in the following example). Create a session_config that sets iterations_per_loop, then pass that session_config to the function that builds the run configuration.

session_config = tf.ConfigProto(allow_soft_placement=True)
custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = 'NpuOptimizer'
custom_op.parameter_map["enable_data_pre_proc"].b = True # The GetNext operator offload is a prerequisite for iteration offload.
custom_op.parameter_map["iterations_per_loop"].i = 10

run_config = tf.estimator.RunConfig(
    train_distribute=distribution_strategy,
    session_config=session_config,       # Pass session_config to the run configuration.
    save_checkpoints_secs=60*60*24)

classifier = tf.estimator.Estimator(
    model_fn=model_function, model_dir=flags_obj.model_dir, config=npu_run_config_init(run_config=run_config))

Append SetIterationsVarHook to the training hooks, passing the same value as iterations_per_loop (10 in this example).

train_hooks = hooks_helper.get_train_hooks(
    flags_obj.hooks,
    model_dir=flags_obj.model_dir,
    batch_size=flags_obj.batch_size)
train_hooks.append(SetIterationsVarHook(10))

Name the training operator IterationOp by wrapping train_op in tf.group.

train_op = opt.apply_gradients(grad_var_list, global_step=global_step)
train_op = tf.group(train_op, name="IterationOp")   # Name the gradient-update operator IterationOp.

Manual porting

In Estimator mode, configure iterations_per_loop in NPURunConfig as follows.

import tensorflow as tf
from npu_bridge.npu_init import *

session_config = tf.ConfigProto(allow_soft_placement=True)
config = NPURunConfig(session_config=session_config, iterations_per_loop=10)

GetNext operator offload, which is a prerequisite for iteration offload, must also be enabled. In Estimator mode it is enabled by default (enable_data_pre_proc defaults to True), so keep the default setting.
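The general idea behind iteration offload is that each host launch runs iterations_per_loop training steps on the device, which cuts the number of host-device interactions. The plain-Python sketch below is an idealized model of that idea only, not the adapter's actual scheduling: the functions host_launch_count and simulate_launches are hypothetical, and the assumption that a final partial loop runs only the remaining steps may not match how the adapter handles a remainder.

```python
def host_launch_count(total_steps: int, iterations_per_loop: int) -> int:
    """Idealized number of host launches when each launch runs up to
    iterations_per_loop steps on the device (hypothetical model)."""
    if total_steps < 0 or iterations_per_loop < 1:
        raise ValueError("need total_steps >= 0 and iterations_per_loop >= 1")
    # Ceiling division: a final partial loop still costs one launch.
    return (total_steps + iterations_per_loop - 1) // iterations_per_loop


def simulate_launches(total_steps: int, iterations_per_loop: int) -> list:
    """Return the global step reached at the end of each host launch."""
    if total_steps < 0 or iterations_per_loop < 1:
        raise ValueError("need total_steps >= 0 and iterations_per_loop >= 1")
    global_step = 0
    steps = []
    while global_step < total_steps:
        # One launch: the device loops locally, then control returns to the host.
        # Assumption: the last launch runs only the steps that remain.
        global_step += min(iterations_per_loop, total_steps - global_step)
        steps.append(global_step)
    return steps


# One step per launch versus iterations_per_loop=10 as used in this section.
print(host_launch_count(1000, 1))    # 1000
print(host_launch_count(1000, 10))   # 100
print(simulate_launches(25, 10))     # [10, 20, 25]
```

In this model, raising iterations_per_loop from 1 to 10 reduces host launches tenfold for the same number of training steps.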

Checking Whether iterations_per_loop Takes Effect

After enabling iteration offload, search the INFO log on the host for the keyword "Insert op success". If the keyword is present, iterations_per_loop has taken effect.

To set the host log level to INFO, run the following command. By default, INFO logs are written to $HOME/ascend/log/run/plog/.

export ASCEND_GLOBAL_LOG_LEVEL=1
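Instead of inspecting the logs by hand, you can scan the log directory with a short script. This is a minimal sketch, assuming the logs are plain-text files under the default path $HOME/ascend/log/run/plog/ (possibly in subdirectories); find_keyword is a hypothetical helper, not part of the CANN tooling.

```python
import os
from pathlib import Path

# Keyword and default directory are taken from this section; adjust the
# directory if your installation writes logs elsewhere.
KEYWORD = "Insert op success"
DEFAULT_PLOG_DIR = Path(os.path.expanduser("~")) / "ascend" / "log" / "run" / "plog"


def find_keyword(log_dir=DEFAULT_PLOG_DIR, keyword=KEYWORD):
    """Return the files under log_dir (searched recursively) that contain keyword."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    matches = []
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        # Read as text and skip undecodable bytes instead of failing on them.
        with path.open("r", encoding="utf-8", errors="ignore") as f:
            if any(keyword in line for line in f):
                matches.append(path)
    return matches


if __name__ == "__main__":
    hits = find_keyword()
    if hits:
        print(f"Found '{KEYWORD}' in {len(hits)} file(s); iterations_per_loop has taken effect.")
        for hit in hits:
            print(f"  {hit}")
    else:
        print(f"'{KEYWORD}' not found under {DEFAULT_PLOG_DIR}; "
              "check that ASCEND_GLOBAL_LOG_LEVEL=1 was set before training.")
```

Run the script after training starts; pass a different directory to find_keyword if the logs are not in the default location.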
