How Can I Achieve the Expected Performance for a Network Containing Operators Such as ResourceConditionalAccumulator?

Symptom

A network (for example, OSMN) that contains a large number of resource operators, such as ResourceConditionalAccumulator and ResourceAccumulatorTakeGradient, fails to achieve satisfactory performance.
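To confirm this symptom before changing any configuration, it can help to count how many of these resource operators the graph actually contains. The sketch below is a hypothetical helper (`count_resource_accumulator_ops` and `RESOURCE_ACCUMULATOR_OPS` are illustrative names, not part of any Ascend or TensorFlow API). It assumes you can obtain the operator type names of your graph, for example through the TensorFlow 1.x calls `tf.get_default_graph().get_operations()` and each operation's `.type` attribute. Which types to count is also an assumption; only the two operators named above are listed, so extend the set for your network.

```python
from collections import Counter
from typing import Iterable

# Hypothetical set of operator types to look for; it contains only the two
# resource operators named in the symptom description. Extend as needed.
RESOURCE_ACCUMULATOR_OPS = frozenset({
    "ResourceConditionalAccumulator",
    "ResourceAccumulatorTakeGradient",
})


def count_resource_accumulator_ops(op_types: Iterable[str]) -> Counter:
    """Count how many operators of each listed resource type appear.

    op_types is any iterable of operator type names, for example the .type
    attribute of every operation in a TensorFlow graph.
    """
    return Counter(t for t in op_types if t in RESOURCE_ACCUMULATOR_OPS)


if __name__ == "__main__":
    # With TensorFlow 1.x, the input could come from the default graph:
    #   op_types = [op.type for op in tf.get_default_graph().get_operations()]
    # A hand-written list is used here so that the sketch runs without TensorFlow.
    sample = [
        "MatMul",
        "ResourceConditionalAccumulator",
        "ResourceAccumulatorTakeGradient",
        "ResourceAccumulatorTakeGradient",
        "Add",
    ]
    print(dict(count_resource_accumulator_ops(sample)))
    # → {'ResourceConditionalAccumulator': 1, 'ResourceAccumulatorTakeGradient': 2}
```

A high count relative to the total number of operators suggests that the scheduling and memory-copy overhead described below is likely to matter for your network.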

Possible Cause

By default, the Ascend AI Processor runs in full offload mode, in which all operators are executed on the device. For these resource operators, scheduling and memory copy on the Ascend AI Processor are expensive, which results in unsatisfactory performance.

Solution

Enable mixed computing so that these operators are executed on the host. To do so, set mix_compile_mode to True in the session configuration:

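import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
from npu_bridge.npu_init import *

# Build a session configuration that enables the NPU optimizer.
config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Enable mixed computing so that operators such as ResourceConditionalAccumulator
# are executed on the host instead of on the Ascend AI Processor.
custom_op.parameter_map["mix_compile_mode"].b = True
# Disable TensorFlow's remapping and memory optimization passes.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
    # Replace "..." with the fetches of your own training graph.
    sess.run(...)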