Mixed Computing
Mixed computing adds flexibility and extensibility when the computation graph contains operators that cannot be offloaded to the device (such as py_func). These operators remain on the host and are executed by the frontend framework.
Overview
By default, the Ascend AI Processor uses full offload mode: all compute operators are offloaded to the device. As a supplement to full offload mode, mixed computing allows certain operators (such as resource operators) to be executed online within the frontend framework, improving the Ascend AI Processor's adaptability to TensorFlow.
Principles
In mixed computing scenarios, after identifying offloadable operators, TF Adapter partitions the entire graph into multiple GEOPs. Data transfer for unoffloadable operators is performed via memcpy.
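
The partitioning idea can be illustrated with a toy model. The sketch below is not TF Adapter's actual algorithm: it assumes a flat, topologically ordered list of operator names instead of a real graph, and the `HOST_ONLY_OPS` set and `partition` function are hypothetical names introduced only for illustration. It groups consecutive offloadable operators into one segment (standing in for a GEOP) and leaves the remaining operators on the host.

```python
from itertools import groupby

# Hypothetical set of operators that cannot be offloaded (illustration only).
HOST_ONLY_OPS = {"PyFunc"}


def partition(ops):
    """Split an ordered operator list into contiguous host and GEOP segments."""
    segments = []
    # groupby merges consecutive operators that share the same placement.
    for on_host, group in groupby(ops, key=lambda op: op in HOST_ONLY_OPS):
        placement = "host" if on_host else "GEOP"
        segments.append((placement, list(group)))
    return segments


ops = ["MatMul", "Relu", "PyFunc", "MatMul", "Softmax"]
segments = partition(ops)
for placement, group in segments:
    print(placement, group)
```

In this toy model, a single host-only operator in the middle of the list yields two separate device segments, which is why a mixed graph ends up with multiple GEOPs and needs data copies at each boundary.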

Precautions
- In mixed computing mode, iteration offload is not supported. That is, iterations_per_loop must retain the default value 1.
- In addition to the operators that are not offloaded by default, you can use without_npu_compile_scope to keep other operators on the host.
- The FusedBatchNormV3 operator was introduced in 2019. Its fifth output is a CUDA-specific optimization output, which the Ascend AI Processor does not support in mixed computing mode. If your training script uses tf.layers.batch_normalization, build the network inside with compat.forward_compatibility_horizon(2019, 5, 1): (where compat is imported from tensorflow.python.compat) so that this operator is not generated.
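
The first precaution can be enforced before the run configuration is built. The following is a minimal sketch in plain Python; `check_mixed_computing_config` is a hypothetical helper, not part of npu_bridge, and it only checks the two options named above.

```python
def check_mixed_computing_config(mix_compile_mode, iterations_per_loop=1):
    """Return the options as a dict, or raise if they violate the restriction.

    Hypothetical helper: mixed computing does not support iteration offload,
    so iterations_per_loop must stay at its default value of 1.
    """
    if mix_compile_mode and iterations_per_loop != 1:
        raise ValueError(
            "iterations_per_loop must be 1 when mix_compile_mode is enabled, "
            "got %d" % iterations_per_loop
        )
    return {
        "mix_compile_mode": mix_compile_mode,
        "iterations_per_loop": iterations_per_loop,
    }


options = check_mixed_computing_config(mix_compile_mode=True, iterations_per_loop=1)
print(options)
```

The returned dictionary could then be passed on as keyword arguments, for example NPURunConfig(session_config=session_config, **options).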
In Estimator Mode
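```python
import tensorflow as tf
from npu_bridge.npu_init import *

session_config = tf.ConfigProto(allow_soft_placement=True)
# Enable mixed computing; iterations_per_loop must stay at 1 in this mode.
config = NPURunConfig(session_config=session_config,
                      mix_compile_mode=True,
                      iterations_per_loop=1)
```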
In sess.run Mode
```python
import tensorflow as tf
from npu_bridge.npu_init import *

config = tf.ConfigProto(allow_soft_placement=True)
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["mix_compile_mode"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
```
In Keras Mode
The configuration method is similar to that in sess.run mode.
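
Because the configuration mirrors sess.run mode, a Keras setup might look like the sketch below. It assumes TensorFlow 1.x with npu_bridge installed on an Ascend host, and it assumes that registering the configured session through tf.keras.backend.set_session is enough for Keras to pick it up; the explicit RewriterConfig import is added here only to make the fragment self-explanatory.

```python
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
from npu_bridge.npu_init import *

# Same session options as in sess.run mode.
config = tf.ConfigProto(allow_soft_placement=True)
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["mix_compile_mode"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

# Assumption: Keras runs its graph in the session registered here.
sess = tf.Session(config=config)
tf.keras.backend.set_session(sess)

# ... build, compile, and fit the Keras model here ...

sess.close()
```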
Operator Retaining
```python
import tensorflow as tf
from npu_bridge.npu_init import *

X = tf.random_normal([2,])
Y = tf.random_normal([2,])

with npu_scope.without_npu_compile_scope():
    pred = tf.add(tf.multiply(X, 1.), 0.)  # Specify tf.add and tf.multiply as operators not offloaded.

cost = tf.reduce_sum(tf.abs(pred-Y))

config = tf.ConfigProto(allow_soft_placement=True)
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["mix_compile_mode"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
    print(sess.run(cost))
```