Mixed Computing

Overview

By default, the Ascend AI Processor uses the full offload mode, in which all compute operators are executed on the device side.

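The mixed computing mode complements the full offload mode. Operators that cannot be compiled offline and offloaded remain in the frontend framework and are executed online, which makes adapting TensorFlow to the Ascend AI Processor more flexible.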

Enabling Mixed Computing

You can enable mixed computing through the mix_compile_mode configuration option:

import tensorflow as tf
from npu_bridge.estimator import npu_ops
from npu_bridge.estimator.npu import npu_scope
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["mix_compile_mode"].b = True  # Enable mixed computing
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

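With this configuration, operators that are not offloaded by default stay in the frontend framework for execution. In addition, you can specify the first and last operators of a range yourself, and the system then offloads, or does not offload, every operator within that range.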

Specifying the First and Last Operators to Offload or Not Offload

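When you specify first and last operators, the system offloads, or keeps in the frontend framework, every operator in the range between them. For example, for the yolo_v3 network, you may want the network body and the post-processing part to be offloaded to the Ascend AI Processor, while the pre-processing part stays in the frontend framework.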

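To do so, specify the first and last operators so that all operators in the range between in_nodes and out_nodes are offloaded to the Ascend AI Processor: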

import tensorflow as tf
from npu_bridge.estimator import npu_ops
from npu_bridge.estimator.npu import npu_scope
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["mix_compile_mode"].b = True  # Enable mixed computing
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

all_graph_iop = []
in_nodes = []
out_nodes = []

in_nodes.append('import/conv2d_1/con')
in_nodes.append('while_1/strided_sli')
in_nodes.append('while_1/strided_sli')
in_nodes.append('concat')
in_nodes.append('while_1/Const')
in_nodes.append('while_1/Const_4')
in_nodes.append('while_1/Const_2')
in_nodes.append('zeros_7')
in_nodes.append('Const_6')
in_nodes.append('zeros_4')
in_nodes.append('ConstantFolding/whi')
out_nodes.append('strided_slice_13')
all_graph_iop.append([in_nodes, out_nodes])
custom_op.parameter_map['in_out_pair'].s = tf.compat.as_bytes(str(all_graph_iop))
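
The value passed to in_out_pair in the sample above is the string form of a nested Python list, [[in_nodes, out_nodes], ...], encoded as bytes. The following is a minimal sketch of that serialization step only. The helper names build_in_out_pair and parse_in_out_pair are hypothetical and not part of npu_bridge, the node names are taken from the sample, and str.encode("utf-8") stands in for tf.compat.as_bytes so that the snippet runs without TensorFlow.

```python
import ast


def build_in_out_pair(pairs):
    """Serialize (in_nodes, out_nodes) pairs into a bytes value.

    Hypothetical helper: mirrors tf.compat.as_bytes(str(all_graph_iop))
    from the sample above, using plain str.encode instead.
    """
    all_graph_iop = [[list(in_nodes), list(out_nodes)]
                     for in_nodes, out_nodes in pairs]
    return str(all_graph_iop).encode("utf-8")


def parse_in_out_pair(value):
    """Decode the bytes value back into nested lists, for inspection only."""
    return ast.literal_eval(value.decode("utf-8"))


value = build_in_out_pair([(["concat", "zeros_7"], ["strided_slice_13"])])
print(value)
print(parse_in_out_pair(value))
```

Parsing the value back before assigning it to custom_op.parameter_map['in_out_pair'].s is one way to check that the node lists were assembled as intended.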

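The same result can be achieved the other way round by keeping the pre-processing part in the frontend framework, that is, by keeping all operators in the range between in_nodes and out_nodes in the frontend framework: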

import tensorflow as tf
from npu_bridge.estimator import npu_ops
from npu_bridge.estimator.npu import npu_scope
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["mix_compile_mode"].b = True  # Enable mixed computing
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

all_graph_iop = []
in_nodes = []
out_nodes = []

in_nodes.append('_arg_tf_image_string_0_0')
in_nodes.append('strided_slice_1/stack')
in_nodes.append('strided_slice_1/stack_1')
in_nodes.append('Const')
in_nodes.append('zeros')
in_nodes.append('zeros_1')
in_nodes.append('Const_2')
out_nodes.append('strided_slice_2')
out_nodes.append('strided_slice_3')
out_nodes.append('strided_slice_4')
all_graph_iop.append([in_nodes, out_nodes])
custom_op.parameter_map['in_out_pair_flag'].b = False  # Keep the operators in the range in the frontend framework
custom_op.parameter_map['in_out_pair'].s = tf.compat.as_bytes(str(all_graph_iop))
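
To make the idea of a range between first and last operators more concrete, the sketch below walks a toy graph backward from out_nodes and stops at in_nodes. It is only an illustration of one plausible reading of the range: the function nodes_in_range, the toy graph, and the choice to include the boundary operators themselves are all assumptions, and the actual selection logic of the Ascend adapter is not described in this section.

```python
def nodes_in_range(graph_inputs, in_nodes, out_nodes):
    """Return the nodes between in_nodes and out_nodes in a toy graph.

    graph_inputs maps each node name to the names of its input nodes.
    Assumption: the boundary operators themselves belong to the range.
    """
    boundary = set(in_nodes)
    selected = set()
    stack = list(out_nodes)
    while stack:
        node = stack.pop()
        if node in selected:
            continue
        selected.add(node)
        if node in boundary:
            continue  # Do not walk past a first operator.
        stack.extend(graph_inputs.get(node, []))
    return selected


# Hypothetical five-operator pipeline: pre-processing, body, post-processing.
toy_graph = {
    "decode": ["image_string"],
    "resize": ["decode"],
    "conv": ["resize"],
    "relu": ["conv"],
    "postprocess": ["relu"],
}

selected = nodes_in_range(toy_graph, in_nodes=["conv"], out_nodes=["postprocess"])
print(sorted(selected))
```

In this toy model, the pre-processing operators decode and resize fall outside the selected range, which corresponds to the yolo_v3 scenario where pre-processing stays in the frontend framework.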
