Compilation Configurations

The following compilation configurations are required in the online inference script:

import tensorflow as tf
import npu_bridge
from npu_bridge.estimator import npu_ops
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Configuration 1: Schedule the inference job to Ascend AI Processor.
custom_op.parameter_map["use_off_line"].b = True

# Configuration 2: In the online inference scenario, you are advised to retain the default precision selection force_fp16 to achieve better performance.
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("force_fp16")

# Configuration 3: Select the graph run mode. Set this parameter to 0 in the inference scenario or retain the default value 1 in the training scenario.
custom_op.parameter_map["graph_run_mode"].i = 0

# Configuration 4: Disable remapping and MemoryOptimizer.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

The key configuration options in online inference are summarized as follows:

  • Set use_off_line to True to schedule the inference job to the Ascend AI Processor.
  • Retain the default precision_mode selection (force_fp16) to achieve better performance.
  • Set graph_run_mode to 0 for inference (the default value 1 applies to training).
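
With the options above in place, the config is passed to the session that runs the graph. A minimal sketch is shown below; the checkpoint path (`model.ckpt`) and the tensor names (`input:0`, `logits:0`) are hypothetical placeholders, not part of the configuration requirements:

```python
import tensorflow as tf
import npu_bridge  # registers the NPU custom optimizer with TensorFlow
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True   # run on Ascend AI Processor
custom_op.parameter_map["graph_run_mode"].i = 0    # inference mode
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
    # Restore a trained model; names below are placeholders for illustration.
    saver = tf.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")
    logits = sess.graph.get_tensor_by_name("logits:0")
    outputs = sess.run(logits, feed_dict={"input:0": input_data})
```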

The Ascend platform also provides capabilities such as function debugging and performance/precision tuning, which you can enable through session configuration. For details about the parameters, see Session Configuration.
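
For example, operator dump data for precision comparison can be collected by adding further parameter_map entries to the same custom_op. The parameter names below (enable_dump, dump_path, dump_step, dump_mode) are taken from the Ascend session configuration reference; verify them against your CANN version:

```python
# Enable operator dump for precision analysis (assumed parameter names;
# see Session Configuration for the authoritative list).
custom_op.parameter_map["enable_dump"].b = True
custom_op.parameter_map["dump_path"].s = tf.compat.as_bytes("/var/log/npu_dump")
custom_op.parameter_map["dump_step"].s = tf.compat.as_bytes("0")
custom_op.parameter_map["dump_mode"].s = tf.compat.as_bytes("all")
```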