Compilation Configurations
The following compilation configurations are required in the online inference script:
```python
import tensorflow as tf
import npu_bridge
from npu_bridge.estimator import npu_ops
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Configuration 1: Schedule the inference job to the Ascend AI Processor.
custom_op.parameter_map["use_off_line"].b = True
# Configuration 2: In the online inference scenario, you are advised to retain
# the default precision selection force_fp16 to achieve better performance.
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("force_fp16")
# Configuration 3: Select the graph run mode. Set this parameter to 0 in the
# inference scenario, or retain the default value 1 in the training scenario.
custom_op.parameter_map["graph_run_mode"].i = 0
# Configuration 4: Disable remapping and the memory optimizer.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
```
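Once the configuration is built, it is passed to the session used to run inference. The following is a minimal sketch only; the frozen-graph path (`model.pb`) and tensor names (`input:0`, `logits:0`) are hypothetical placeholders, not part of the original text, and must be replaced with your model's actual values:

```python
# Sketch: running inference with the NPU-enabled session config built above.
# Assumes TF 1.x with the npu_bridge adapter installed; names are illustrative.
with tf.io.gfile.GFile("model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default():
    tf.import_graph_def(graph_def, name="")
    # The session is created with the config that contains the NpuOptimizer,
    # so the graph is compiled for and executed on the Ascend AI Processor.
    with tf.Session(config=config) as sess:
        result = sess.run("logits:0", feed_dict={"input:0": input_data})
```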
The key configuration options in online inference are summarized as follows:
- Set use_off_line to True to perform inference on the Ascend AI Processor.
- Retain the default precision_mode selection (force_fp16) to achieve better performance.
- Set graph_run_mode to 0 in the inference scenario.
The Ascend platform also provides capabilities such as function debugging and performance/precision optimization. You can enable them through session configuration options. For details about the parameters, see Session Configuration.
Parent topic: Online Inference