Porting Sample
This section takes the ResNet-50 model as an example to show how to port a TensorFlow 2.6.5 online inference application that runs in the CPU/GPU environment to the Ascend AI Processor.
Preparation
Prepare the TensorFlow 2.6.5 online inference code and datasets, and ensure that the code runs properly on the CPU/GPU.
Online Inference on the CPU/GPU
The TensorFlow 2.6.5 online inference code mainly includes the following actions:
- Prepare the ResNet-50.pb model, input node, output node, and dataset.
- Call sess.run() to perform inference. The feed_dict argument of sess.run() assigns values to the tensors created with placeholder, so the feed (input) data is supplied as a parameter of the run() call.
The key inference code is as follows:
import os
import time

import numpy as np
import tensorflow as tf

def load_graph(frozen_graph):
    # Load the frozen .pb model and import it into a new graph.
    with tf.io.gfile.GFile(frozen_graph, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
    return graph

def NetworkRun(modelPath, inputPath, outputPath):
    graph = load_graph(modelPath)
    input_nodes = graph.get_tensor_by_name('Input:0')
    output_nodes = graph.get_tensor_by_name('Identity:0')
    with tf.compat.v1.Session(graph=graph) as sess:
        files = os.listdir(inputPath)
        files.sort()
        for file in files:
            if file.endswith(".bin"):
                # Each .bin file holds one preprocessed 224 x 224 x 3 float32 image.
                input_img = np.fromfile(inputPath + "/" + file, dtype="float32").reshape(1, 224, 224, 3)
                t0 = time.time()
                out = sess.run(output_nodes, feed_dict={input_nodes: input_img})
                t1 = time.time()
                out.tofile(outputPath + "/" + "cpu_out_" + file)
                print("{}, Inference time: {:.3f} ms".format(file, (t1 - t0) * 1000))
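For reference, the script above could be invoked as follows; the model and directory paths are placeholders for illustration, not values from this sample:

# Hypothetical paths; substitute your own model and data locations.
NetworkRun("./model/resnet50.pb", "./input_bins", "./cpu_output")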
Porting the Inference Script to the Ascend AI Processor
- To port the inference script to the Ascend AI Processor, first import the NPU device library and enable the TensorFlow 1.x compatibility mode.
import npu_device
from npu_device.compat.v1.npu_init import *
npu_device.compat.enable_v1()
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
- Add the NPU-related configuration before sess.run() is called; the resulting tf_config is then passed when the session is created (see the sketch after the configuration code).
config_proto = tf.compat.v1.ConfigProto()
custom_op = config_proto.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
config_proto.graph_options.rewrite_options.remapping = RewriterConfig.OFF
tf_config = npu_config_proto(config_proto=config_proto)
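For the configuration to take effect, tf_config must be supplied when the session is created. A minimal sketch, reusing the session creation from the CPU/GPU script above:

# Pass the NPU configuration when creating the session so that the
# NpuOptimizer settings govern graph compilation and execution.
with tf.compat.v1.Session(graph=graph, config=tf_config) as sess:
    out = sess.run(output_nodes, feed_dict={input_nodes: input_img})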
Checking the Porting Result
For the ported online inference script on the Ascend AI Processor, the indicator of successful execution is the same as that for training. Execution is successful in the following scenarios:
- The keyword tf_adapter and the message "The model has been compiled on the Ascend AI processor" are printed in the logs.

- A dump of the computational graph can be generated after you enable DUMP_GE_GRAPH, as shown in the sketch below.
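A minimal sketch of enabling the graph dump from within the script; the value "2" is an assumption here, so check the dump levels documented for your CANN version:

import os

# Assumed dump level; set before the NPU libraries compile the graph.
os.environ["DUMP_GE_GRAPH"] = "2"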
Checking the Inference Performance and Accuracy
The inference performance is measured as the time difference before and after sess.run() executes. According to the inference result in this sample, the performance on the NPU is much better than that on the CPU.
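A minimal sketch of this measurement, reusing sess, input_nodes, output_nodes, and input_img from the inference script above; averaging over repeated runs is an addition for illustration:

# Time sess.run() over several runs and report the average latency.
latencies = []
for _ in range(10):
    t0 = time.time()
    sess.run(output_nodes, feed_dict={input_nodes: input_img})
    latencies.append((time.time() - t0) * 1000)  # milliseconds
print("Average inference time: {:.3f} ms".format(sum(latencies) / len(latencies)))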

The inference accuracy is evaluated by converting the output .bin files into .txt files and comparing them. According to the inference result in this sample, the accuracy on the NPU is similar to that on the CPU.
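A hedged sketch of such a comparison, loading the CPU and NPU output .bin files directly with NumPy rather than via .txt files; the file names are placeholders:

import numpy as np

# Hypothetical output files from the CPU and NPU runs of the same input.
cpu_out = np.fromfile("cpu_out_sample.bin", dtype="float32")
npu_out = np.fromfile("npu_out_sample.bin", dtype="float32")

# Mixed precision on the NPU means bit-exact equality is not expected,
# so compare within a small tolerance.
print("Max absolute difference:", np.max(np.abs(cpu_out - npu_out)))
print("Outputs close:", np.allclose(cpu_out, npu_out, atol=1e-3))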
