Optimizing by Modifying the Graph

If automatic tuning does not meet the performance target, if redundant computations have been identified in the model, or if a data type must be changed so that an operator runs on AI Core instead of AI CPU, you can optimize performance by modifying the graph.

For example, to add a Cast operation before the GlobalAveragePool operation, use the following graph modification script:

import onnx
from onnx import helper, TensorProto

# Replace it with the path of the model to be modified.
model_path = "D:\\035-Code\\om_test\\resnet50.onnx"
model = onnx.load(model_path)


def create_cast_node(input_name, output_name, to_type):
    """Create a Cast node that converts input_name to to_type."""
    return helper.make_node(
        'Cast',
        inputs=[input_name],
        outputs=[output_name],
        to=to_type
    )


for i, node in enumerate(model.graph.node):
    if node.op_type == 'GlobalAveragePool':
        # Obtain the input of the GlobalAveragePool node.
        input_name = node.input[0]
        # Generate the output name of the new Cast node.
        cast_output_name = f"{input_name}_cast"
        # Create a Cast node that converts the input to float32.
        cast_node = create_cast_node(input_name, cast_output_name, TensorProto.FLOAT)
        # Update the input of the GlobalAveragePool node.
        node.input[0] = cast_output_name
        # Insert the Cast node immediately before the GlobalAveragePool node
        # so that the node list stays topologically sorted.
        model.graph.node.insert(i, cast_node)
        break  # Process only the first found GlobalAveragePool node.

# Replace it with the path of the model to be saved.
output_model_path = "D:\\035-Code\\om_test\\resnet50_new.onnx"
onnx.save(model, output_model_path)

Figure 1 shows the result after the script is executed.

Figure 1 Execution result