What Do I Do If ONNXRuntimeError Is Reported During Inference of a QDQ Model?

Symptom

When a Quantize and DeQuantize (QDQ) model, such as a QAT model converted from TensorFlow or exported from PyTorch, is used for inference on the CPU with ONNX Runtime 1.8.0 or later, an ONNXRuntimeError is reported. (For details about the QDQ model format, see the ONNX Runtime quantization documentation.)

Possible Cause

To improve performance, ONNX Runtime provides multiple graph optimization modes, all of which are enabled by default. During optimization, the QDQ node structure of the model is rewritten into a form that is incompatible with this ONNX Runtime version. As a result, the preceding error is reported.

Solution

When calling the InferenceSession function to perform inference, set the graph_optimization_level attribute of the SessionOptions argument to ORT_DISABLE_ALL to disable all graph optimization modes.
import onnxruntime as ort
import amct_onnx as amct

# AMCT_SO is the SessionOptions object provided by AMCT; disabling all
# graph optimizations keeps the QDQ node pairs in the model untouched.
amct.AMCT_SO.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession('model.onnx', amct.AMCT_SO)
...