What Do I Do If ONNXRuntimeError Is Reported During Inference of a QDQ Model?
Symptom
When a Quantize and DeQuantize (QDQ) model, such as a QAT model converted from TensorFlow or exported from PyTorch, is used for inference in a CPU environment of ONNX Runtime 1.8.0 or later, an ONNXRuntimeError is reported. (For details about the QDQ format, see the ONNX Runtime quantization documentation.)

Possible Cause
To improve performance, ONNX Runtime provides several graph optimization levels, all of which are enabled by default. Some of these optimization passes rewrite the model into a form that is incompatible with the installed ONNX Runtime version, which triggers the preceding error.
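For context, the Python API exposes these levels through the GraphOptimizationLevel enum of SessionOptions. A minimal sketch (plain ONNX Runtime, independent of AMCT) showing that the most aggressive level is the default:

import onnxruntime as ort

# A fresh SessionOptions object defaults to the most aggressive level.
so = ort.SessionOptions()
print(so.graph_optimization_level)  # GraphOptimizationLevel.ORT_ENABLE_ALL

# Available levels, from none to all:
# ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED, ORT_ENABLE_ALL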
Solution
When calling InferenceSession to perform inference, set the graph_optimization_level attribute of the SessionOptions argument to ORT_DISABLE_ALL to disable all graph optimizations:
import onnxruntime as ort
import amct_onnx as amct

# Disable every graph optimization pass so the QDQ nodes are left intact.
amct.AMCT_SO.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession('model.onnx', amct.AMCT_SO)
...
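If the model is run with plain ONNX Runtime rather than through AMCT, the same workaround applies to an ordinary SessionOptions object. A minimal sketch, where model.onnx is the QDQ model and the input name and shape are placeholders to adapt to the actual model:

import numpy as np
import onnxruntime as ort

# Turn off every optimization pass so the QDQ node pattern survives loading.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession('model.onnx', sess_options=so,
                               providers=['CPUExecutionProvider'])

# Placeholder input: replace the shape with the real model's input shape.
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {input_name: dummy})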