More Features (Model Adaptation)

You can quantize your source Caffe model using quantization factors that you calculate yourself. Because a model quantized with user-defined quantization factors cannot be converted by ATC directly, first use the function described in this section to adapt the model to the CANN format, and then use ATC to generate an offline model for the Ascend AI Processor.

Adaptation Principles

Figure 1 shows the adaptation principles. The operations in blue are implemented by the user, while those in gray are implemented by AMCT's convert_model API. Specifically, import the AMCT package into the inference code of the source Caffe network and call the APIs where appropriate to adapt the model. For an adaptation example for this scenario, see "Model Adaptation" in the Sample List.

Figure 1 Model adaptation principles

Examples

  1. Take the following steps to get started, and update the sample code to suit your environment.
  2. To reuse the following code for a different model, prepare the source model and build the quantization factor record file from your user-defined quantization factors yourself; a hypothetical illustration of such a record file follows.
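    The exact schema of the record file is defined by AMCT (see the description of the scale_offset_record_file parameter in the AMCT documentation). The snippet below is only a hypothetical illustration of a per-layer entry, not the authoritative format; the layer name conv1 and the field names are assumptions based on typical data (scale_d, offset_d) and weight (scale_w, offset_w) quantization factors.

    record {
        key: "conv1"
        value {
            scale_d: 0.0078125    # hypothetical activation scale
            offset_d: 0           # hypothetical activation offset
            scale_w: 0.00390625   # hypothetical weight scale
            offset_w: 0           # hypothetical weight offset
        }
    }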
  1. Import the AMCT package and set the log level (see "Set environment variables" for details).
    import amct_caffe as amct
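    A hypothetical illustration of setting the log level through an environment variable (the variable name AMCT_LOG_LEVEL is an assumption; see "Set environment variables" for the actual variable name and accepted values):

    import os

    # Hypothetical variable name; check "Set environment variables" for
    # the actual AMCT log-level variable and its accepted values.
    os.environ.setdefault('AMCT_LOG_LEVEL', 'INFO')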
    
  2. Set the device running mode.

    AMCT can run on the CPU or GPU. To run AMCT on the GPU, configure the Caffe run mode and target device first, and then configure AMCT's run mode. Because the target device is specified here, you do not need to specify it again in the model inference function.

    # The condition below is an assumption: gpu_id is None when running
    # on the CPU. Adapt it to how your script selects the device.
    if gpu_id is not None:
        caffe.set_mode_gpu()      # Caffe run mode: GPU
        caffe.set_device(gpu_id)  # Caffe target device
        amct.set_gpu_mode()       # AMCT run mode: GPU
    else:
        caffe.set_mode_cpu()      # Caffe run mode: CPU
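    A minimal sketch of how gpu_id might be obtained from the command line (the --gpu_id flag is an assumption; any mechanism that yields an integer device ID or None works):

    import argparse

    parser = argparse.ArgumentParser()
    # Hypothetical flag; the default of None selects CPU mode above.
    parser.add_argument('--gpu_id', type=int, default=None,
                        help='GPU device ID; omit to run on the CPU')
    gpu_id = parser.parse_args().gpu_id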
    
  3. (Optional) Run inference on the source model in the Caffe environment based on the test dataset to validate the inference script and environment setup. (Update the sample code based on your situation.)

    This step is recommended because it verifies that the source model runs inference correctly with acceptable accuracy before adaptation. You can use a subset of the test dataset to improve efficiency.

    user_test_model(ori_model_file, ori_weights_file, test_data, test_iterations)
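    user_test_model is a user-supplied function. A minimal sketch of what it might look like for a classification network, assuming an input blob named 'data', an output blob named 'prob', and test_data indexable as (images, labels) batches:

    import caffe
    import numpy as np

    def user_test_model(model_file, weights_file, test_data, test_iterations):
        """Hypothetical accuracy test: run inference, report top-1 accuracy."""
        net = caffe.Net(model_file, weights_file, caffe.TEST)
        correct, total = 0, 0
        for i in range(test_iterations):
            images, labels = test_data[i]          # one batch per iteration
            net.blobs['data'].data[...] = images   # assumed input blob name
            output = net.forward()
            preds = output['prob'].argmax(axis=1)  # assumed output blob name
            correct += int((preds == np.asarray(labels)).sum())
            total += np.asarray(labels).size
        accuracy = correct / float(total)
        print('top-1 accuracy: %.4f' % accuracy)
        return accuracy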
    
  4. Call the convert_model API to perform model adaptation.
    This API parses the original model into a graph, preprocesses the graph, parses the input quantization factor record file, inserts operators such as AscendQuant and AscendDequant into the modified graph based on the quantization factors, and saves the result as a quantized model.
    quant_model_path = './result/user_model'
    record_file = './result/record.txt'
    amct.convert_model(model_file=ori_model_file,
                       weights_file=ori_weights_file,
                       scale_offset_record_file=record_file,
                       save_path=quant_model_path)
    
  5. (Optional) Run inference on the fake-quantized model fake_quant_model and fake_quant_weights in the Caffe environment based on the test dataset to test the accuracy. (Update the sample code based on your situation.)
    Check the accuracy loss of the fake-quantized model by comparing its accuracy with that of the source model (see step 3).
    fake_quant_model = './result/user_model_fake_quant_model.prototxt'
    fake_quant_weights = './result/user_model_fake_quant_weights.caffemodel'
    user_test_model(fake_quant_model, fake_quant_weights, test_data, test_iterations)
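
Putting the steps together, a minimal end-to-end adaptation script might look as follows. The file paths, the --gpu_id flag, and user_test_model are placeholders carried over from the steps above; adjust them to your environment.

    import argparse
    import caffe
    import amct_caffe as amct

    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu_id', type=int, default=None)  # hypothetical flag
    args = parser.parse_args()

    # Step 2: set the device running mode for Caffe and AMCT.
    if args.gpu_id is not None:
        caffe.set_mode_gpu()
        caffe.set_device(args.gpu_id)
        amct.set_gpu_mode()
    else:
        caffe.set_mode_cpu()

    # Placeholder paths to the source model and the user-built record file.
    ori_model_file = './model/user_model.prototxt'
    ori_weights_file = './model/user_model.caffemodel'
    record_file = './result/record.txt'

    # Step 4: adapt the model using the user-defined quantization factors.
    amct.convert_model(model_file=ori_model_file,
                       weights_file=ori_weights_file,
                       scale_offset_record_file=record_file,
                       save_path='./result/user_model')

    # Step 5: test the fake-quantized model accuracy (user_test_model,
    # test_data, and test_iterations are user-supplied; see step 3).
    user_test_model('./result/user_model_fake_quant_model.prototxt',
                    './result/user_model_fake_quant_weights.caffemodel',
                    test_data, test_iterations)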