Getting Started
This section walks you through model compression in CLI mode, taking the quantization of a model as an example.
AMCT allows the original network model to be quantized in CLI mode or Python API mode. Currently, only the ONNX, TensorFlow, and Caffe frameworks are supported.
- Compressing an ONNX Network Model
- Network Model Trained on the Open Source TensorFlow Framework
- Network Model Trained on the Open Source Caffe Framework
For other frameworks and features that are not supported, only Python APIs can be used for quantization. Compared with the Python API mode, the CLI mode has the following advantages.
| Using the CLI | Using Python APIs |
|---|---|
| You simply need to prepare a model and the matched datasets. | You need to possess knowledge of the Python syntax and quantization procedure. |
| You simply need to determine parameters as opposed to implementing quantization script adaptation. | You need to implement quantization script adaptation. |
Compressing an ONNX Network Model
- Conditions
The AMCT (ONNX) tool package has been installed. For details, see Tool Installation.
- Sample Package
- Click here to obtain the sample package and upload it to any path on the server where AMCT is located, for example, $HOME/software/AMCT_Pkg/amct_sample.
- Decompress the sample package.
Go to the amct_sample directory and decompress the sample package:
unzip samples-master.zip
cd samples-master/python/level1_single_api/9_amct/amct_onnx/cmd
Find the following extracted directories:
|-- README_CN.md
|-- data                                  # Dataset directory
|-- model                                 # Directory where the ONNX model file is stored
|-- scripts
|   |-- run_calibration.sh                # Encapsulated quantization script
|   |-- run_convert_qat.sh                # QAT script for adapting to the CANN model
|   |-- run_customized_calibration.sh     # Custom PTQ script
|-- src
    |-- process_data.py                   # Dataset preprocessing script, used to generate the input data of the model. If the dataset is changed, ensure that the shape of the processed data is the same as that of the model input.
    |-- evaluator.py                      # Built-in Python script based on the Evaluator base class, containing the evaluator
- Model Quantization
- Obtain an ONNX network model.
Click here to obtain the resnet101_v11.onnx model file and upload it to the amct_onnx/cmd/model directory on the Linux server as the user who runs the AMCT software package.
- Prepare a binary dataset that matches the model.
- Switch to the amct_onnx/cmd/data directory and run the following command to download the calibration dataset:
wget https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/models/amct_acl/classification/imagenet_calibration.tar.gz
tar -zxvf imagenet_calibration.tar.gz
After the execution is complete, a .jpg dataset is generated in the images directory.
- In the amct_onnx/cmd directory, run the following command to convert the .jpg dataset in the images folder to a .bin dataset:
python3 ./src/process_data.py
After the execution is complete, a new calibration folder is generated in the data folder, containing the generated calibration.bin dataset.
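The key contract the preprocessing step must satisfy is that the serialized binary data matches the model input exactly: the sample declares --input_shape "input:16,3,224,224" with --data_types "float32". The sketch below is illustrative only (it uses synthetic data instead of decoded JPEGs, and the file name is hypothetical); the sample's process_data.py does the real work.

```python
import numpy as np

# Hypothetical preprocessing sketch: pack one batch into a raw float32 .bin
# file whose layout matches the declared model input "input:16,3,224,224".
# Synthetic data stands in for decoded, resized, normalized JPEG images.
BATCH, CHANNELS, HEIGHT, WIDTH = 16, 3, 224, 224

def make_calibration_bin(path: str) -> None:
    # One batch in NCHW layout; real code would decode and normalize JPEGs here.
    batch = np.random.rand(BATCH, CHANNELS, HEIGHT, WIDTH).astype(np.float32)
    batch.tofile(path)  # raw little-endian float32, no header

make_calibration_bin("calibration.bin")

# Sanity check the tool relies on: file size == N*C*H*W * 4 bytes (float32).
import os
expected = BATCH * CHANNELS * HEIGHT * WIDTH * 4
assert os.path.getsize("calibration.bin") == expected
```

If the dataset or model changes, only the shape constants need to follow suit, but the dtype and element order must still match the command-line options.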
- Run the following command in any directory to perform quantization. (The path and file arguments in the command are for reference only.)
amct_onnx calibration --model ./model/resnet101_v11.onnx --save_path ./results/resnet101_v11 --input_shape "input:16,3,224,224" --data_dir "./data/calibration" --data_types "float32"
The amct_onnx binary file is stored in $HOME/.local/bin of the installation user. You can run the amct_onnx calibration --help command to view all related command-line options. For details, see Command-line options.
- If the AMCT tool cannot be queried after you run the amct_onnx calibration --help command, the Python version used to install the tool may be incorrect. In this case, set the following environment variable again by referring to Python 3.9.2 Installation on Ubuntu (replace the following path with the actual installation path):
export PATH=/usr/local/python3.9.2/bin:$PATH
- The AMCT sample also provides run_calibration.sh, a script that encapsulates the quantization command in (3) and the dataset preprocessing script in 2.b. After preparing the model and downloading the dataset, you can use the script directly: switch to the amct_onnx/cmd directory and run the following command:
bash ./scripts/run_calibration.sh
- Check the quantization result. If the following information is displayed with no error log, the quantization is successful.
INFO - [AMCT]:[Utils]: The model file is saved in $HOME/xxx/results/resnet101_v11_fake_quant_model.onnx
The resultant files and directories are described as follows:
- amct_log/amct_onnx.log: AMCT log file.
The preceding log files will be overwritten when quantization is performed again. You need to save them as required. In addition, the size of the generated log file is related to the number of layers of the model to be quantized. Ensure that the server where AMCT is installed has sufficient space.
Take the ResNet-101 model as an example. If the log level is set to INFO, the log file size is about 5 KB, and the size of the temporary file is about 170 MB. If the log level is set to DEBUG, the log file size is about 2 MB, and the size of the temporary file is about 170 MB.
- results: quantization result directory, containing:
- resnet101_v11_deploy_model.onnx: quantized model file to be deployed on the Ascend AI Processor.
- resnet101_v11_fake_quant_model.onnx: quantized model file that can be used for accuracy simulation in the ONNX Runtime environment.
- resnet101_v11_quant.json: quantization information file (named after the quantized model). This file gives the node mapping between the quantized model and the original model, and is used for accuracy comparison between them.
- (Optional) RandomNumber_Timestamp: directory generated only if AMCT_LOG_LEVEL is set to DEBUG. For details about log level setting, see Set the environment variable.
- quant_config.json: quantization configuration file that describes how to quantize each layer in the model. If a quantization configuration file exists in the current directory, the existing file is overwritten by a new one with the same name upon another quantization. Otherwise, a new quantization configuration file is created.
- If the accuracy of model inference drops significantly after quantization, you can create a config.cfg file based on the quant_config.json file after quantization by referring to Tuning Workflow. Then, perform quantization again with the --calibration_config option. You can set the amount of data used for calibration and layers to be quantized in the newly created file.
- record.txt: file that records quantization factors. For details about the prototype definition of the file, see Record Files.
- modified_model.onnx and updated_model.onnx: intermediate files during quantization.
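After a successful run, you can sanity-check that everything listed above was actually produced. The stdlib sketch below is illustrative only (the mock_run directory and the checker are not part of AMCT); the expected file names come from the output description above.

```python
import os

# Output files described above for the ONNX sample, relative to the
# directory where amct_onnx was run. "mock_run" and this checker are
# illustrative only, not part of the AMCT tooling.
EXPECTED = [
    "results/resnet101_v11_deploy_model.onnx",
    "results/resnet101_v11_fake_quant_model.onnx",
    "results/resnet101_v11_quant.json",
    "quant_config.json",
    "record.txt",
    "amct_log/amct_onnx.log",
]

def missing_outputs(root: str) -> list:
    """Return the expected quantization outputs that are absent under root."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]

# Mock a completed run so the check can be demonstrated end to end.
for rel in EXPECTED:
    path = os.path.join("mock_run", rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

assert missing_outputs("mock_run") == []
```

In a real workflow you would point missing_outputs at the directory where you ran amct_onnx instead of the mock.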
Network Model Trained on the Open Source TensorFlow Framework
- Conditions
The AMCT tool package has been installed. For details, see Tool Installation.
- Sample Package
- Click here to obtain the sample package and upload it to any path on the server where AMCT is located, for example, $HOME/software/AMCT_Pkg/amct_sample.
- Decompress the sample package.
Go to the amct_sample directory and decompress the sample package:
unzip samples-master.zip
cd samples-master/python/level1_single_api/9_amct/amct_tensorflow/cmd
Find the following extracted directories:
|-- README_CN.md
|-- data                                  # Dataset directory
|-- model                                 # Directory where the TensorFlow model file is stored
|-- scripts
|   |-- run_calibration.sh                # Encapsulated quantization script
|   |-- run_convert_qat.sh                # QAT script for adapting to the CANN model
|   |-- run_customized_calibration.sh     # Custom PTQ script
|-- src
    |-- evaluator.py                      # Built-in Python script based on the Evaluator base class, containing the evaluator
    |-- process_data.py                   # Dataset preprocessing script, used to generate the input data of the model. If the dataset is changed, ensure that the shape of the processed data is the same as that of the model input.
- Model Quantization
- Obtain the TensorFlow network model to be quantized.
Click here to obtain the MobileNetV2 network model. Extract the .pb file from the obtained package and upload it to the amct_tensorflow/cmd/model directory on the Linux server as the AMCT running user.
- Prepare a binary dataset that matches the model.
- Switch to the amct_tensorflow/cmd/data directory and run the following command to download the calibration dataset:
cd amct_tensorflow/cmd/data
mkdir image && cd image
wget https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/models/amct_acl/classification/calibration.rar
unrar e calibration.rar
After the preceding command is executed, a .jpg dataset is generated in the image/calibration directory.
- In the amct_tensorflow/cmd directory, run the following command to convert the .jpg dataset in the calibration folder to a .bin dataset:
python3 ./src/process_data.py
After the execution is complete, a new calibration folder is generated in the data folder, containing the generated calibration.bin dataset.
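Note that the TensorFlow sample declares a channels-last input, --input_shape "input:32,224,224,3" (NHWC), whereas the ONNX sample uses NCHW. What matters for the .bin file is that the element order on disk matches the declared layout. A hypothetical sketch (synthetic data, illustrative file name):

```python
import os
import numpy as np

# Hypothetical layout sketch: image pipelines often produce channels-first
# (NCHW) arrays, but this sample's --input_shape is NHWC. Transpose before
# serializing so the raw bytes match the declared layout.
nchw = np.random.rand(32, 3, 224, 224).astype(np.float32)  # channels-first
nhwc = nchw.transpose(0, 2, 3, 1)                          # -> channels-last

# ascontiguousarray makes the memory order match the logical NHWC order
# before the raw bytes are written out.
np.ascontiguousarray(nhwc).tofile("calibration_nhwc.bin")

assert nhwc.shape == (32, 224, 224, 3)
assert os.path.getsize("calibration_nhwc.bin") == 32 * 224 * 224 * 3 * 4
```

Writing an NCHW buffer under an NHWC --input_shape would not fail loudly; it would silently scramble the calibration data, so this layout check is worth keeping.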
- Run the following command in any directory to perform quantization. (The path and file arguments in the command are for reference only.)
amct_tensorflow calibration --model=./model/mobilenet_v2_1.0_224_frozen.pb --save_path=./results/mobilenet_v2 --outputs="MobilenetV2/Predictions/Reshape_1:0" --input_shape="input:32,224,224,3" --data_dir="./data/calibration/" --data_types="float32"
The amct_tensorflow binary file is stored in $HOME/.local/bin of the installation user. You can run the amct_tensorflow calibration --help command to view all command-line parameters. For details, see Command-line options.
- If the AMCT tool cannot be queried after you run the amct_tensorflow calibration --help command, the Python version used to install the tool may be incorrect. In this case, set the following environment variable again by referring to Python 3.9.2 Installation on Ubuntu (replace the following path with the actual installation path):
export PATH=/usr/local/python3.9.2/bin:$PATH
- The AMCT sample also provides run_calibration.sh, a script that encapsulates the quantization command in (3) and the dataset preprocessing script in 2.b. After preparing the model and downloading the dataset, you can use the script directly: switch to the amct_tensorflow/cmd directory and run the following command:
bash ./scripts/run_calibration.sh
- Check the quantization result. If the following information is displayed with no error log, the quantization is successful.
INFO - [AMCT]:[save_model]: The model is saved in $HOME/xxx/results/mobilenet_v2_quantized.pb
The resultant files and directories are described as follows:
- amct_log/amct_tensorflow.log: AMCT log file.
The preceding log files will be overwritten when quantization is performed again. You need to save them as required. In addition, the size of the generated log file is related to the number of layers of the model to be quantized. Ensure that the server where AMCT is installed has sufficient space.
Take the ResNet-101 model as an example. If the log level is set to INFO, the log file size is about 12 KB, and the size of the temporary file is about 260 MB. If the log level is set to DEBUG, the log file size is about 390 KB, and the size of the temporary file is about 430 MB.
- results: quantization result directory, containing:
- mobilenet_v2_quantized.pb: quantized model file that can be used for accuracy simulation in the TensorFlow environment and can be deployed on the Ascend AI Processor.
- mobilenet_v2_quant.json: quantization information file (named after the quantized model). This file gives the node mapping between the quantized model and the original model, and is used for accuracy comparison between them.
- (Optional) RandomNumber_Timestamp: directory generated only if set_logging_level is set to debug.
- quant_config.json: quantization configuration file that describes how to quantize each layer in the model. If a quantization configuration file exists in the current directory, the existing file is overwritten by a new one with the same name upon another quantization. Otherwise, a new quantization configuration file is created.
- If the accuracy of model inference drops significantly after quantization, you can create a config.cfg file based on the quant_config.json file after quantization by referring to Tuning Workflow. Then, perform quantization again with the --calibration_config option. You can set the amount of data used for calibration and layers to be quantized in the newly created file.
- record.txt: file that records quantization factors. For details about the prototype definition of the file, see Record Files.
Network Model Trained on the Open Source Caffe Framework
- Conditions
The AMCT tool package has been installed. For details, see Tool Installation.
- Sample Package
- Click here to obtain the sample package and upload it to any path on the server where AMCT is located, for example, $HOME/software/AMCT_Pkg/amct_sample.
- Decompress the sample package.
Go to the amct_sample directory and decompress the sample package:
unzip samples-master.zip
cd samples-master/python/level1_single_api/9_amct/amct_caffe/cmd
Find the following extracted directories:
|-- README_CN.md
|-- data                                  # Dataset directory
|-- model                                 # Directory where the Caffe model file is stored
|-- scripts
|   |-- run_calibration.sh                # Encapsulated quantization script
|   |-- run_customized_calibration.sh     # Custom PTQ script
|-- src
    |-- download_models.py                # Script for downloading model files
    |-- evaluator.py                      # Built-in Python script based on the Evaluator base class, containing the evaluator
    |-- process_data.py                   # Dataset preprocessing script, used to generate the input data of the model. If the dataset is changed, ensure that the shape of the processed data is the same as that of the model input.
- Model Quantization
- Due to software restrictions (the input data cannot be of DT_INT8 type in the dynamic shape scenario), when the ATC tool is used to convert the quantized deployable model, dynamic shape–related options, such as --dynamic_batch_size and --dynamic_image_size, must not be used. Otherwise, the model conversion fails.
- When the ATC tool is used to convert a deployable model quantized by the AMCT tool, the high-precision feature cannot be used. For example, force_fp32 or must_keep_origin_dtype (fp32 input of the original graph) cannot be configured through --precision_mode, origin cannot be configured through --precision_mode_v2, and high_precision cannot be configured through --op_precision_mode. When quantization parameters are set in high-precision mode, neither the performance benefits of quantization nor the precision benefits of the high-precision mode can be obtained.
- Obtain the Caffe network model to be quantized. Go to the amct_caffe/cmd directory and run the following commands to download the model file and weight file:
python3 ./src/download_models.py --close_certificate_verify
The --close_certificate_verify parameter is optional and is used to disable certificate verification to ensure that the model can be downloaded properly. If an authentication failure message is displayed during model download, you can add this parameter to download the model again.
If the following information is displayed, the model file is successfully downloaded:
[INFO]Download 'ResNet-50-deploy.prototxt' to 'xxx/amct_caffe/cmd/model/ResNet-50-deploy.prototxt' success.
[INFO]Download file_name to 'xxx/amct_caffe/cmd/model/ResNet-50-model.caffemodel' success.
You can view the downloaded model file in amct_caffe/cmd/model as prompted.
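The --close_certificate_verify switch exists because downloads behind TLS-intercepting proxies often fail certificate checks. A hedged stdlib sketch of what such a switch can do (the real download_models.py may be implemented differently; the helper name is hypothetical):

```python
import ssl
import urllib.request

# Hypothetical helper: build an SSL context that either performs normal
# certificate verification or, like a --close_certificate_verify switch,
# skips it so downloads still work behind interception proxies.
def make_context(verify: bool) -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    if not verify:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx

# Usage (URL is illustrative only):
# with urllib.request.urlopen(url, context=make_context(False)) as resp:
#     data = resp.read()

assert make_context(False).verify_mode == ssl.CERT_NONE
assert make_context(True).verify_mode == ssl.CERT_REQUIRED
```

Disabling verification trades security for convenience, which is why the sample keeps it opt-in rather than the default.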
- Prepare a binary dataset that matches the model.
- Switch to the amct_caffe/cmd directory and run the following command to download the calibration dataset:
cd data
mkdir image && cd image
wget https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/models/amct_acl/classification/calibration.rar
unrar e calibration.rar
- In the amct_caffe/cmd directory, run the following command to convert the .jpg dataset in the calibration folder to a .bin dataset:
python3 ./src/process_data.py
After the execution is complete, a new calibration folder is generated in the data folder, containing the generated calibration.bin dataset.
- Run the following command in any directory to perform quantization. (The path and file arguments in the command are for reference only.)
amct_caffe calibration --model=./model/ResNet-50-deploy.prototxt --weights=./model/ResNet-50-model.caffemodel --save_path=./results/Resnet-50 --input_shape="data:1,3,224,224" --data_dir="./data/calibration" --data_types="float32"
The amct_caffe binary file is stored in $HOME/.local/bin of the installation user. You can run the amct_caffe calibration --help command to view all related command-line options. For details, see Command-line options.
- If the AMCT tool cannot be queried after you run the amct_caffe calibration --help command, the Python version used to install the tool may be incorrect. In this case, set the following environment variable again by referring to Python 3.9.2 Installation on Ubuntu (replace the following path with the actual installation path):
export PATH=/usr/local/python3.9.2/bin:$PATH
- The AMCT sample also provides run_calibration.sh, a script that encapsulates the quantization command in (3) and the dataset preprocessing script in 2.b. After preparing the model and downloading the dataset, you can use the script directly: switch to the amct_caffe/cmd directory and run the following command:
bash ./scripts/run_calibration.sh
- Check the quantization result. If the following information is displayed with no error log, the quantization is successful.
INFO - [AMCT]:[Utils]: The weights_file is saved in $HOME/xxx/results/Resnet-50_fake_quant_weights.caffemodel
INFO - [AMCT]:[Utils]: The model_file is saved in $HOME/xxx/results/Resnet-50_fake_quant_model.prototxt
The resultant files and directories are described as follows:
- amct_log/amct_caffe.log: AMCT log file.
The preceding log files will be overwritten when quantization is performed again. You need to save them as required. In addition, the size of the generated log file is related to the number of layers of the model to be quantized. Ensure that the server where AMCT is installed has sufficient space.
Take the ResNet-101 model as an example. If the log level is set to INFO, the log file size is about 12 KB, and the size of the temporary file is about 260 MB. If the log level is set to DEBUG, the log file size is about 390 KB, and the size of the temporary file is about 430 MB.
- results: quantization result directory, containing:
- Resnet-50_deploy_model.prototxt: quantized model file that is deployable on the Ascend AI Processor.
- Resnet-50_deploy_weights.caffemodel: weight file of the quantized model that is deployable on the Ascend AI Processor.
- Resnet-50_fake_quant_model.prototxt: fake-quantized model file for accuracy simulation in the Caffe environment.
- Resnet-50_fake_quant_weights.caffemodel: fake-quantized weight file for accuracy simulation in the Caffe environment.
- Resnet-50_quant.json: quantization information file (named after the quantized model). This file gives the node mapping between the quantized model and the original one and is used for accuracy comparison between these two models.
- (Optional) RandomNumber_Timestamp: directory generated only if AMCT_LOG_LEVEL is set to DEBUG. For details about log level setting, see Set environment variables.
- quant_config.json: quantization configuration file that describes how to quantize each layer in the model. If a quantization configuration file exists in the current directory, the existing file is overwritten by a new one with the same name upon another quantization. Otherwise, a new quantization configuration file is created.
If the accuracy of model inference drops significantly after quantization, you can create a config.cfg file based on the quant_config.json file after quantization by referring to Manual Tuning. Then, perform quantization again with the --calibration_config option. You can set the amount of data used for calibration and layers to be quantized in the newly created file.
- modified_model.prototxt and modified_model.caffemodel: intermediate model files in quantization.
- record.txt: file that records quantization factors. For details about the prototype definition of the file, see Record Files.
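When scripting around the CLI, it is handy to pull the saved-file paths out of the success lines that the tool prints. The sketch below parses log text matching the wording shown above; anything beyond that exact pattern (and the sample paths) is an assumption.

```python
import re

# Sample log text in the wording the tool prints on success (see above);
# the paths here are illustrative stand-ins.
LOG = """\
INFO - [AMCT]:[Utils]: The weights_file is saved in /home/me/results/Resnet-50_fake_quant_weights.caffemodel
INFO - [AMCT]:[Utils]: The model_file is saved in /home/me/results/Resnet-50_fake_quant_model.prototxt
"""

SAVED_RE = re.compile(r"The (\w+) is saved in (\S+)")

def saved_files(log_text: str) -> dict:
    """Map each reported artifact kind (model_file, weights_file) to its path."""
    return {kind: path for kind, path in SAVED_RE.findall(log_text)}

files = saved_files(LOG)
assert files["model_file"].endswith(".prototxt")
assert files["weights_file"].endswith(".caffemodel")
```

A wrapper script could feed the extracted paths straight into a follow-up step such as model conversion, instead of hard-coding them.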