Overview

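Model quantization compression mainly quantizes convolutional layers and fully connected layers from FP32 to INT8. It addresses two problems: edge inference devices vary widely in type and compute capability, and model optimization lacks hardware awareness and an automated optimization process. Given the constraint targets of an edge deployment (accuracy, compute capability, latency, and memory), it automatically compresses and quantizes trained models.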
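
As a rough illustration of what quantizing from FP32 to INT8 means, the following minimal sketch applies symmetric per-tensor quantization to a handful of weights. The scheme and the helper names `quantize_int8` and `dequantize` are assumptions made for illustration; the text does not state which quantization scheme the tool applies internally.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of FP32 values to INT8 codes."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return [c * scale for c in codes]


weights = [0.52, -1.27, 0.003, 0.9]  # illustrative FP32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that the accuracy evaluation later has to keep within bounds.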

For the detailed implementation, see Quantization Tuning Process. For the specific quantization steps, see Procedure.

Quantization Tuning Process

The quantization tuning process (nas) consists of the following steps:

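1. Read the input model and build the Graph IR, which is used to locate quantizable operators, optimize the graph structure, and extract operator features. This step requires the user to provide an interface that returns a model instance.
2. Extract model features, including operator kernel parameters, shape parameters, and the structure of each operator's context.
3. Search for quantization bit widths with a reinforcement learning algorithm to build candidate bit-width policies.
4. Build the quantized model structure and its Graph IR for each candidate bit-width policy.
5. Apply the quantization algorithm to optimize the graph structure and to calibrate and correct the quantization operator parameters for weights and activations. For complex models, such as image detection and segmentation models and Transformer-based models, you must provide a calibration interface that corrects the quantization parameters of the activations. The interface mainly runs one epoch of inference over part of the training set, with no backpropagation.
6. Evaluate the accuracy of the quantized model. For complex models, such as image detection and segmentation models and Transformer-based models, you must provide an accuracy evaluation interface that returns the accuracy of the quantized model.
7. Generate a deployable quantized model, run in-the-loop evaluation, and feed the evaluation results back to the bit-width search algorithm.
8. Repeat steps 3 to 6 until the search finds a quantized model that meets the specified accuracy, compression ratio, and latency targets.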
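
The search loop described in these steps can be sketched as follows. This is only a toy under stated assumptions: random sampling stands in for the reinforcement learning search, `evaluate` is a made-up cost model that replaces model construction, calibration, and on-device evaluation, and the compression ratio is assumed to be a percentage. All function names are hypothetical and not part of the tool.

```python
import random

BIT_CANDIDATES = [8, 32]  # candidate bit widths per layer (INT8 or FP32)


def evaluate(policy):
    """Toy stand-in for building, calibrating, and evaluating a quantized model."""
    n_int8 = sum(1 for bits in policy if bits == 8)
    # Assumption: every INT8 layer costs 0.01 accuracy from a 0.60 baseline.
    accuracy = 0.60 - 0.01 * n_int8
    # Assumption: compression ratio is the percentage of weight bits saved.
    compress_ratio = 100.0 * (1 - sum(policy) / (32 * len(policy)))
    return accuracy, compress_ratio


def search(num_layers, acc_threshold, compress_threshold, max_episode, seed=0):
    """Sample bit-width policies until the episode budget is used up."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_episode):
        # Stand-in for the RL agent: pick one bit width per layer at random.
        policy = [rng.choice(BIT_CANDIDATES) for _ in range(num_layers)]
        accuracy, compress_ratio = evaluate(policy)
        if accuracy < acc_threshold or compress_ratio < compress_threshold:
            continue  # targets not met, keep searching
        if best is None or compress_ratio > best[2]:
            best = (policy, accuracy, compress_ratio)
    return best  # None if no sampled policy met both targets


result = search(num_layers=6, acc_threshold=0.5,
                compress_threshold=40, max_episode=20)
```

A real reinforcement learning agent would use the feedback from each episode to bias later samples, whereas this sketch samples every policy independently.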

Procedure

The following procedure uses the text generation downstream task as an example.

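Go to the {CANN package installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/mindspore/quant/mm/caption/ directory.
Follow opt_caption_ms_quant.md to download the Zidong Taichu (紫东.太初) model source code and adapt the model scripts.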

In the opt_caption_ms_quant.yml file, configure the following fields to match your environment.

general:
  backend: mindspore
  parallel_search: False
  device_evaluate_before_train: False
  task:
    local_base_path: ./tasks/
    task_id: "quant_caption"
  logger:
    level: debug
  worker:
    timeout: 7200000000

register:
  pkg_path: [ "/data/automl/opt/" ]   # Path to the Zidong Taichu (紫东.太初) source code
  modules:
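    - module: "src.scripts.pretrain_three_caption_endtoend_eval" # Module to import
      script_network: "opt_caption_model_eval"  # Function that returns the model instance
      script_network_input: ["get_input_data"]  # Function that returns the model inputs
      ori_correct_func: ["calib_func"]  # Function that calibrates the quantization parameters
      ori_eval_func: ["eval_func"]  # Function that evaluates the accuracy of the quantized model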

pipeline: [ nas ]

nas:
  pipe_step:
    type: SearchPipeStep
  model:
    model_desc:
      type: ScriptModelGen
      common:
        network:
          type: opt_caption_model_eval  # Function registered above that returns the model instance
          file_config: &opt_config_file /data/automlopt/config/ftcap_quant.json  # Parameter configuration file for the model scripts
          config: &model_config
            start_learning_rate: 5.0e-5
            end_learning_rate: 1.0e-7
            decay_steps: 120000
            use_txt_out: False
            use_video: False
            use_parallel: False
            data_type: 2
            audio_dim: 512
            img_dim: 768
            use_data_fix: True
            use_mask_fix: True
            init_loss_scale: 65536
            loss_scale_factor: 2
            scale_window: 1000
            full_batch: False
            use_moe: False
            data_url: a
            train_url: a
            ckpt_file: /data/ckpt/opt_end2end_baseline.ckpt
            output_dir: /data/caption_output/ftcap_coco_vit_en2end/
            beam_width: 1
            use_vit: False
            use_patch: True
        multiple_inputs:
          type: get_input_data  # Function registered above that returns the model inputs


  search_algorithm:
    type: MsQuantRL
    codec: QuantRLCodec
    policy:
      max_episode: 20   # Max episodes; recommended value > 100. Larger is better, but learning takes longer.
      # If you cannot determine this value, set it to a large value, then set a quantization target
      # and stop learning early so that the exact value no longer matters.
      num_warmup: 10    # Episodes that only fill the replay memory without training; recommended: 10-20
    objective_keys: [ 'accuracy','compress_ratio', 'flops']  # accuracy must be one of objective keys
    reward_type: 'compress_first'  # choice: acc_first | compress_first
    custom_reward: False
    # metric_ratio: ratio of metric to accuracy in reward. If custom_reward is false, this value doesn't need to be configured.
    # Besides, if custom_reward is true, you can set metric_ratio to 0 so that the metric is not used in reward.
    # Otherwise, this value is recommended to be greater than 0.1.
    metric_to_reward: flops  # metric to calculate reward
    metric_ratio: 0.5
    stop_early: False
    acc_threshold: 0.5
    latency_threshold: 5
    compress_threshold: 40

  search_space:
    type: SearchSpace
    hyperparameters:
      - key: network.bit_candidates
        type: CATEGORY
        range: [ 8, 32 ]

  trainer:
    type: QuantTrainer
    seed: 234
    calib_portion: 0.001
    callbacks: [OptExportCallback]
    custom_calib:
      type: calib_func  # Registered calibration function
      file_config: *opt_config_file
      config: *model_config
    custom_eval:
      type: eval_func  # Registered accuracy evaluation function
      file_config: *opt_config_file
      config: *model_config
      metric_name: "CIDEr"

  evaluator:
    type: Evaluator
    device_evaluator:
      type: DeviceEvaluator
      custom: QuantCustomEvaluator
      om_input_shape: 'input_0:1,3,224,224'
      backend: mindspore
      delete_eval_model: False
      hardware: "Davinci"
      remote_host: "http://x.x.x.x:x"
      repeat_times: 1
      muti_input: True
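
The `register` section of this configuration points the tool at user-supplied calibration and evaluation functions. Their real signatures are determined by the adapted model scripts (see opt_caption_ms_quant.md) and are not shown here, so the sketch below only illustrates the roles they play: forward-only calibration over a fraction of the data, and evaluation that returns a single scalar metric. The names `calibrate_activation_scale` and `evaluate_accuracy`, their parameters, and the toy data are all hypothetical, and plain accuracy stands in for the CIDEr metric.

```python
def calibrate_activation_scale(forward, dataset, calib_portion):
    """Run inference on part of the data and derive an INT8 activation scale."""
    num_samples = max(1, int(len(dataset) * calib_portion))
    max_abs = 0.0
    for sample in dataset[:num_samples]:
        activations = forward(sample)  # inference only, no backward pass
        max_abs = max(max_abs, max(abs(a) for a in activations))
    # Symmetric scale: the largest observed magnitude maps to 127.
    return max_abs / 127.0 if max_abs > 0 else 1.0


def evaluate_accuracy(predict, dataset):
    """Return one scalar metric for the (quantized) model."""
    correct = sum(1 for x, label in dataset if predict(x) == label)
    return correct / len(dataset)


# Toy usage: a "model" that doubles its inputs and a sign classifier.
calib_data = [[0.5, -1.0], [3.0, 0.25], [100.0, 0.0]]
act_scale = calibrate_activation_scale(
    lambda x: [2.0 * v for v in x], calib_data, calib_portion=0.7)

eval_data = [(1, True), (-1, False), (2, False)]
accuracy = evaluate_accuracy(lambda x: x > 0, eval_data)
```

With `calib_portion=0.7`, only the first two of the three samples are used, so the outlier in the third sample does not affect the scale. This mirrors why `calib_portion` in the configuration trades calibration time against how well the observed range covers the data.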

Start the text generation downstream task.
```
mxOps clibration -m taichu -c opt_caption_ms_quant.yml -d NPU
```
Besides the mxOps tool, you can also start the task with Vega. Use the following command as a reference:
```
vega opt_caption_ms_quant.yml -d NPU
```
