Model quantization and compression quantizes a model's convolutional and fully connected layers from FP32 to INT8. It addresses the diversity of edge-side inference devices and compute capabilities, and the lack of hardware awareness and automation in model optimization: given edge-deployment constraints on accuracy, compute, latency, and memory, it automatically compresses the model and performs quantization-aware training.
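As a refresher on what INT8 quantization does to a layer's FP32 weights, here is a minimal, illustrative sketch of symmetric per-tensor quantization. The toolchain's actual scheme (e.g. per-channel scales, calibrated clipping ranges) is chosen automatically; this only shows the basic idea.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization of FP32 weights.
    Illustrative sketch only: real toolchains typically use per-channel
    scales and a calibration step to pick the clipping range."""
    scale = max(abs(w) for w in weights) / 127.0   # map the largest magnitude to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return [v * scale for v in q]

weights = [-0.4, 0.0, 0.2, 1.0]
q, scale = quantize_int8(weights)      # 8-bit integers plus one FP32 scale
restored = dequantize(q, scale)        # close to the original weights
```

Each weight is stored in 8 bits instead of 32, so weight memory shrinks roughly 4x, at the cost of a small rounding error bounded by the quantization step.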
Taking a text-generation downstream task as an example, the procedure is as follows. First, prepare the task configuration file opt_caption_ms_quant.yml:
general:
  backend: mindspore
  parallel_search: False
  device_evaluate_before_train: False
  task:
    local_base_path: ./tasks/
    task_id: "quant_caption"
  logger:
    level: debug
  worker:
    timeout: 7200000000

register:
  pkg_path: [ "/data/automl/opt/" ]                       # Zidong.Taichu (紫东.太初) source code path
  modules:
    - module: "src.scripts.pretrain_three_caption_endtoend_eval"  # module to import
      script_network: "opt_caption_model_eval"            # function that returns the model instance
      script_network_input: [ "get_input_data" ]          # function that returns the model inputs
      ori_correct_func: [ "calib_func" ]                  # quantization-parameter calibration function
      ori_eval_func: [ "eval_func" ]                      # accuracy-evaluation function for the quantized model

pipeline: [ nas ]

nas:
  pipe_step:
    type: SearchPipeStep
  model:
    model_desc:
      type: ScriptModelGen
      common:
        network:
          type: opt_caption_model_eval                    # registered function that returns the model instance
          file_config: &opt_config_file /data/automlopt/config/ftcap_quant.json  # model-script configuration file
          config: &model_config
            start_learning_rate: 5.0e-5
            end_learning_rate: 1.0e-7
            decay_steps: 120000
            use_txt_out: False
            use_video: False
            use_parallel: False
            data_type: 2
            audio_dim: 512
            img_dim: 768
            use_data_fix: True
            use_mask_fix: True
            init_loss_scale: 65536
            loss_scale_factor: 2
            scale_window: 1000
            full_batch: False
            use_moe: False
            data_url: a
            train_url: a
            ckpt_file: /data/ckpt/opt_end2end_baseline.ckpt
            output_dir: /data/caption_output/ftcap_coco_vit_en2end/
            beam_width: 1
            use_vit: False
            use_patch: True
        multiple_inputs:
          type: get_input_data                            # registered function that returns the model inputs
  search_algorithm:
    type: MsQuantRL
    codec: QuantRLCodec
    policy:
      max_episode: 20       # max episodes; > 100 recommended -- larger is better, but learning takes longer.
                            # If this value cannot be determined, set it to a large value, then set a
                            # quantization target and stop learning early.
      num_warmup: 10        # steps that only fill the replay memory, without training; 10-20 recommended
    objective_keys: [ 'accuracy', 'compress_ratio', 'flops' ]  # accuracy must be one of the objective keys
    reward_type: 'compress_first'   # choices: acc_first | compress_first
    custom_reward: False            # if False, metric_ratio need not be configured
    metric_to_reward: flops         # metric used to calculate the reward
    metric_ratio: 0.5               # ratio of the metric to accuracy in the reward; with custom_reward: True,
                                    # 0 excludes the metric from the reward, otherwise > 0.1 is recommended
    stop_early: False
    acc_threshold: 0.5
    latency_threshold: 5
    compress_threshold: 40
  search_space:
    type: SearchSpace
    hyperparameters:
      - key: network.bit_candidates
        type: CATEGORY
        range: [ 8, 32 ]
  trainer:
    type: QuantTrainer
    seed: 234
    calib_portion: 0.001
    callbacks: [ OptExportCallback ]
    custom_calib:
      type: calib_func              # registered calibration function
      file_config: *opt_config_file
      config: *model_config
    custom_eval:
      type: eval_func               # registered accuracy-evaluation function
      file_config: *opt_config_file
      config: *model_config
      metric_name: "CIDEr"
  evaluator:
    type: Evaluator
    device_evaluator:
      type: DeviceEvaluator
      custom: QuantCustomEvaluator
      om_input_shape: 'input_0:1,3,224,224'
      backend: mindspore
      delete_eval_model: False
      hardware: "Davinci"
      remote_host: "http://x.x.x.x:x"
      repeat_times: 1
      muti_input: True
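The `register` section expects the module `src.scripts.pretrain_three_caption_endtoend_eval` to expose four callables by name. The sketch below is a hypothetical stand-in showing the role of each hook: the function names come from the configuration, but the bodies and exact signatures are assumptions — the real contract is defined by the Vega/mxOps registration API.

```python
# Hypothetical stand-ins (no MindSpore) for the four registered hooks.
# Names match the config; bodies are toy placeholders for illustration.

def opt_caption_model_eval(file_config=None, config=None):
    # script_network: must return the model instance to be quantized.
    return {"name": "opt_caption", "layers": ["conv", "dense"]}  # toy "network"

def get_input_data(file_config=None, config=None):
    # script_network_input: must return sample inputs used to feed the model.
    return [[0.0] * 4]  # toy batch

def calib_func(model, file_config=None, config=None):
    # ori_correct_func: run a small calibration pass so the quantizer can
    # observe activation ranges (calib_portion: 0.001 -> ~0.1% of the data).
    batches = get_input_data()
    return len(batches)  # pretend we fed this many calibration batches

def eval_func(model, file_config=None, config=None):
    # ori_eval_func: evaluate the quantized model and report the metric
    # named by metric_name ("CIDEr" in the config above).
    return {"CIDEr": 1.23}  # toy score
```

`custom_calib` and `custom_eval` in the `trainer` section point back at `calib_func` and `eval_func`, reusing the same `file_config`/`config` anchors via YAML aliases.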
Then start the quantization task with the mxOps tool:
mxOps calibration -m taichu -c opt_caption_ms_quant.yml -d NPU
In addition to the mxOps tool, you can launch the task with Vega; refer to the following command:
vega opt_caption_ms_quant.yml -d NPU
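For intuition about the `network.bit_candidates: [8, 32]` search space and the `compress_ratio` objective, the arithmetic below shows one plausible way a weight compression ratio follows from the per-layer bit choices the search agent makes. This is illustrative only; Vega's internal definition of `compress_ratio` may differ.

```python
def weight_compress_ratio(layer_params, layer_bits, baseline_bits=32):
    """Illustrative sketch: baseline FP32 weight size divided by the
    mixed-precision size implied by per-layer bit choices."""
    fp32_bits = sum(p * baseline_bits for p in layer_params)
    quant_bits = sum(p * b for p, b in zip(layer_params, layer_bits))
    return fp32_bits / quant_bits

# Two layers with 1M parameters each:
ratio_all_int8 = weight_compress_ratio([1_000_000, 1_000_000], [8, 8])   # both INT8: 4x
ratio_mixed = weight_compress_ratio([1_000_000, 1_000_000], [8, 32])     # one kept FP32: 1.6x
```

This is why the search trades `compress_ratio` and `flops` off against `accuracy`: keeping sensitive layers at 32 bits preserves accuracy but lowers the achievable compression.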