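Model quantization tuning addresses two problems: edge inference devices and their compute capabilities are highly diverse, and model optimization lacks both hardware awareness and an automated optimization process. It tunes quantization automatically against the constraint targets of edge deployment: accuracy, compute, latency, and memory.
Reinforcement-learning-based quantization compression is currently supported for models including, but not limited to, those in Table 1.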
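Table 1 Supported models

| Type | Name | Framework |
|---|---|---|
| Image classification | ResNet50 | MindSpore, PyTorch |
| Image classification | MobileNetV2 | MindSpore, PyTorch |
| Image classification | ViT | MindSpore |
| Image segmentation | DeepLabV3 | MindSpore |
| Object detection | FasterRCNN | MindSpore |
| Object detection | YoloV5 | MindSpore, PyTorch |
| Object detection | YoloV4 | PyTorch |
| Object detection | YoloV3 | MindSpore |
| Object detection | YoloV3-Tiny | PyTorch |
| Object detection | SSD | MindSpore, PyTorch |
| Object detection | RetinaNet | PyTorch |
| Natural language processing | BERT-Base | MindSpore |
| Natural language processing | ERNIE | MindSpore |
| Natural language processing | Transformer | MindSpore |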
Run the following command to install the fvcore and onnx packages.
pip3 install fvcore onnx --user
general:
    backend: pytorch  # pytorch | tensorflow
    device_category: NPU
    task:
        local_base_path: /path/of/workdir  # working directory
        task_id: "quant_resnet50_pytorch"
    device_evaluate_before_train: False
pipeline: [nas]
nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
            type: ResNet50_ModelZoo  # define this model as described in "Custom Model Registration"
            version: resnet50
            config: classic
            input_shape: [1, 3, 224, 224]
    dataset:
        type: Imagenet
        common:
            data_path: /path/of/dataset  # dataset path
            drop_last: False
            shuffle: False
        …
    search_algorithm:
        …
        # choice: acc_first | compress_first
        reward_type: 'acc_first'  # metric to prioritize: acc_first = accuracy first, compress_first = compression ratio first
        custom_reward: False
        latency_acc_ratio: 0.5  # ratio of latency to accuracy in the reward.
        # If custom_reward is False, this value does not need to be configured.
        # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward.
        # Otherwise, a value greater than 0.1 is recommended.
        stop_early: False  # whether to stop the task as soon as a result meets acc_threshold, latency_threshold, and compress_threshold together
        acc_threshold: 0.5  # accuracy loss threshold (%)
        latency_threshold: 5  # latency reduction (%)
        compress_threshold: 40  # compression ratio threshold (%)
    search_space:
        type: SearchSpace
        hyperparameters:
            - key: network.bit_candidates
              type: CATEGORY
              range: [8, 32]
    trainer:
        type: Trainer
        epochs: 1
        seed: 234
        callbacks: [QuantPTQCallback, ConvertOnnxCallback]
        pretrained_model_file: /path/of/pretrain/model/file  # ResNet50 pretrained weight file
        …
    evaluator:
        type: Evaluator
        device_evaluator:
            type: DeviceEvaluator
            custom: QuantCustomEvaluator
            remote_host: http://xxx.xxx.xxx.xxx:port/  # URL of the remote inference server, ending with the port number. If "vega-config -q sec" returns "True" on the inference server, change "http" to "https"
            backend: 'pytorch'
            om_input_shape: '1,3,224,224'
            delete_eval_model: True  # whether to delete the quantized models found by the search
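The search space above lets each layer either stay at 32 bits or be quantized to 8 bits (`network.bit_candidates: [8, 32]`). `stop_early` compares a result against three percentage thresholds. The sketch below shows one plausible way to evaluate those targets. It rests on three assumptions: the compression ratio is the percentage of weight storage saved relative to an all-32-bit model, `acc_threshold` bounds the accuracy drop in percentage points, and `latency_threshold` is a minimum latency reduction in percent. `QuantTargets`, `compress_ratio`, and `meets_targets` are hypothetical helpers, not Vega's internal implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuantTargets:
    """Mirrors the stop_early thresholds in the YAML (all values in percent)."""
    acc_threshold: float = 0.5        # maximum allowed accuracy loss
    latency_threshold: float = 5.0    # minimum required latency reduction
    compress_threshold: float = 40.0  # minimum required compression ratio


def compress_ratio(param_counts: List[int], bits: List[int]) -> float:
    """Percent of weight storage saved versus keeping every layer at 32 bits (assumed definition)."""
    fp32_bits = sum(n * 32 for n in param_counts)
    quant_bits = sum(n * b for n, b in zip(param_counts, bits))
    return 100.0 * (1.0 - quant_bits / fp32_bits)


def meets_targets(base: Dict[str, float], cand: Dict[str, float],
                  ratio: float, targets: QuantTargets) -> bool:
    """Hypothetical stop_early check: all three targets must hold at the same time."""
    acc_loss = base["accuracy"] - cand["accuracy"]  # in percentage points
    latency_gain = 100.0 * (base["latency"] - cand["latency"]) / base["latency"]
    return (acc_loss <= targets.acc_threshold
            and latency_gain >= targets.latency_threshold
            and ratio >= targets.compress_threshold)


if __name__ == "__main__":
    # Toy model: a 1,000-parameter layer kept at 32 bits, a 3,000-parameter layer at 8 bits.
    ratio = compress_ratio([1000, 3000], [32, 8])
    print(ratio)  # 56.25
    base = {"accuracy": 76.1, "latency": 10.0}
    cand = {"accuracy": 75.8, "latency": 9.0}
    print(meets_targets(base, cand, ratio, QuantTargets()))  # True
```

Requiring all three conditions together matches the YAML comment on `stop_early`. How the tool actually measures each quantity may differ.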
vega resnet_rl_quant.yml -d NPU
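When the task finishes, the search log is written to the log folder under the specified working directory. If `delete_eval_model` of `device_evaluator` in the `evaluator` section is set to False, the model for each search result is written to the output/nas folder under the specified working directory.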
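To run quantization tuning on a model that was pruned as described in "Pruning Tuning Based on Training Scripts", you do not need to register a custom model or provide a model script. Provide the model description file (.json) instead, and use {CANN package installation path}/ascend-toolkit/latest/tools/ascend_automl/example/pytorch/quant/classification/resnet_prune_rl_quant.yml as the template to modify.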
general:
    backend: mindspore  # pytorch | mindspore
    device_category: NPU
    task:
        local_base_path: ./tasks/
        task_id: "quant_yolov5"
    device_evaluate_before_train: False
    parallel_search: True
    logger:
        level: info
    worker:
        timeout: 720000000
pipeline: [nas]
nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
            type: PruneModel
            model_file_path: /home/examples/yolov5/src/yolo.py  # model script; it must provide a get_model method that returns the model instance to quantize
            pkg_path: /home/examples/yolov5  # directory containing the user's training scripts
            pretrained_model_file: "/home/examples/yolov5/pre_train/0-300_274800.1130.ckpt"
            input_shape:
                - type: fp32
                  tensor: True
                  shape: [1, 12, 320, 320]
                - type: int32
                  tensor: False
                  shape: [640, 640]
    search_algorithm:
        type: MsQuantRL
        codec: QuantRLCodec
        policy:
            max_episode: 30  # max episodes; recommended value > 100. Larger is better but takes longer to learn.
            # If you cannot determine this value, set it to a large value, then set quantization
            # targets and stop learning early so that you do not have to tune it.
            num_warmup: 10  # warm-up period with no training, used only to fill the replay memory; recommended: 10-20
        objective_keys: ['accuracy', 'compress_ratio', 'latency']
        # choice: acc_first | compress_first
        reward_type: 'compress_first'  # metric to prioritize: acc_first = accuracy first; compress_first = compression ratio first
        custom_reward: False
        latency_acc_ratio: 0.5  # ratio of latency to accuracy in the reward.
        # If custom_reward is False, this value does not need to be configured.
        # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward.
        # Otherwise, a value greater than 0.1 is recommended.
        stop_early: False  # whether to stop the task as soon as a result meets acc_threshold, latency_threshold, and compress_threshold together
        acc_threshold: 0.5  # accuracy loss threshold (%)
        latency_threshold: 5  # latency reduction (%)
        compress_threshold: 40  # compression ratio threshold (%)
    search_space:
        type: SearchSpace
        hyperparameters:
            - key: network.bit_candidates
              type: CATEGORY
              range: [8, 32]
    trainer:
        type: OriTrainer
        seed: 234
        callbacks: [MsQuantPTQCallback, CustomMetricCallback, CustomExportCallback]
        calib_portion: 0.01
        custom_calib:  # forward calibration of the quantization factors
            pkg_path: /home/examples/yolov5/  # package path of the calib_func interface
            path: /home/examples/yolov5/train.py  # file that defines the interface
            func: calib_func  # detection models must provide a calibration interface; for reference, see {CANN package installation path}/ascend-toolkit/latest/tools/ascend_automl/example/pytorch/quant/classification/resnet_custom_func.py
        custom_eval:  # accuracy evaluation
            pkg_path: /home/examples/yolov5/  # package path of the evaluation interface
            path: /home/examples/yolov5/eval.py  # file that defines the interface
            func: run_eval  # detection models must provide an evaluation interface; for reference, see {CANN package installation path}/ascend-toolkit/latest/tools/ascend_automl/example/pytorch/quant/classification/resnet_custom_func.py
            metric_name: "mAP"  # metric to optimize
    evaluator:
        type: Evaluator
        device_evaluator:
            type: DeviceEvaluator
            custom: QuantCustomEvaluator
            om_input_shape: 'input_0:1,12,320,320'
            backend: mindspore
            delete_eval_model: False  # whether to delete the quantized models found by the search
            hardware: "Davinci"
            remote_host: "http://x.x.x.x:xxxx"  # URL of the remote inference server, ending with the port number. If "vega-config -q sec" returns "True" on the inference server, change "http" to "https"
            repeat_times: 1
            muti_input: True
            save_intermediate_file: True
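The comments on `custom_reward` and `latency_acc_ratio` give three facts: the ratio weights latency against accuracy in the reward, it only matters when `custom_reward` is True, and setting it to 0 removes latency from the reward. The sketch below shows one way such a weighting could look. The linear form, the assumption that both inputs are already normalized scores, the [0, 1] range check, and the function name `custom_reward_value` are all hypothetical. This is not the formula MsQuantRL actually uses.

```python
def custom_reward_value(acc_score: float, latency_score: float,
                        latency_acc_ratio: float = 0.5) -> float:
    """Hypothetical linear blend of a normalized accuracy score and latency score.

    Both scores are assumed to be normalized so that higher is better.
    latency_acc_ratio = 0 ignores latency entirely, matching the YAML comment.
    """
    if not 0.0 <= latency_acc_ratio <= 1.0:
        raise ValueError("latency_acc_ratio is assumed to lie in [0, 1]")
    return (1.0 - latency_acc_ratio) * acc_score + latency_acc_ratio * latency_score


if __name__ == "__main__":
    print(custom_reward_value(0.8, 0.2, 0.0))  # latency ignored: 0.8
    print(custom_reward_value(0.8, 0.2, 0.5))  # equal weighting
```

Under this reading, very small nonzero ratios would make latency nearly irrelevant. That would be consistent with the recommendation to use a value greater than 0.1 whenever latency should count.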
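Detection models must provide a calibration interface that runs forward passes on the training dataset to calibrate the quantization parameters. No backward pass is needed, so backpropagation-related code such as the learning rate, loss, and optimizer can be removed. For how to configure the calibration interface, see the `custom_calib` field of OriTrainer.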
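A calibration function only needs forward passes over part of the training data. It needs no loss, optimizer, or learning-rate schedule. The framework-agnostic sketch below illustrates that shape. The signature `calib_func(model, data_loader, calib_portion)` is hypothetical; the arguments the tool actually passes are defined by the tool itself (see the referenced resnet_custom_func.py). Treating the trainer's `calib_portion` as the fraction of training batches to use is also an assumption. `model` here is any callable.

```python
import math
from typing import Any, Callable, Sequence


def calib_func(model: Callable[[Any], Any], data_loader: Sequence[Any],
               calib_portion: float = 0.01) -> int:
    """Forward-only calibration over a fraction of the training batches (hypothetical signature).

    No loss, optimizer, or learning-rate schedule is needed: the quantization
    tool only observes activations while these forward passes run.
    Returns the number of batches that were fed through the model.
    """
    # Use at least one batch when data exists, but never more than are available.
    num_batches = min(len(data_loader), max(1, math.ceil(len(data_loader) * calib_portion)))
    for i, batch in enumerate(data_loader):
        if i >= num_batches:
            break
        model(batch)  # forward pass only; the output is discarded
    return num_batches
```

In a real MindSpore or PyTorch script, `model(batch)` would be the network's forward call in inference mode.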
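Detection models must also provide an evaluation interface that computes the model's accuracy metrics and returns the results as a dict. For how to configure the evaluation interface, see the `custom_eval` field of OriTrainer. `metric_name` must be one of the keys in the returned dict.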
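The contract described above is that the evaluation function returns a dict, and `metric_name` selects one of its keys. The minimal sketch below shows that contract. For a detection model the dict would carry a key such as "mAP", as in the YOLOv5 configuration. To stay self-contained, the sketch uses classification accuracy, the metric named in the ERNIE configuration. The name `run_eval` matches the YOLOv5 configuration, but its signature is hypothetical, and `select_metric` is an illustrative helper.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def run_eval(model: Callable[[Any], int],
             dataset: Sequence[Tuple[Any, int]]) -> Dict[str, float]:
    """Hypothetical evaluation interface: compute metrics and return them as a dict."""
    correct = sum(1 for sample, label in dataset if model(sample) == label)
    return {"accuracy": correct / len(dataset)}


def select_metric(results: Dict[str, float], metric_name: str) -> float:
    """Pick the value the search optimizes; metric_name must be a key of the returned dict."""
    if metric_name not in results:
        raise KeyError(f"metric_name {metric_name!r} not in {sorted(results)}")
    return results[metric_name]
```

Failing loudly on a missing key surfaces a mismatched `metric_name` immediately rather than partway through a long search.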
vega yolov5_ms_quant.yml -d NPU
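When the task finishes, the search log is written to the log folder under the specified working directory. If `delete_eval_model` of `device_evaluator` in the `evaluator` section is set to False, the model for each search result is written to the output/nas folder under the specified working directory.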
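To run quantization tuning on a model that was pruned as described in "Pruning Tuning Based on Training Scripts", you do not need to provide a model script; a model description file (.json) is sufficient. Use {CANN package installation path}/ascend-toolkit/latest/tools/ascend_automl/examples/mindspore/quant/detection/yolov5/yolov5_prune_ms_quant.yml as the template to modify.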
general:
    backend: mindspore
    device_category: NPU
    device_evaluate_before_train: False
    task:
        local_base_path: ./tasks/  # working directory
        task_id: "ernie_chnsenticorp_quant"
    logger:
        level: info
    worker:
        timeout: 720000000
pipeline: [nas]
register:
    pkg_path: ["/xxx/ERNIE_for_MindSpore_1.6_code"]  # ERNIE source code path
    modules:
        - module: "run_ernie_classifier"  # module to import
          script_network: "get_model"  # function that returns the model instance
          script_network_input: ["get_input"]  # function that returns the model inputs
          ori_correct_func: ["calib_func"]  # function that calibrates the quantization parameters
          ori_eval_func: ["eval_func"]  # function that evaluates the accuracy of the quantized model
nas:
    pipe_step:
        type: SearchPipeStep
    model:
        model_desc:
            type: ScriptModelGen
            common:
                network:
                    type: get_model
                    config:
                        checkpoint_path: "/xxx/chnsenticorp-0-10_400.ckpt"  # model weight file path
                multiple_inputs:
                    type: get_input
                    config:
                        eval_batch_size: 1
                        eval_data_file_path: "/xxx/chnsenticorp_test.mindrecord"  # inference dataset path
    search_algorithm:
        type: MsQuantRL
        codec: QuantRLCodec
        policy:
            max_episode: 30  # max episodes; recommended value > 100. Larger is better but takes longer to learn.
            # If you cannot determine this value, set it to a large value, then set quantization
            # targets and stop learning early so that you do not have to tune it.
            num_warmup: 10  # warm-up period with no training, used only to fill the replay memory; recommended: 10-20
        objective_keys: ['accuracy', 'compress_ratio', 'latency']
        # choice: acc_first | compress_first
        reward_type: 'compress_first'
        custom_reward: False
        latency_acc_ratio: 0.5  # ratio of latency to accuracy in the reward.
        # If custom_reward is False, this value does not need to be configured.
        # If custom_reward is True, you can set latency_acc_ratio to 0 so that latency is not used in the reward.
        # Otherwise, a value greater than 0.1 is recommended.
        stop_early: False
        acc_threshold: 0.5
        latency_threshold: 5
        compress_threshold: 40
    search_space:
        type: SearchSpace
        hyperparameters:
            - key: network.bit_candidates
              type: CATEGORY
              range: [8, 32]
    trainer:
        type: QuantTrainer
        seed: 234
        calib_portion: 0.1
        callbacks: [OptExportCallback]
        custom_calib:
            type: calib_func  # registered calibration function
            config:
                train_batch_size: 32
                train_data_file_path: "/xxx/chnsenticorp_train.mindrecord"  # training dataset path
        custom_eval:
            type: eval_func  # registered evaluation function
            metric_name: "accuracy"
            config:
                eval_batch_size: 1
                eval_data_file_path: "/xxx/chnsenticorp_test.mindrecord"  # inference dataset path
    evaluator:
        type: Evaluator
        device_evaluator:
            type: DeviceEvaluator
            custom: QuantCustomEvaluator
            backend: mindspore
            delete_eval_model: True
            hardware: "Davinci"
            remote_host: "http://xx.xx.xx.xx:xxxx"  # URL of the remote inference server, ending with the port number. If "vega-config -q sec" returns "True" on the inference server, change "http" to "https"
            repeat_times: 1
            muti_input: True
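The `register` section names a source directory, a module, and the functions that module exposes. Later sections refer to those functions by `type` and give each a `config` mapping. One way to picture that wiring is shown below, assuming each `config` mapping is passed to the registered function as keyword arguments. `load_registered` and `build` are hypothetical helpers, not Vega APIs. So the example runs anywhere, the stdlib `json` module stands in for `run_ernie_classifier`.

```python
import importlib
import sys
from typing import Any, Callable, Dict, List


def load_registered(pkg_paths: List[str], module: str,
                    names: List[str]) -> Dict[str, Callable[..., Any]]:
    """Make the source paths importable, import the module, and collect the named functions."""
    for path in pkg_paths:
        if path not in sys.path:
            sys.path.insert(0, path)
    mod = importlib.import_module(module)
    return {name: getattr(mod, name) for name in names}


def build(registry: Dict[str, Callable[..., Any]], spec: Dict[str, Any]) -> Any:
    """Call the function named by spec["type"], passing spec["config"] as keyword arguments."""
    func = registry[spec["type"]]
    return func(**spec.get("config", {}))


if __name__ == "__main__":
    # Stand-in for run_ernie_classifier: the stdlib json module and its dumps function.
    registry = load_registered([], "json", ["dumps"])
    print(build(registry, {"type": "dumps", "config": {"obj": [1, 2]}}))  # [1, 2]
```

Under this assumption, the `network` entry would amount to calling `get_model(checkpoint_path="/xxx/chnsenticorp-0-10_400.ckpt")`. Likewise, `custom_calib` would call `calib_func` with `train_batch_size` and `train_data_file_path`.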
vega ernie_chnsenticorp_quant.yml -d NPU
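When the task finishes, the search log is written to the log folder under the specified working directory. If `save_intermediate_file: True` is set in `device_evaluator` of the `evaluator` section, the model for each search result is written to the output/nas folder under the specified working directory.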