quantize

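Product Support

Product	Supported
Atlas 350 accelerator card	√
Atlas A3 training series products/Atlas A3 inference series products	√
Atlas A2 training series products/Atlas A2 inference series products	√
Atlas 200I/500 A2 inference products	x
Atlas inference series products	x
Atlas training series products	x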

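Function Description

Post-training quantization interface based on torch module. It converts a high-precision model into a quantization calibration model. Running inference on the calibration model then computes the quantization parameters.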
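The calibration step described above can be pictured with a small, library-independent sketch. It is not the implementation behind quantize: the MinMaxObserver class, the calibrate helper, and the asymmetric 8-bit mapping are all assumptions chosen for illustration. The idea shown is that a calibration model records statistics while it runs ordinary inference over a limited number of batches, and quantization parameters are derived from those statistics afterwards.

```python
class MinMaxObserver:
    """Hypothetical observer: tracks the running min and max of the values it sees."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale_and_offset(self, num_bits=8):
        """Asymmetric parameters that map [min, max] onto [0, 2**num_bits - 1]."""
        qmax = 2 ** num_bits - 1
        lo = min(self.min_val, 0.0)  # the representable range must include zero
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        offset = round(-lo / scale)
        return scale, offset


def calibrate(layer_fn, batches, batch_num):
    """Run inference on the first batch_num batches while recording input statistics."""
    observer = MinMaxObserver()
    for batch in batches[:batch_num]:
        observer.observe(batch)  # record statistics of the layer input
        layer_fn(batch)          # ordinary forward computation
    return observer


# Toy "layer" and calibration data, both made up for this sketch.
batches = [[0.0, 1.0, 2.0], [-1.0, 0.5, 4.1]]
observer = calibrate(lambda xs: [2 * x for x in xs], batches, batch_num=2)
scale, offset = observer.scale_and_offset()
```

In this sketch the parameters only become available after the forward passes have run, which mirrors the order described above: build the calibration model first, run inference, then read out the quantization parameters.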

Function Prototype

quantize(model, config)

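Parameters

Parameter	Input/Output	Description
model	Input/Output	As input: the high-precision model to be quantized. As output: the quantization calibration model. Data type: torch.nn.Module
config	Input	Meaning: quantization configuration. Data type: a custom dict containing the weight, input, algorithm, and skip_layers settings. For the individual parameters, see the detailed config options.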

Return Value

None

Example

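import copy

# Build the network to be quantized
ori_model = build_model()
# Work on a copy, because quantize() turns the model it receives into the calibration model
model = copy.deepcopy(ori_model)
# Quantization configuration
cfg = {
    'batch_num': 1,
    'quant_cfg': {
        'weights': {
            'type': 'int8',
            'symmetric': True,
            'strategy': 'tensor',
        },
    },
    'algorithm': {'minmax'},
}
# Call the quantization interface to generate the quantization calibration model
quantize(model, cfg)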
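To make the configuration in the example more concrete, the following is a minimal sketch of what int8, symmetric, per-tensor quantization with a min-max rule typically computes for a weight tensor. It is a plain-Python illustration under assumptions, not the code that quantize runs: the function names are hypothetical, the weights are a flat list of floats, and the clamp range of [-127, 127] is one common convention.

```python
def quantize_symmetric_int8(weights):
    """Per-tensor symmetric min-max quantization of a flat list of floats (illustrative)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Symmetric quantization has no offset: zero maps to zero.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
```

With 'strategy': 'channel' the same computation would be applied once per output channel, giving one scale per channel instead of one per tensor.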

Detailed config Options

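Key	-	-	Value
batch_num	-	-	uint32. Number of batches used for quantization.
quant_cfg	-	-	Quantization configuration.
-	weights	-	Weight-only quantization configuration.
-	-	type	string. Weight quantization type. Currently supported types: hifloat8, float8_e4m3fn, mxfp4_e2m1, float4_e2m1, int4, int8.
-	-	symmetric	bool. Whether the weights use symmetric quantization. True: symmetric quantization. False: asymmetric quantization.
-	-	strategy	string. Weight quantization granularity. tensor: per-tensor. channel: per-channel. group: per-group, which can be configured only when the quantization data type is MXFP8_E4M3FN. For an introduction to quantization granularity, see the compression concepts.
-	-	group_size	Configured only in weight-only quantization scenarios. Size of each group under per-group granularity. It can be set only after per-group has been configured. The value must be in the range [32, K-1] and must be a multiple of 32.
-	inputs	-	Data (activation) quantization configuration.
-	-	type	string. Data (activation) quantization type. Currently supported types: hifloat8, float8_e4m3fn, mxfp8_e4m3fn, int8.
-	-	symmetric	bool. Whether the data uses symmetric quantization. True: symmetric quantization. False: asymmetric quantization.
-	-	strategy	string. Data quantization granularity. tensor: per-tensor. token: per-token.
algorithm	-	-	string. Quantization algorithm. Supported settings: awq, with grids_num (uint32, number of search grid points, default 20). gptq. minmax. smoothquant, with smooth_strength (float, migration strength, default 0.5). ofmr. mxquant, which performs only the mx data type conversion. For details, see the compression algorithms.
skip_layers	-	-	string. Global setting that skips layers by name. Once a string is specified, every layer whose name contains that string is skipped and left unquantized.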
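Two rules in the table above are easy to get wrong, so here is a small sketch of each. Both helpers are hypothetical and exist only for illustration. The sketch assumes that skip_layers can be treated as one or more substrings, and that K is the size of the dimension along which groups are formed, which the table does not define.

```python
def should_skip(layer_name, skip_patterns):
    """A layer is skipped if its name contains any of the configured strings."""
    return any(pattern in layer_name for pattern in skip_patterns)


def is_valid_group_size(group_size, k):
    """group_size must lie in [32, K-1] and be a multiple of 32."""
    return 32 <= group_size <= k - 1 and group_size % 32 == 0


# Made-up layer names for the sketch.
layer_names = ["encoder.0.attn.q_proj", "encoder.0.mlp.fc1", "lm_head"]
to_quantize = [n for n in layer_names if not should_skip(n, ["attn", "lm_head"])]
```

Because the match is by substring, a short pattern such as "fc" would skip every layer whose name contains it, so more specific strings are the safer choice.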
