torch_npu.npu_quantize

Function Description

Operator function: quantizes the input tensor.
Formulas:
- If div_mode is True:
  $y = \text{round}(x / scales + zero\_points)$
- If div_mode is False:
  $y = \text{round}(x \times scales + zero\_points)$
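The arithmetic can be illustrated on the host with a small, dependency-free sketch. It assumes the formula is round(x / scales + zero_points) when div_mode is True and round(x * scales + zero_points) when div_mode is False. The function name quantize_reference, the round-half-to-even behavior of Python's round, and the saturation to the INT8 range are assumptions made for illustration; they are not confirmed behavior of the NPU operator.

```python
def quantize_reference(x, scales, zero_points=None, div_mode=True,
                       qmin=-128, qmax=127):
    """Hypothetical host-side reference for a 1-D list of floats.

    x, scales and zero_points are equal-length lists, one entry per
    element of the quantization axis.
    """
    if zero_points is None:
        # zero_points is optional; treat a missing offset as zero.
        zero_points = [0.0] * len(x)
    out = []
    for xi, s, z in zip(x, scales, zero_points):
        # div_mode=True divides by the scale, div_mode=False multiplies.
        scaled = xi / s if div_mode else xi * s
        # Assumption: Python's round() (ties to even) stands in for the
        # operator's rounding mode.
        q = round(scaled + z)
        # Assumption: results are saturated to [qmin, qmax] (INT8 here).
        out.append(max(qmin, min(qmax, q)))
    return out


if __name__ == "__main__":
    print(quantize_reference([1.0, -2.0, 0.26], [0.1, 0.1, 0.1]))
    print(quantize_reference([1.0, -2.0, 0.26], [10.0, 10.0, 10.0],
                             div_mode=False))
```

Dividing by a scale of 0.1 and multiplying by a scale of 10.0 give the same quantized values, which is why the two modes differ only in how scales is prepared.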

Interface Prototype

npu_quantize(Tensor input, Tensor scales, Tensor? zero_points, ScalarType dtype, int axis=1, bool div_mode=True) -> Tensor

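Parameters

input: Tensor on the Device side. The source tensor to be quantized; required. Corresponds to x in the formula. Supported data types: FLOAT, FLOAT16, BFLOAT16. Supported data format: ND. Non-contiguous Tensors are supported.

scales: Tensor on the Device side. The tensor used to scale input; required:
- When div_mode is True, supported data types are FLOAT and BFLOAT16.
- When div_mode is False, supported data types are FLOAT, FLOAT16, and BFLOAT16. Supported data format: ND. Non-contiguous Tensors are supported.
zero_points: Tensor on the Device side. The tensor used as the offset applied to input; optional:
- When div_mode is True, supported data types are INT8, UINT8, INT32, and BFLOAT16.
- When div_mode is False, supported data types are FLOAT, FLOAT16, and BFLOAT16. Supported data format: ND. Non-contiguous Tensors are supported.
dtype: Data type of the output Tensor on the Device side:
- When div_mode is True, supported types are torch.qint8, torch.quint8, and torch.int32.
- When div_mode is False, the only supported type is torch.qint8.
axis: The axis along which quantization is applied elementwise; all other axes are broadcast. Default: 1.
div_mode: When True, scales is applied by division; when False, scales is applied by multiplication. Default: True.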
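The dtype rules in the parameter list depend on div_mode, so a pre-flight check can catch unsupported combinations before the operator is called. The following is a minimal sketch that transcribes those rules into a lookup table. The names SUPPORTED and check_quantize_dtypes are hypothetical, dtypes are represented as plain strings, and FLOAT is assumed to mean float32.

```python
# Supported dtypes per parameter, transcribed from the parameter list.
# Assumption: "FLOAT" in the list corresponds to float32.
SUPPORTED = {
    True: {  # div_mode=True
        "input": {"float32", "float16", "bfloat16"},
        "scales": {"float32", "bfloat16"},
        "zero_points": {"int8", "uint8", "int32", "bfloat16"},
        "dtype": {"qint8", "quint8", "int32"},
    },
    False: {  # div_mode=False
        "input": {"float32", "float16", "bfloat16"},
        "scales": {"float32", "float16", "bfloat16"},
        "zero_points": {"float32", "float16", "bfloat16"},
        "dtype": {"qint8"},
    },
}


def check_quantize_dtypes(input_dtype, scales_dtype, out_dtype,
                          zero_points_dtype=None, div_mode=True):
    """Return a list of problems; an empty list means every dtype is listed."""
    table = SUPPORTED[div_mode]
    given = {"input": input_dtype, "scales": scales_dtype, "dtype": out_dtype}
    if zero_points_dtype is not None:
        # zero_points is optional, so it is only checked when provided.
        given["zero_points"] = zero_points_dtype
    return [
        f"{name}={value} is not listed for div_mode={div_mode}"
        for name, value in given.items()
        if value not in table[name]
    ]


if __name__ == "__main__":
    # FLOAT16 scales are listed only for div_mode=False.
    print(check_quantize_dtypes("float16", "float16", "qint8", div_mode=True))
```

Returning a list of messages rather than raising on the first mismatch makes it easier to report every unsupported argument at once.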

Outputs

y: aclTensor on the Device side. The output in the formula; its shape is the same as that of input.

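Constraints

axis supports elementwise quantization only along the last dimension.
The BFLOAT16 data type is supported only on Atlas A2 training series products.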
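Because axis defaults to 1 but only the last dimension is supported, the default is valid as-is only for 2-D inputs; for other ranks the caller has to pass the last axis explicitly (for example -1). A small sketch of that check follows. The helper names normalize_axis and is_last_axis are hypothetical, and the acceptance of negative axis values is an assumption based on their common meaning in PyTorch.

```python
def normalize_axis(axis, ndim):
    """Map a possibly negative axis to the range [0, ndim)."""
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    return axis % ndim


def is_last_axis(axis, ndim):
    """Return True when axis refers to the last dimension of the input."""
    return normalize_axis(axis, ndim) == ndim - 1


if __name__ == "__main__":
    print(is_last_axis(1, 2))   # default axis with a 2-D input
    print(is_last_axis(1, 3))   # default axis with a 3-D input
    print(is_last_axis(-1, 3))  # explicit last axis with a 3-D input
```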

Supported PyTorch Versions

PyTorch 2.3
PyTorch 2.2
PyTorch 2.1
PyTorch 1.11

Supported Products

Atlas A2 training series products

Atlas inference series accelerator card products: supported when div_mode is False.

Example

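import torch
import torch_npu

# Input to quantize: shape (1, 1, 12), BFLOAT16, moved to the NPU.
x = torch.randn(1, 1, 12).bfloat16().npu()
# One scale per element of the last axis (length 12).
scale = torch.tensor([0.1] * 12).bfloat16().npu()
# No zero_points; quantize along the last axis (axis=-1) with div_mode=False.
out = torch_npu.npu_quantize(x, scale, None, torch.qint8, -1, False)
print(out)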
