Affinity Library

Table 1 Affinity function comparison table

| No. | Native Function/Reference Link | Affinity Function Name | Test Case | Limitations |
| --- | --- | --- | --- | --- |
| 1 | self.dropout()/nn.functional.softmax()/torch.add | def fuse_add_softmax_dropout() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_fuse_add_softmax_dropout.py | None |
| 2 | def bboxes_diou() | def npu_diou() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_iou.py | Backward only supports the trans==True, is_cross==False, mode==0 ('iou') scenario. |
| 3 | def bboxes_giou() | def npu_ciou() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_iou.py | Backward only supports the trans==True, is_cross==False, mode==0 ('iou') scenario. |
| 4 | class FairseqDropout() | class NpuCachedDropout() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_ensemble_dropout.py | Dynamic shapes are not supported. |
| 5 | class MultiheadAttention() | class MultiheadAttention() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_multihead_attention.py | Dynamic shapes are not supported. |
| 6 | def single_level_responsible_flags() | def npu_single_level_responsible_flags() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_anchor_generator.py | Due to NPU operator limitations, output_size (featmap_size[0] * featmap_size[1] * num_base_anchors) must be less than 60000. |
| 7 | def encode() | def npu_bbox_coder_encode_xyxy2xywh() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_bbox_coder.py | Supports dynamic shapes. For semantic reasons, only the 2D (n, 4) scenario is supported; max_shape must be given two values; dtype only supports fp16 and fp32, and both inputs must have the same dtype. |
| 8 | def decode() | def npu_bbox_coder_decode_yolo() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_bbox_coder.py | Dynamic shapes are not supported. For semantic reasons, only the 2D (n, 4) scenario is supported; the first two inputs must have the same shape and dtype (fp16 or fp32 only); the third input must be 1D, with its first dimension matching that of the first two inputs. |
| 9 | No native function; the core statement is input1[condition] = value (see the test case). | def npu_fast_condition_index_put() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_index_op.py | None |
| 10 | torch.matmul() | class MatmulApply() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_matmul_transpose.py | In dynamic shape scenarios, broadcast is not supported; only fp16 input with fp16 output, or fp16 input with fp32 output, is supported. |
| 11 | def multiclass_nms() | def npu_multiclass_nms() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_multiclass_nms.py | Dynamic shape scenario: at most 20 classes and at most 10000 boxes. |
| 12 | def fast_nms() | def npu_batched_multiclass_nms() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_multiclass_nms.py | Dynamic shape scenario: at most 20 classes and at most 10000 boxes. |
| 13 | torch.roll() | class NpuRollWithIndexSelect() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_roll.py | None |
| 14 | class Mish() | class Mish() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_activations.py | Mish has been available in official PyTorch releases since 1.9.0. The PyTorch version currently adapted for NPU is 1.11.0, so Mish must be defined as an additional module. |
| 15 | class SiLU() | class SiLU() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_activations.py | SiLU has been available in official PyTorch releases since 1.7.0. The PyTorch version currently adapted for NPU is 1.11.0, so SiLU must be defined as an additional module. |
| 16 | def channel_shuffle() | class ChannelShuffle() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_channel_shuffle.py | Only the group=2 scenario is implemented. |
| 17 | class LabelSmoothingCrossEntropy() | class LabelSmoothingCrossEntropy() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_crossentropy.py | None |
| 18 | class ModulatedDeformConv2dFunction() | class ModulatedDeformConv() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_deform_conv.py | ModulatedDeformConv only supports operations on the fp32 data type. Note that the weight and bias of conv_offset must be initialized to 0. |
| 19 | class DropPath() | class NpuDropPath() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_drop_path.py | Dynamic shapes are not supported. |
| 20 | class Focus() | class Focus() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_focus.py | None |
| 21 | class PSROIPool() | class PSROIPool() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_ps_roi_pooling.py | The pooled_height, pooled_width, and group_size parameters must be consistent. |
| 22 | class ROIAlign() | class ROIAlign() | https://gitee.com/ascend/pytorch/blob/v1.8.1-3.0.0/test/test_contrib/test_roi_align.py | aligned=True must be set. For functional verification, see test_roi_align.py. |

Detailed Operator API Description

The following affinity APIs apply to PyTorch 1.8.1.

def fuse_add_softmax_dropout(training, dropout, attn_mask, attn_scores, attn_head_size, p=0.5, dim=-1):

Uses an NPU custom operator to replace the native writing method and improve performance.
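
A minimal usage sketch, assuming an Ascend NPU environment with torch_npu installed and the torch_npu.contrib.function import path; the shapes and the dropout module are illustrative:

import torch
import torch_npu
from torch_npu.contrib.function import fuse_add_softmax_dropout

batch, heads, seq_len, head_size = 8, 12, 128, 64
attn_mask = torch.zeros(batch, heads, seq_len, seq_len).half().npu()
attn_scores = torch.randn(batch, heads, seq_len, seq_len).half().npu()
dropout = torch.nn.Dropout(p=0.1)  # assumed: the module used on the unfused fallback path
# One fused NPU call in place of the native add + softmax + dropout sequence
out = fuse_add_softmax_dropout(True, dropout, attn_mask, attn_scores,
                               head_size, p=0.1, dim=-1)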

def npu_diou(boxes1, boxes2, trans=True, is_cross=False, mode=0):

Applies an NPU-based DIoU operation.

DIoU takes the distance between targets, their overlap rate, and the scale into account, so regression for different targets or boundaries tends to be more stable.
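
A sketch that stays within the documented backward constraints; the (4, n) box layout is an assumption taken from the test case:

import torch
import torch_npu
from torch_npu.contrib.function import npu_diou

# trans=True selects the (x, y, w, h) box format; backward is only
# documented for trans==True, is_cross==False, mode==0 ('iou').
box1 = torch.rand(4, 32).half().npu()
box2 = torch.rand(4, 32).half().npu()
overlap = npu_diou(box1, box2, trans=True, is_cross=False, mode=0)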

def npu_ciou(boxes1, boxes2, trans=True, is_cross=False, mode=0):

Applies an NPU-based CIoU operation.

CIoU adds a penalty term on the basis of DIoU.

def npu_single_level_responsible_flags(featmap_size, gt_bboxes, stride, num_base_anchors):

Uses an NPU operator to generate the responsible flags of anchors in a single-level feature map.
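
A sketch under the 60000-element limit noted in the table; the argument shapes follow the mmdetection function this replaces and are assumptions here:

import torch
import torch_npu
from torch_npu.contrib.function import npu_single_level_responsible_flags

featmap_size = (10, 10)
stride = (32, 32)
num_base_anchors = 3
gt_bboxes = torch.rand(4, 4).npu() * 320  # assumed (num_gts, 4) layout
# output_size = 10 * 10 * 3 = 300, well below the 60000 limit
flags = npu_single_level_responsible_flags(featmap_size, gt_bboxes,
                                           stride, num_base_anchors)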

def npu_bbox_coder_encode_yolo(bboxes, gt_bboxes, stride):

Uses an NPU operator to get the box regression transformation deltas that can be used to transform the bboxes into the gt_bboxes.

def npu_bbox_coder_encode_xyxy2xywh(bboxes, gt_bboxes, means=None, stds=None, is_normalized=False, normalized_scale=10000.):

Applies an NPU-based bbox format-encoding operation from xyxy to xywh.
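
A sketch that follows the constraints in the table (2D (n, 4) inputs with matching dtype, fp16 or fp32); the import path is an assumption:

import torch
import torch_npu
from torch_npu.contrib.function import npu_bbox_coder_encode_xyxy2xywh

bboxes = torch.rand(16, 4).half().npu() * 512     # (n, 4), fp16
gt_bboxes = torch.rand(16, 4).half().npu() * 512  # same shape and dtype
deltas = npu_bbox_coder_encode_xyxy2xywh(bboxes, gt_bboxes)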

def npu_bbox_coder_decode_xywh2xyxy(bboxes, pred_bboxes, means=None, stds=None, max_shape=None, wh_ratio_clip=16 / 1000):

Applies an NPU-based bbox format-decoding operation from xywh to xyxy.

def npu_fast_condition_index_put(x, condition, value):

Uses the NPU affinity writing method to replace the native writing method for the bool-type index_put function.
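
A sketch showing the replacement; the import path is an assumption:

import torch
import torch_npu
from torch_npu.contrib.function import npu_fast_condition_index_put

x = torch.randn(128, 128).npu()
condition = x < 0.5
value = 0.0
# Native writing method:  x[condition] = value
out = npu_fast_condition_index_put(x, condition, value)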

class MatmulApply(torch.autograd.Function):

Uses an NPU custom operator to replace the native writing method and improve performance.
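
A sketch, assuming MatmulApply follows the usual torch.autograd.Function pattern and computes matmul against the transposed second input (the swin-transformer attention case it targets); the import path is also an assumption:

import torch
import torch_npu
from torch_npu.contrib.function import MatmulApply  # import path assumed

matmul_transpose = MatmulApply.apply
q = torch.randn(8, 64, 32).half().npu()
k = torch.randn(8, 64, 32).half().npu()
# Avoids the explicit non-contiguous k.transpose(-2, -1) of the native
# form; fp16 inputs only, per the table above.
attn = matmul_transpose(q, k)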

def npu_multiclass_nms(multi_bboxes, multi_scores, score_thr=0.05, nms_thr=0.45, max_num=50, score_factors=None):

NMS for multi-class bboxes using the NPU API.
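
A sketch within the dynamic-shape limits from the table (at most 20 classes, at most 10000 boxes); the (n, num_classes + 1) score layout follows the mmdetection function this replaces and is an assumption here:

import torch
import torch_npu
from torch_npu.contrib.function import npu_multiclass_nms

boxes = torch.rand(1000, 4).half().npu() * 100  # (num_boxes, 4)
scores = torch.rand(1000, 21).half().npu()      # 20 classes + background (assumed)
det_bboxes, det_labels = npu_multiclass_nms(
    boxes, scores, score_thr=0.05, nms_thr=0.45, max_num=50)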

def npu_batched_multiclass_nms(multi_bboxes, multi_scores, max_num=50, score_factors=None):

NMS for batched multi-class bboxes using the NPU API.

def dropout_with_byte_mask(input1, p=0.5, training=True, inplace=False):

Generates a stateless random uint8 mask and applies dropout according to that mask.
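
A sketch; the import path is an assumption:

import torch
import torch_npu
from torch_npu.contrib.function import dropout_with_byte_mask

x = torch.randn(16, 128).half().npu()
# Generates a stateless uint8 mask on the NPU and applies dropout with it
y = dropout_with_byte_mask(x, p=0.5, training=True)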

class NpuRollWithIndexSelect():

Uses the NPU affinity writing method to replace the native torch.roll in swin-transformer.
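
A sketch, assuming the class is instantiated once and then called with the same (input, shifts, dims) arguments as torch.roll:

import torch
import torch_npu
from torch_npu.contrib.function import NpuRollWithIndexSelect  # path assumed

roll = NpuRollWithIndexSelect()
x = torch.randn(32, 56, 56, 96).half().npu()
# Replaces torch.roll(x, shifts=(-3, -3), dims=(1, 2)), the shifted-window
# step in swin-transformer, with index_select-based gathers
shifted = roll(x, (-3, -3), (1, 2))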

class Mish(nn.Module):

Applies an NPU-based Mish operation.

class SiLU(nn.Module):

Applies an NPU-based Sigmoid Linear Unit (SiLU) function, element-wise. The SiLU function is also known as the swish function.
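
Both activations are drop-in nn.Module replacements; a sketch, with the import path as an assumption:

import torch
import torch_npu
from torch_npu.contrib.module import Mish, SiLU  # import path assumed

x = torch.randn(16, 32).half().npu()
y = SiLU()(x)  # element-wise x * sigmoid(x)
z = Mish()(x)  # element-wise x * tanh(softplus(x))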

class ChannelShuffle(nn.Module):

Applies an NPU-compatible channel shuffle operation. To avoid contiguous operations, which are inefficient on the NPU, the original operation is replaced with a rewrite of the same semantics. Two non-contiguous operations are replaced: transpose and chunk.
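
A sketch; the constructor arguments and the two-input call are assumptions based on the ShuffleNetV2 pattern this module targets (the two branch outputs are shuffled together, hence the group=2 limitation in the table):

import torch
import torch_npu
from torch_npu.contrib.module import ChannelShuffle  # import path assumed

m = ChannelShuffle(in_channels=64, groups=2).npu()  # arguments assumed
x1 = torch.randn(2, 32, 7, 7).npu()  # first branch output
x2 = torch.randn(2, 32, 7, 7).npu()  # second branch output
out = m(x1, x2)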

class LabelSmoothingCrossEntropy(nn.Module):

CrossEntropy with label smoothing using the NPU API.

class ModulatedDeformConv(nn.Module):

Applies an NPU-based modulated deformable 2D convolution operation.

class NpuDropPath(nn.Module):

Uses the NPU affinity writing method to replace the native drop path in swin_transformer.py. Drop paths (stochastic depth) are applied per sample in the main path of residual blocks.
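
A sketch of the usual residual-branch usage; the drop_prob constructor argument and the import path are assumptions:

import torch
import torch_npu
from torch_npu.contrib.module import NpuDropPath  # import path assumed

drop_path = NpuDropPath(drop_prob=0.2).npu()  # drop_prob assumed
x = torch.randn(8, 197, 768).half().npu()
shortcut = x
x = shortcut + drop_path(x)  # randomly zeroes whole samples in the branch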

class NpuCachedDropout(torch.nn.Dropout):

FairseqDropout for use on the NPU device.

class Focus(nn.Module):

Uses the NPU affinity writing method to replace the native Focus in YOLOv5.

class FusedColorJitter(torch.nn.Module):

Randomly changes the brightness, contrast, saturation, and hue of an image.

class MultiheadAttention(nn.Module):

Multi-headed attention.

class DropoutWithByteMask(Module):

Applies an NPU-compatible DropoutWithByteMask operation. Only NPU devices are supported.

class PSROIPool(nn.Module):

PSROIPool using the NPU API.

class ROIAlign(nn.Module):

ROIAlign using the NPU API.
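
A sketch following the detectron2-style ROIAlign interface that the test case exercises; the constructor arguments are assumptions, and aligned=True is required per the table:

import torch
import torch_npu
from torch_npu.contrib.module import ROIAlign  # import path assumed

roi_align = ROIAlign((7, 7), spatial_scale=0.25, sampling_ratio=2,
                     aligned=True).npu()
feat = torch.randn(2, 256, 50, 50).npu()
# Each roi row is (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0., 0., 0., 40., 40.],
                     [1., 8., 8., 36., 36.]]).npu()
out = roi_align(feat, rois)  # (num_rois, 256, 7, 7)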