No. | Operator name |
---|---|
1 | torch_npu._npu_dropout |
2 | torch_npu._npu_dropout_inplace |
3 | torch_npu.copy_memory_ |
4 | torch_npu.empty_with_format |
5 | torch_npu.fast_gelu |
6 | torch_npu.npu_alloc_float_status |
7 | torch_npu.npu_anchor_response_flags |
8 | torch_npu.npu_apply_adam |
9 | torch_npu.npu_batch_nms |
10 | torch_npu.npu_bert_apply_adam |
11 | torch_npu.npu_bmmV2 |
12 | torch_npu.npu_bounding_box_decode |
13 | torch_npu.npu_bounding_box_encode |
14 | torch_npu.npu_broadcast |
15 | torch_npu.npu_ciou |
16 | torch_npu.npu_clear_float_status |
17 | torch_npu.npu_confusion_transpose |
18 | torch_npu.npu_conv_transpose2d |
19 | torch_npu.npu_conv2d |
20 | torch_npu.npu_conv3d |
21 | torch_npu.npu_convolution |
22 | torch_npu.npu_convolution_transpose |
23 | torch_npu.npu_deformable_conv2d |
24 | torch_npu.npu_diou |
25 | torch_npu.npu_dropoutV2 |
26 | torch_npu.npu_dtype_cast |
27 | torch_npu.npu_format_cast |
28 | torch_npu.npu_format_cast_ |
29 | torch_npu.npu_get_float_status |
30 | torch_npu.npu_giou |
31 | torch_npu.npu_grid_assign_positive |
32 | torch_npu.npu_gru |
33 | torch_npu.npu_ifmr |
34 | torch_npu.npu_indexing |
35 | torch_npu.npu_iou |
36 | torch_npu.npu_layer_norm_eval |
37 | torch_npu.npu_linear |
38 | torch_npu.npu_lstm |
39 | torch_npu.npu_masked_fill_range |
40 | torch_npu.npu_max |
41 | torch_npu.npu_mish |
42 | torch_npu.npu_nms_v4 |
43 | torch_npu.npu_nms_with_mask |
44 | torch_npu.npu_normalize_batch |
45 | torch_npu.npu_one_hot |
46 | torch_npu.npu_pad |
47 | torch_npu.npu_ps_roi_pooling |
48 | torch_npu.npu_ptiou |
49 | torch_npu.npu_random_choice_with_mask |
50 | torch_npu.npu_roi_align |
51 | torch_npu.npu_scatter |
52 | torch_npu.npu_sign_bits_pack |
53 | torch_npu.npu_sign_bits_unpack |
54 | torch_npu.npu_slice |
55 | torch_npu.npu_softmax_cross_entropy_with_logits |
56 | torch_npu.npu_sort_v2 |
57 | torch_npu.npu_stride_add |
58 | torch_npu.npu_transpose |
59 | torch_npu.npu_yolo_boxes_encode |
60 | torch_npu.one_ |
Some parameters of the NPU custom operators have mapping relationships; refer to the table below. A usage sketch follows the table.
Parameter | Mapped value | Description |
---|---|---|
ACL_FORMAT_UNDEFINED | -1 | Mapping values for the format parameter. |
ACL_FORMAT_NCHW | 0 | |
ACL_FORMAT_NHWC | 1 | |
ACL_FORMAT_ND | 2 | |
ACL_FORMAT_NC1HWC0 | 3 | |
ACL_FORMAT_FRACTAL_Z | 4 | |
ACL_FORMAT_NC1HWC0_C04 | 12 | |
ACL_FORMAT_HWCN | 16 | |
ACL_FORMAT_NDHWC | 27 | |
ACL_FORMAT_FRACTAL_NZ | 29 | |
ACL_FORMAT_NCDHW | 30 | |
ACL_FORMAT_NDC1HWC0 | 32 | |
ACL_FRACTAL_Z_3D | 33 | |
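For readability, the numeric values above can be wrapped in a named mapping on the caller side. The snippet below is a hypothetical helper (the ACL_FORMAT dictionary is not part of torch_npu); it simply mirrors the table and passes the chosen value to torch_npu.npu_format_cast, which is documented later in this section.
>>> import torch
>>> import torch_npu
>>> # Hypothetical name-to-value map that mirrors the table above.
>>> ACL_FORMAT = {
...     "ACL_FORMAT_UNDEFINED": -1, "ACL_FORMAT_NCHW": 0, "ACL_FORMAT_NHWC": 1,
...     "ACL_FORMAT_ND": 2, "ACL_FORMAT_NC1HWC0": 3, "ACL_FORMAT_FRACTAL_Z": 4,
...     "ACL_FORMAT_NC1HWC0_C04": 12, "ACL_FORMAT_HWCN": 16, "ACL_FORMAT_NDHWC": 27,
...     "ACL_FORMAT_FRACTAL_NZ": 29, "ACL_FORMAT_NCDHW": 30, "ACL_FORMAT_NDC1HWC0": 32,
...     "ACL_FRACTAL_Z_3D": 33,
... }
>>> x = torch.rand(2, 3, 4, 5).npu()
>>> y = torch_npu.npu_format_cast(x, ACL_FORMAT["ACL_FORMAT_FRACTAL_NZ"])
>>> torch_npu.get_npu_format(y)   # expected to report 29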
torch_npu.npu_apply_adam(beta1_power, beta2_power, lr, beta1, beta2, epsilon, grad, use_locking, use_nesterov, out = (var, m, v))
Compute the Adam optimizer update.
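A minimal usage sketch, assuming that var, m and v are updated through the out tuple and that beta1_power/beta2_power are the running powers of beta1/beta2 (step 1 here, so they equal beta1 and beta2); all shapes and hyperparameter values are illustrative only.
>>> import torch
>>> import torch_npu
>>> var = torch.rand(3, 3).npu()      # parameters
>>> m = torch.zeros(3, 3).npu()       # first-moment estimate
>>> v = torch.zeros(3, 3).npu()       # second-moment estimate
>>> grad = torch.rand(3, 3).npu()     # gradient
>>> var, m, v = torch_npu.npu_apply_adam(0.9, 0.99, 1e-3, 0.9, 0.99, 1e-8, grad, False, False, out=(var, m, v))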
torch_npu.npu_convolution_transpose(input, weight, bias, padding, output_padding, stride, dilation, groups) -> Tensor
Apply a 2D or 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
torch_npu.npu_conv_transpose2d(input, weight, bias, padding, output_padding, stride, dilation, groups) -> Tensor
Apply a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
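A minimal sketch, assuming NCHW input, an (in_channels, out_channels/groups, kH, kW) weight layout and two-element padding/output_padding/stride/dilation lists, as in torch.nn.functional.conv_transpose2d; verify the expected list lengths against your torch_npu version.
>>> import torch
>>> import torch_npu
>>> x = torch.rand(1, 16, 32, 32).npu()   # N, C_in, H, W
>>> w = torch.rand(16, 8, 3, 3).npu()     # C_in, C_out/groups, kH, kW
>>> b = torch.rand(8).npu()
>>> # argument order: padding, output_padding, stride, dilation, groups
>>> out = torch_npu.npu_conv_transpose2d(x, w, b, [1, 1], [0, 0], [2, 2], [1, 1], 1)
>>> out.shape   # torch.Size([1, 8, 63, 63]) if the semantics match conv_transpose2d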
torch_npu.npu_convolution(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 2D or 3D convolution over an input image composed of several input planes.
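A minimal sketch for the 2D case, assuming NCHW input and two-element stride/padding/dilation lists as in torch.nn.functional.conv2d; the shapes are illustrative.
>>> import torch
>>> import torch_npu
>>> x = torch.rand(1, 16, 32, 32).npu()   # N, C_in, H, W
>>> w = torch.rand(32, 16, 3, 3).npu()    # C_out, C_in/groups, kH, kW
>>> b = torch.rand(32).npu()
>>> # argument order: stride, padding, dilation, groups
>>> out = torch_npu.npu_convolution(x, w, b, [1, 1], [1, 1], [1, 1], 1)
>>> out.shape   # torch.Size([1, 32, 32, 32]) with this padding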
torch_npu.npu_conv2d(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 2D convolution over an input image composed of several input planes.
torch_npu.npu_conv3d(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 3D convolution over an input image composed of several input planes.
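A minimal 3D sketch, assuming NCDHW input and three-element stride/padding/dilation lists as in torch.nn.functional.conv3d; the shapes are illustrative.
>>> import torch
>>> import torch_npu
>>> x = torch.rand(1, 4, 8, 16, 16).npu()   # N, C_in, D, H, W
>>> w = torch.rand(8, 4, 3, 3, 3).npu()     # C_out, C_in/groups, kD, kH, kW
>>> b = torch.rand(8).npu()
>>> # argument order: stride, padding, dilation, groups
>>> out = torch_npu.npu_conv3d(x, w, b, [1, 1, 1], [1, 1, 1], [1, 1, 1], 1)
>>> out.shape   # torch.Size([1, 8, 8, 16, 16]) with this padding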
torch_npu.one_(self) -> Tensor
Fill the self tensor with the value 1.
>>> x = torch.rand(2, 3).npu()
>>> x
tensor([[0.6072, 0.9726, 0.3475], [0.3717, 0.6135, 0.6788]], device='npu:0')
>>> x.one_()
tensor([[1., 1., 1.], [1., 1., 1.]], device='npu:0')
torch_npu.npu_sort_v2(self, dim=-1, descending=False, out=None) -> Tensor
Sort the elements of the input tensor along a given dimension in ascending order by value, without returning indices. If dim is not given, the last dimension of the input is chosen. If descending is True, the elements are sorted in descending order by value.
>>> x = torch.randn(3, 4).npu()
>>> x
tensor([[-0.0067, 1.7790, 0.5031, -1.7217], [ 1.1685, -1.0486, -0.2938, 1.3241], [ 0.1880, -2.7447, 1.3976, 0.7380]], device='npu:0')
>>> sorted_x = torch_npu.npu_sort_v2(x)
>>> sorted_x
tensor([[-1.7217, -0.0067, 0.5029, 1.7793], [-1.0488, -0.2937, 1.1689, 1.3242], [-2.7441, 0.1880, 0.7378, 1.3975]], device='npu:0')
torch_npu.npu_format_cast(self, acl_format) -> Tensor
Change the format of an npu tensor.
>>> x = torch.rand(2, 3, 4, 5).npu()
>>> torch_npu.get_npu_format(x)
0
>>> x1 = x.npu_format_cast(29)
>>> torch_npu.get_npu_format(x1)
29
torch_npu.npu_format_cast_(self, src) -> Tensor
Change the format of self in place to match the format of src.
>>> x = torch.rand(2, 3, 4, 5).npu()
>>> torch_npu.get_npu_format(x)
0
>>> torch_npu.get_npu_format(x.npu_format_cast_(29))
29
torch_npu.npu_transpose(self, perm, require_contiguous) -> Tensor
Return a view of the original tensor with its dimensions permuted, and make the result contiguous.
>>> x = torch.randn(2, 3, 5).npu()
>>> x.shape
torch.Size([2, 3, 5])
>>> x1 = torch_npu.npu_transpose(x, (2, 0, 1))
>>> x1.shape
torch.Size([5, 2, 3])
>>> x2 = x.npu_transpose(2, 0, 1)
>>> x2.shape
torch.Size([5, 2, 3])
torch_npu.npu_broadcast(self, size) -> Tensor
Return a new view of the self tensor with singleton dimensions expanded to a larger size, and make the result contiguous.
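A minimal sketch, analogous to Tensor.expand but returning a contiguous result; the values are illustrative.
>>> import torch
>>> import torch_npu
>>> x = torch.tensor([[1], [2], [3]]).npu()   # shape (3, 1)
>>> y = torch_npu.npu_broadcast(x, (3, 4))    # singleton dimension expanded to size 4
>>> y.shape
torch.Size([3, 4])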
torch_npu.npu_dtype_cast(input, dtype) -> Tensor
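No description is given above; judging by the signature, npu_dtype_cast casts an NPU tensor to the given dtype. A minimal sketch, with the target dtype chosen arbitrarily:
>>> import torch
>>> import torch_npu
>>> x = torch.rand(2, 3).npu()                      # float32 by default
>>> y = torch_npu.npu_dtype_cast(x, torch.float16)  # cast on the NPU side
>>> y.dtype
torch.float16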
torch_npu.empty_with_format(size, dtype, layout, device, pin_memory, acl_format) -> Tensor
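A minimal sketch, assuming the parameters other than size and acl_format accept keyword arguments with sensible defaults, and that acl_format takes the numeric values from the format mapping table above (29 = ACL_FORMAT_FRACTAL_NZ):
>>> import torch
>>> import torch_npu
>>> x = torch_npu.empty_with_format((2, 3), dtype=torch.float16, device="npu", acl_format=29)
>>> torch_npu.get_npu_format(x)   # expected to report 29
>>> x.dtype
torch.float16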
torch_npu.copy_memory_(dst, src, non_blocking=False) -> Tensor
copy_memory_ only supports NPU tensors. The input tensors must have the same dtype and device index.
>>> a = torch.IntTensor([0, 0, -1]).npu()
>>> b = torch.IntTensor([1, 1, 1]).npu()
>>> a.copy_memory_(b)
tensor([1, 1, 1], device='npu:0', dtype=torch.int32)
torch_npu.npu_one_hot(input, num_classes=-1, depth=1, on_value=1, off_value=0) -> Tensor
>>> a = torch.IntTensor([5, 3, 2, 1]).npu()
>>> b = torch_npu.npu_one_hot(a, depth=5)
>>> b
tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 1., 0.], [0., 0., 1., 0., 0.], [0., 1., 0., 0., 0.]], device='npu:0')
torch_npu.npu_stride_add(x1, x2, offset1, offset2, c1_len) -> Tensor
>>> a = torch.tensor([[[[[1.]]]]]).npu()
>>> b = torch_npu.npu_stride_add(a, a, 0, 0, 1)
>>> b
tensor([[[[[2.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]], [[[0.]]]]], device='npu:0')
torch_npu.npu_softmax_cross_entropy_with_logits(features, labels) -> Tensor
Compute softmax cross entropy cost.
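A minimal sketch, assuming features and labels share the shape (N, num_classes) and that labels are one-hot (or probability) encoded, as in the TensorFlow operator of the same name; the loss is then expected to have shape (N,).
>>> import torch
>>> import torch_npu
>>> features = torch.rand(2, 5).npu()   # logits, shape (N, C)
>>> labels = torch.tensor([[0., 0., 1., 0., 0.], [1., 0., 0., 0., 0.]]).npu()   # one-hot targets
>>> loss = torch_npu.npu_softmax_cross_entropy_with_logits(features, labels)
>>> loss.shape   # expected torch.Size([2])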
torch_npu.npu_ps_roi_pooling(x, rois, spatial_scale, group_size, output_dim) -> Tensor
>>> roi = torch.tensor([[[1], [2], [3], [4], [5]], [[6], [7], [8], [9], [10]]], dtype=torch.float16).npu()
>>> x = torch.tensor([[[[ 1]], [[ 2]], [[ 3]], [[ 4]], [[ 5]], [[ 6]], [[ 7]], [[ 8]]], [[[ 9]], [[10]], [[11]], [[12]], [[13]], [[14]], [[15]], [[16]]]], dtype=torch.float16).npu()
>>> out = torch_npu.npu_ps_roi_pooling(x, roi, 0.5, 2, 2)
>>> out
tensor([[[[0., 0.], [0., 0.]], [[0., 0.], [0., 0.]]], [[[0., 0.], [0., 0.]], [[0., 0.], [0., 0.]]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_roi_align(features, rois, spatial_scale, pooled_height, pooled_width, sample_num, roi_end_mode) -> Tensor
Obtain the ROI feature matrix from the feature map. It is a customized Faster R-CNN operator.
>>> x = torch.FloatTensor([[[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]]]]).npu()
>>> rois = torch.tensor([[0, -2.0, -2.0, 22.0, 22.0]]).npu()
>>> out = torch_npu.npu_roi_align(x, rois, 0.25, 3, 3, 2, 0)
>>> out
tensor([[[[ 4.5000, 6.5000, 8.5000], [16.5000, 18.5000, 20.5000], [28.5000, 30.5000, 32.5000]]]], device='npu:0')
torch_npu.npu_nms_v4(boxes, scores, max_output_size, iou_threshold, scores_threshold, pad_to_max_output_size=False) -> (Tensor, Tensor)
>>> boxes = torch.randn(100, 4).npu()
>>> scores = torch.randn(100).npu()
>>> boxes.uniform_(0, 100)
>>> scores.uniform_(0, 1)
>>> max_output_size = 20
>>> iou_threshold = torch.tensor(0.5).npu()
>>> scores_threshold = torch.tensor(0.3).npu()
>>> npu_output = torch_npu.npu_nms_v4(boxes, scores, max_output_size, iou_threshold, scores_threshold)
>>> npu_output
(tensor([57, 65, 25, 45, 43, 12, 52, 91, 23, 78, 53, 11, 24, 62, 22, 67, 9, 94, 54, 92], device='npu:0', dtype=torch.int32), tensor(20, device='npu:0', dtype=torch.int32))
torch_npu.npu_nms_rotated(dets, scores, iou_threshold, scores_threshold=0, max_output_size=-1, mode=0) -> (Tensor, Tensor)
>>> dets = torch.randn(100, 5).npu()
>>> scores = torch.randn(100).npu()
>>> dets.uniform_(0, 100)
>>> scores.uniform_(0, 1)
>>> output1, output2 = torch_npu.npu_nms_rotated(dets, scores, 0.2, 0, -1, 1)
>>> output1
tensor([76, 48, 15, 65, 91, 82, 21, 96, 62, 90, 13, 59, 0, 18, 47, 23, 8, 56, 55, 63, 72, 39, 97, 81, 16, 38, 17, 25, 74, 33, 79, 44, 36, 88, 83, 37, 64, 45, 54, 41, 22, 28, 98, 40, 30, 20, 1, 86, 69, 57, 43, 9, 42, 27, 71, 46, 19, 26, 78, 66, 3, 52], device='npu:0', dtype=torch.int32)
>>> output2
tensor([62], device='npu:0', dtype=torch.int32)
torch_npu.npu_lstm(x, weight, bias, seqMask, h, c, has_biases, num_layers, dropout, train, bidirectional, batch_first, flag_seq, direction)
DynamicRNN calculation.
torch_npu.npu_iou(bboxes, gtboxes, mode=0) -> Tensor
torch_npu.npu_ptiou(bboxes, gtboxes, mode=0) -> Tensor
>>> bboxes = torch.tensor([[0, 0, 10, 10], [10, 10, 20, 20], [32, 32, 38, 42]], dtype=torch.float16).to("npu")
>>> gtboxes = torch.tensor([[0, 0, 10, 20], [0, 10, 10, 10], [10, 10, 20, 20]], dtype=torch.float16).to("npu")
>>> output_iou = torch_npu.npu_iou(bboxes, gtboxes, 0)
>>> output_iou
tensor([[0.4985, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000], [0.0000, 0.9961, 0.0000]], device='npu:0', dtype=torch.float16)
torch_npu.npu_pad(input, paddings) -> Tensor
>>> input = torch.tensor([[20, 20, 10, 10]], dtype=torch.float16).to("npu")
>>> paddings = [1, 1, 1, 1]
>>> output = torch_npu.npu_pad(input, paddings)
>>> output
tensor([[ 0., 0., 0., 0., 0., 0.], [ 0., 20., 20., 10., 10., 0.], [ 0., 0., 0., 0., 0., 0.]], device='npu:0', dtype=torch.float16)
torch_npu.npu_nms_with_mask(input, iou_threshold) -> (Tensor, Tensor, Tensor)
>>> input = torch.tensor([[0.0, 1.0, 2.0, 3.0, 0.6], [6.0, 7.0, 8.0, 9.0, 0.4]], dtype=torch.float16).to("npu")
>>> iou_threshold = 0.5
>>> output1, output2, output3 = torch_npu.npu_nms_with_mask(input, iou_threshold)
>>> output1
tensor([[0.0000, 1.0000, 2.0000, 3.0000, 0.6001], [6.0000, 7.0000, 8.0000, 9.0000, 0.3999]], device='npu:0', dtype=torch.float16)
>>> output2
tensor([0, 1], device='npu:0', dtype=torch.int32)
>>> output3
tensor([1, 1], device='npu:0', dtype=torch.uint8)
torch_npu.npu_bounding_box_encode(anchor_box, ground_truth_box, means0, means1, means2, means3, stds0, stds1, stds2, stds3) -> Tensor
>>> anchor_box = torch.tensor([[1., 2., 3., 4.], [3., 4., 5., 6.]], dtype=torch.float32).to("npu")
>>> ground_truth_box = torch.tensor([[5., 6., 7., 8.], [7., 8., 9., 6.]], dtype=torch.float32).to("npu")
>>> output = torch_npu.npu_bounding_box_encode(anchor_box, ground_truth_box, 0, 0, 0, 0, 0.1, 0.1, 0.2, 0.2)
>>> output
tensor([[13.3281, 13.3281, 0.0000, 0.0000], [13.3281, 6.6641, 0.0000, -5.4922]], device='npu:0')
torch_npu.npu_bounding_box_decode(rois, deltas, means0, means1, means2, means3, stds0, stds1, stds2, stds3, max_shape, wh_ratio_clip) -> Tensor
>>> rois = torch.tensor([[1., 2., 3., 4.], [3., 4., 5., 6.]], dtype=torch.float32).to("npu")
>>> deltas = torch.tensor([[5., 6., 7., 8.], [7., 8., 9., 6.]], dtype=torch.float32).to("npu")
>>> output = torch_npu.npu_bounding_box_decode(rois, deltas, 0, 0, 0, 0, 1, 1, 1, 1, (10, 10), 0.1)
>>> output
tensor([[2.5000, 6.5000, 9.0000, 9.0000], [9.0000, 9.0000, 9.0000, 9.0000]], device='npu:0')
torch_npu.npu_gru(input, hx, weight_input, weight_hidden, bias_input, bias_hidden, seq_length, has_biases, num_layers, dropout, train, bidirectional, batch_first) -> (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor)
DynamicGRUV2 calculation.
torch_npu.npu_random_choice_with_mask(x, count=256, seed=0, seed2=0) -> (Tensor, Tensor)
>>> x = torch.tensor([1, 0, 1, 0], dtype=torch.bool).to("npu")
>>> result, mask = torch_npu.npu_random_choice_with_mask(x, 2, 1, 0)
>>> result
tensor([[0], [2]], device='npu:0', dtype=torch.int32)
>>> mask
tensor([True, True], device='npu:0')
torch_npu.npu_batch_nms(self, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size, change_coordinate_frame=False, transpose_box=False) -> (Tensor, Tensor, Tensor, Tensor)
>>> boxes = torch.randn(8, 2, 4, 4, dtype=torch.float32).to("npu")
>>> scores = torch.randn(3, 2, 4, dtype=torch.float32).to("npu")
>>> nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = torch_npu.npu_batch_nms(boxes, scores, 0.3, 0.5, 3, 4)
>>> nmsed_boxes
>>> nmsed_scores
>>> nmsed_classes
>>> nmsed_num
torch_npu.npu_slice(self, offsets, size) -> Tensor
Extract a slice from a tensor.
>>> input = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=torch.float16).to("npu")
>>> offsets = [0, 0]
>>> size = [2, 2]
>>> output = torch_npu.npu_slice(input, offsets, size)
>>> output
tensor([[1., 2.], [6., 7.]], device='npu:0', dtype=torch.float16)
torch_npu.npu_dropoutV2(self, seed, p) -> (Tensor, Tensor, Tensor(a!))
Compute the dropout result using the given seed.
>>> input = torch.tensor([1., 2., 3., 4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> seed = torch.rand((32,), dtype=torch.float32).npu()
>>> seed
tensor([0.4368, 0.7351, 0.8459, 0.4657, 0.6783, 0.8914, 0.8995, 0.4401, 0.4408, 0.4453, 0.2404, 0.9680, 0.0999, 0.8665, 0.2993, 0.5787, 0.0251, 0.6783, 0.7411, 0.0670, 0.9430, 0.9165, 0.3983, 0.5849, 0.7722, 0.4659, 0.0486, 0.2693, 0.6451, 0.2734, 0.3176, 0.0176], device='npu:0')
>>> prob = 0.3
>>> output, mask, out_seed = torch_npu.npu_dropoutV2(input, seed, prob)
>>> output
tensor([0.4408, 0.4453, 0.2404, 0.9680], device='npu:0')
>>> mask
tensor([0., 0., 0., 0.], device='npu:0')
>>> out_seed
tensor([0.4408, 0.4453, 0.2404, 0.9680, 0.0999, 0.8665, 0.2993, 0.5787, 0.0251, 0.6783, 0.7411, 0.0670, 0.9430, 0.9165, 0.3983, 0.5849, 0.7722, 0.4659, 0.0486, 0.2693, 0.6451, 0.2734, 0.3176, 0.0176, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], device='npu:0')
torch_npu._npu_dropout(self, p) -> (Tensor, Tensor)
Compute the dropout result without a seed.
>>> input = torch.tensor([1., 2., 3., 4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> prob = 0.3
>>> output, mask = torch_npu._npu_dropout(input, prob)
>>> output
tensor([0.0000, 2.8571, 0.0000, 0.0000], device='npu:0')
>>> mask
tensor([ 98, 255, 188, 186, 120, 157, 175, 159, 77, 223, 127, 79, 247, 151, 253, 255], device='npu:0', dtype=torch.uint8)
torch_npu._npu_dropout_inplace(result, p) -> (Tensor(a!), Tensor)
>>> input = torch.tensor([1., 2., 3., 4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> prob = 0.3
>>> output, mask = torch_npu._npu_dropout_inplace(input, prob)
>>> output
tensor([0.0000, 2.8571, 0.0000, 0.0000], device='npu:0')
>>> input
tensor([0.0000, 2.8571, 4.2857, 5.7143], device='npu:0')
>>> mask
tensor([ 98, 255, 188, 186, 120, 157, 175, 159, 77, 223, 127, 79, 247, 151, 253, 255], device='npu:0', dtype=torch.uint8)
torch_npu.npu_indexing(self, begin, end, strides, begin_mask=0, end_mask=0, ellipsis_mask=0, new_axis_mask=0, shrink_axis_mask=0) -> Tensor
>>> input = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=torch.int32).to("npu")
>>> input
tensor([[1, 2, 3, 4], [5, 6, 7, 8]], device='npu:0', dtype=torch.int32)
>>> output = torch_npu.npu_indexing(input, [0, 0], [2, 2], [1, 1])
>>> output
tensor([[1, 2], [5, 6]], device='npu:0', dtype=torch.int32)
torch_npu.npu_ifmr(Tensor data, Tensor data_min, Tensor data_max, Tensor cumsum, float min_percentile, float max_percentile, float search_start, float search_end, float search_step, bool with_offset) -> (Tensor, Tensor)
Compute the IFMR (Input Feature Map Reconstruction) result, i.e. the quantization scale and offset.
>>> input = torch.rand((2, 2, 3, 4), dtype=torch.float32).npu()
>>> input
tensor([[[[0.4508, 0.6513, 0.4734, 0.1924], [0.0402, 0.5502, 0.0694, 0.9032], [0.4844, 0.5361, 0.9369, 0.7874]], [[0.5157, 0.1863, 0.4574, 0.8033], [0.5986, 0.8090, 0.7605, 0.8252], [0.4264, 0.8952, 0.2279, 0.9746]]], [[[0.0803, 0.7114, 0.8773, 0.2341], [0.6497, 0.0423, 0.8407, 0.9515], [0.1821, 0.5931, 0.7160, 0.4968]], [[0.7977, 0.0899, 0.9572, 0.0146], [0.2804, 0.8569, 0.2292, 0.1118], [0.5747, 0.4064, 0.8370, 0.1611]]]], device='npu:0')
>>> min_value = torch.min(input)
>>> min_value
tensor(0.0146, device='npu:0')
>>> max_value = torch.max(input)
>>> max_value
tensor(0.9746, device='npu:0')
>>> hist = torch.histc(input.to('cpu'), bins=128, min=min_value.to('cpu'), max=max_value.to('cpu'))
>>> hist
tensor([1., 0., 0., 2., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 2., 1., 0., 0., 0., 0., 2., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 2., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 2., 0., 0., 1., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 1.])
>>> cdf = torch.cumsum(hist, dim=0).int().npu()
>>> cdf
tensor([ 1, 1, 1, 3, 3, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 10, 11, 11, 11, 11, 11, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 17, 17, 17, 17, 18, 19, 19, 20, 21, 21, 22, 22, 23, 23, 23, 24, 24, 25, 25, 25, 26, 26, 26, 28, 28, 28, 28, 28, 28, 28, 30, 30, 30, 30, 30, 30, 30, 30, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 35, 37, 37, 37, 38, 39, 40, 40, 41, 41, 41, 42, 42, 43, 44, 44, 44, 44, 45, 45, 46, 47, 47, 48], device='npu:0', dtype=torch.int32)
>>> scale, offset = torch_npu.npu_ifmr(input, min_value, max_value, cdf, min_percentile=0.999999, max_percentile=0.999999, search_start=0.7, search_end=1.3, search_step=0.01, with_offset=False)
>>> scale
tensor(0.0080, device='npu:0')
>>> offset
tensor(0., device='npu:0')
torch_npu.npu_max(self, dim, keepdim=False) -> (Tensor, Tensor)
Compute the maximum values and their indices along the given dimension.
>>> input = torch.randn(2, 2, 2, 2, dtype=torch.float32).npu()
>>> input
tensor([[[[-1.8135, 0.2078], [-0.6678, 0.7846]], [[ 0.6458, -0.0923], [-0.2124, -1.9112]]], [[[-0.5800, -0.4979], [ 0.2580, 1.1335]], [[ 0.6669, 0.1876], [ 0.1160, -0.1061]]]], device='npu:0')
>>> outputs, indices = torch_npu.npu_max(input, 2)
>>> outputs
tensor([[[-0.6678, 0.7846], [ 0.6458, -0.0923]], [[ 0.2580, 1.1335], [ 0.6669, 0.1876]]], device='npu:0')
>>> indices
tensor([[[1, 1], [0, 0]], [[1, 1], [0, 0]]], device='npu:0', dtype=torch.int32)
torch_npu.npu_min(self, dim, keepdim=False) -> (Tensor values, Tensor indices)
>>> input = torch.randn(2, 2, 2, 2, dtype=torch.float32).npu()
>>> input
tensor([[[[-0.9909, -0.2369], [-0.9569, -0.6223]], [[ 0.1157, -0.3147], [-0.7761, 0.1344]]], [[[ 1.6292, 0.5953], [ 0.6940, -0.6367]], [[-1.2335, 0.2131], [ 1.0748, -0.7046]]]], device='npu:0')
>>> outputs, indices = torch_npu.npu_min(input, 2)
>>> outputs
tensor([[[-0.9909, -0.6223], [-0.7761, -0.3147]], [[ 0.6940, -0.6367], [-1.2335, -0.7046]]], device='npu:0')
>>> indices
tensor([[[0, 1], [1, 0]], [[1, 1], [0, 1]]], device='npu:0', dtype=torch.int32)
torch_npu.npu_scatter(self, indices, updates, dim) -> Tensor
Scatter the updates into self along the given dimension and return the result.
>>> input = torch.tensor([[1.6279, 0.1226], [0.9041, 1.0980]]).npu()
>>> input
tensor([[1.6279, 0.1226], [0.9041, 1.0980]], device='npu:0')
>>> indices = torch.tensor([0, 1], dtype=torch.int32).npu()
>>> indices
tensor([0, 1], device='npu:0', dtype=torch.int32)
>>> updates = torch.tensor([-1.1993, -1.5247]).npu()
>>> updates
tensor([-1.1993, -1.5247], device='npu:0')
>>> dim = 0
>>> output = torch_npu.npu_scatter(input, indices, updates, dim)
>>> output
tensor([[-1.1993, 0.1226], [ 0.9041, -1.5247]], device='npu:0')
torch_npu.npu_layer_norm_eval(input, normalized_shape, weight=None, bias=None, eps=1e-05) -> Tensor
>>> input = torch.rand((6, 4), dtype=torch.float32).npu()
>>> input
tensor([[0.1863, 0.3755, 0.1115, 0.7308], [0.6004, 0.6832, 0.8951, 0.2087], [0.8548, 0.0176, 0.8498, 0.3703], [0.5609, 0.0114, 0.5021, 0.1242], [0.3966, 0.3022, 0.2323, 0.3914], [0.1554, 0.0149, 0.1718, 0.4972]], device='npu:0')
>>> normalized_shape = input.size()[1:]
>>> normalized_shape
torch.Size([4])
>>> weight = torch.Tensor(*normalized_shape).npu()
>>> weight
tensor([ nan, 6.1223e-41, -8.3159e-20, 9.1834e-41], device='npu:0')
>>> bias = torch.Tensor(*normalized_shape).npu()
>>> bias
tensor([5.6033e-39, 6.1224e-41, 6.1757e-39, 6.1224e-41], device='npu:0')
>>> output = torch_npu.npu_layer_norm_eval(input, normalized_shape, weight, bias, 1e-5)
>>> output
tensor([[ nan, 6.7474e-41, 8.3182e-20, 2.0687e-40], [ nan, 8.2494e-41, -9.9784e-20, -8.2186e-41], [ nan, -2.6695e-41, -7.7173e-20, 2.1353e-41], [ nan, -1.3497e-41, -7.1281e-20, -6.9827e-42], [ nan, 3.5663e-41, 1.2002e-19, 1.4314e-40], [ nan, -6.2792e-42, 1.7902e-20, 2.1050e-40]], device='npu:0')
torch_npu.npu_alloc_float_status(self) -> Tensor
Produce a tensor of eight zeros used as the float status buffer.
>>> input = torch.randn([1, 2, 3]).npu()
>>> output = torch_npu.npu_alloc_float_status(input)
>>> input
tensor([[[ 2.2324, 0.2478, -0.1056], [ 1.1273, -0.2573, 1.0558]]], device='npu:0')
>>> output
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_get_float_status(self) -> Tensor
Retrieve the floating-point status into a tensor of eight values.
>>> x = torch.rand(2).npu()
>>> torch_npu.npu_get_float_status(x)
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_clear_float_status(self) -> Tensor
Set the value of address 0x40000 to 0 in each core.
>>> x = torch.rand(2).npu()
>>> torch_npu.npu_clear_float_status(x)
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_confusion_transpose(self, perm, shape, transpose_first) -> Tensor
Perform transpose and reshape in a single fused operation.
>>> x = torch.rand(2, 3, 4, 6).npu()
>>> x.shape
torch.Size([2, 3, 4, 6])
>>> y = torch_npu.npu_confusion_transpose(x, (0, 2, 1, 3), (2, 4, 18), True)
>>> y.shape
torch.Size([2, 4, 18])
>>> y2 = torch_npu.npu_confusion_transpose(x, (0, 2, 1), (2, 12, 6), False)
>>> y2.shape
torch.Size([2, 6, 12])
torch_npu.npu_bmmV2(self, mat2, output_sizes) -> Tensor
Multiply matrix "a" by matrix "b", producing "a * b" .
>>> mat1 = torch.randn(10, 3, 4).npu()
>>> mat2 = torch.randn(10, 4, 5).npu()
>>> res = torch_npu.npu_bmmV2(mat1, mat2, [])
>>> res.shape
torch.Size([10, 3, 5])
torch_npu.fast_gelu(self) -> Tensor
Compute the fast_gelu activation of "x".
>>> x = torch.rand(2).npu()
>>> x
tensor([0.5991, 0.4094], device='npu:0')
>>> torch_npu.fast_gelu(x)
tensor([0.4403, 0.2733], device='npu:0')
torch_npu.npu_deformable_conv2d(self, weight, offset, bias, kernel_size, stride, padding, dilation=[1,1,1,1], groups=1, deformable_groups=1, modulated=True) -> (Tensor, Tensor)
>>> x = torch.rand(16, 32, 32, 32).npu()
>>> weight = torch.rand(32, 32, 5, 5).npu()
>>> offset = torch.rand(16, 75, 32, 32).npu()
>>> output, _ = torch_npu.npu_deformable_conv2d(x, weight, offset, None, kernel_size=[5, 5], stride=[1, 1, 1, 1], padding=[2, 2, 2, 2])
>>> output.shape
torch.Size([16, 32, 32, 32])
torch_npu.npu_mish(self) -> Tensor
Compute the Mish activation of "x" element-wise.
>>> x = torch.rand(10, 30, 10).npu()
>>> y = torch_npu.npu_mish(x)
>>> y.shape
torch.Size([10, 30, 10])
torch_npu.npu_anchor_response_flags(self, featmap_size, stride, num_base_anchors) -> Tensor
Generate the responsible flags of anchors in a single feature map.
>>> x = torch.rand(100, 4).npu()
>>> y = torch_npu.npu_anchor_response_flags(x, [60, 60], [2, 2], 9)
>>> y.shape
torch.Size([32400])
torch_npu.npu_yolo_boxes_encode(self, gt_bboxes, stride, performance_mode=False) -> Tensor
Generate bounding boxes based on YOLO's "anchor" and "ground-truth" boxes. It is a customized mmdetection operator.
>>> anchor_boxes = torch.rand(2, 4).npu()
>>> gt_bboxes = torch.rand(2, 4).npu()
>>> stride = torch.tensor([2, 2], dtype=torch.int32).npu()
>>> output = torch_npu.npu_yolo_boxes_encode(anchor_boxes, gt_bboxes, stride, False)
>>> output.shape
torch.Size([2, 4])
torch_npu.npu_grid_assign_positive(self, overlaps, box_responsible_flags, max_overlaps, argmax_overlaps, gt_max_overlaps, gt_argmax_overlaps, num_gts, pos_iou_thr, min_pos_iou, gt_max_assign_all) -> Tensor
Assign positive labels to grid anchors based on the overlaps and responsible flags. It is a customized mmdetection operator.
>>> assigned_gt_inds = torch.rand(4).npu()
>>> overlaps = torch.rand(2, 4).npu()
>>> box_responsible_flags = torch.tensor([1, 1, 1, 0], dtype=torch.uint8).npu()
>>> max_overlap = torch.rand(4).npu()
>>> argmax_overlap = torch.tensor([1, 0, 1, 0], dtype=torch.int32).npu()
>>> gt_max_overlaps = torch.rand(2).npu()
>>> gt_argmax_overlaps = torch.tensor([1, 0], dtype=torch.int32).npu()
>>> output = torch_npu.npu_grid_assign_positive(assigned_gt_inds, overlaps, box_responsible_flags, max_overlap, argmax_overlap, gt_max_overlaps, gt_argmax_overlaps, 128, 0.5, 0., True)
>>> output.shape
torch.Size([4])
torch_npu.npu_normalize_batch(self, seq_len, normalize_type=0) -> Tensor
Perform batch normalization.
>>> import numpy as np
>>> a = np.random.uniform(1, 10, (2, 3, 6)).astype(np.float32)
>>> b = np.random.uniform(3, 6, (2)).astype(np.int32)
>>> x = torch.from_numpy(a).to("npu")
>>> seqlen = torch.from_numpy(b).to("npu")
>>> out = torch_npu.npu_normalize_batch(x, seqlen, 0)
>>> out
tensor([[[ 1.1496, -0.6685, -0.4812, 1.7611, -0.5187, 0.7571], [ 1.1445, -0.4393, -0.7051, 1.0474, -0.2646, -0.1582], [ 0.1477, 0.9179, -1.0656, -6.8692, -6.7437, 2.8621]], [[-0.6880, 0.1337, 1.3623, -0.8081, -1.2291, -0.9410], [ 0.3070, 0.5489, -1.4858, 0.6300, 0.6428, 0.0433], [-0.5387, 0.8204, -1.1401, 0.8584, -0.3686, 0.8444]]], device='npu:0')
torch_npu.npu_masked_fill_range(self, start, end, value, axis=-1) -> Tensor
Fill the tensor with a value along one axis over the given ranges. It is a customized masked-fill-range operator.
>>> a = torch.rand(4, 4).npu()
>>> a
tensor([[0.9419, 0.4919, 0.2874, 0.6560], [0.6691, 0.6668, 0.0330, 0.1006], [0.3888, 0.7011, 0.7141, 0.7878], [0.0366, 0.9738, 0.4689, 0.0979]], device='npu:0')
>>> start = torch.tensor([[0, 1, 2]], dtype=torch.int32).npu()
>>> end = torch.tensor([[1, 2, 3]], dtype=torch.int32).npu()
>>> value = torch.tensor([1], dtype=torch.float).npu()
>>> out = torch_npu.npu_masked_fill_range(a, start, end, value, 1)
>>> out
tensor([[1.0000, 0.4919, 0.2874, 0.6560], [0.6691, 1.0000, 0.0330, 0.1006], [0.3888, 0.7011, 1.0000, 0.7878], [0.0366, 0.9738, 0.4689, 0.0979]], device='npu:0')
torch_npu.npu_linear(input, weight, bias=None) -> Tensor
Multiply matrix "a" by matrix "b", producing "a * b" .
>>> x = torch.rand(2, 16).npu()
>>> w = torch.rand(4, 16).npu()
>>> b = torch.rand(4).npu()
>>> output = torch_npu.npu_linear(x, w, b)
>>> output
tensor([[3.6335, 4.3713, 2.4440, 2.0081], [5.3273, 6.3089, 3.9601, 3.2410]], device='npu:0')
torch_npu.npu_bert_apply_adam(lr, beta1, beta2, epsilon, grad, max_grad_norm, global_grad_norm, weight_decay, step_size=None, adam_mode=0, *, out=(var,m,v))
Compute the Adam optimizer update (BERT variant).
>>> var_in = torch.rand(321538).uniform_(-32., 21.).npu()
>>> m_in = torch.zeros(321538).npu()
>>> v_in = torch.zeros(321538).npu()
>>> grad = torch.rand(321538).uniform_(-0.05, 0.03).npu()
>>> max_grad_norm = -1.
>>> beta1 = 0.9
>>> beta2 = 0.99
>>> weight_decay = 0.
>>> lr = 0.
>>> epsilon = 1e-06
>>> global_grad_norm = 0.
>>> var_out, m_out, v_out = torch_npu.npu_bert_apply_adam(lr, beta1, beta2, epsilon, grad, max_grad_norm, global_grad_norm, weight_decay, out=(var_in, m_in, v_in))
>>> var_out
tensor([ 14.7733, -30.1218, -1.3647, ..., -16.6840, 7.1518, 8.4872], device='npu:0')
torch_npu.npu_giou(self, gtboxes, trans=False, is_cross=False, mode=0) -> Tensor
First compute the minimum enclosing (closure) area of the two boxes and their IoU, then the proportion of the enclosing area that belongs to neither box, and finally subtract this proportion from the IoU to obtain the GIoU.
>>> import numpy as np
>>> a = np.random.uniform(0, 1, (4, 10)).astype(np.float16)
>>> b = np.random.uniform(0, 1, (4, 10)).astype(np.float16)
>>> box1 = torch.from_numpy(a).to("npu")
>>> box2 = torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_giou(box1, box2, trans=True, is_cross=False, mode=0)
>>> output
tensor([[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]], device='npu:0', dtype=torch.float16)
torch_npu.npu_silu(self) -> Tensor
Compute the SiLU (Swish) of "x".
>>> a = torch.rand(2, 8).npu()
>>> output = torch_npu.npu_silu(a)
>>> output
tensor([[0.4397, 0.7178, 0.5190, 0.2654, 0.2230, 0.2674, 0.6051, 0.3522], [0.4679, 0.1764, 0.6650, 0.3175, 0.0530, 0.4787, 0.5621, 0.4026]], device='npu:0')
torch_npu.npu_reshape(self, shape, bool can_refresh=False) -> Tensor
Reshape a tensor. Only the tensor shape is changed and its data is not changed.
This operator cannot be directly called by the aclopExecute API.
>>> a = torch.rand(2, 8).npu()
>>> out = torch_npu.npu_reshape(a, (4, 4))
>>> out
tensor([[0.6657, 0.9857, 0.7614, 0.4368], [0.3761, 0.4397, 0.8609, 0.5544], [0.7002, 0.3063, 0.9279, 0.5085], [0.1009, 0.7133, 0.8118, 0.6193]], device='npu:0')
torch_npu.npu_rotated_overlaps(self, query_boxes, trans=False) -> Tensor
Calculate the overlapping area of the rotated box.
>>> import numpy as np
>>> a = np.random.uniform(0, 1, (1, 3, 5)).astype(np.float16)
>>> b = np.random.uniform(0, 1, (1, 2, 5)).astype(np.float16)
>>> box1 = torch.from_numpy(a).to("npu")
>>> box2 = torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_rotated_overlaps(box1, box2, trans=False)
>>> output
tensor([[[0.0000, 0.1562, 0.0000], [0.1562, 0.3713, 0.0611], [0.0000, 0.0611, 0.0000]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_iou(self, query_boxes, trans=False, mode=0, is_cross=True) -> Tensor
>>> import numpy as np
>>> a = np.random.uniform(0, 1, (2, 2, 5)).astype(np.float16)
>>> b = np.random.uniform(0, 1, (2, 3, 5)).astype(np.float16)
>>> box1 = torch.from_numpy(a).to("npu")
>>> box2 = torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_rotated_iou(box1, box2, trans=False, mode=0, is_cross=True)
>>> output
tensor([[[3.3325e-01, 1.0162e-01], [1.0162e-01, 1.0000e+00]], [[0.0000e+00, 0.0000e+00], [0.0000e+00, 5.9605e-08]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_box_encode(anchor_box, gt_bboxes, weight) -> Tensor
>>> anchor_boxes = torch.tensor([[[30.69], [32.6], [45.94], [59.88], [-44.53]]], dtype=torch.float16).to("npu")
>>> gt_bboxes = torch.tensor([[[30.44], [18.72], [33.22], [45.56], [8.5]]], dtype=torch.float16).to("npu")
>>> weight = torch.tensor([1., 1., 1., 1., 1.], dtype=torch.float16).npu()
>>> out = torch_npu.npu_rotated_box_encode(anchor_boxes, gt_bboxes, weight)
>>> out
tensor([[[-0.4253], [-0.5166], [-1.7021], [-0.0162], [ 1.1328]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_box_decode(anchor_boxes, deltas, weight) -> Tensor
Rotated bounding box decoding.
>>> anchor_boxes = torch.tensor([[[4.137], [33.72], [29.4], [54.06], [41.28]]], dtype=torch.float16).to("npu")
>>> deltas = torch.tensor([[[0.0244], [-1.992], [0.2109], [0.315], [-37.25]]], dtype=torch.float16).to("npu")
>>> weight = torch.tensor([1., 1., 1., 1., 1.], dtype=torch.float16).npu()
>>> out = torch_npu.npu_rotated_box_decode(anchor_boxes, deltas, weight)
>>> out
tensor([[[ 1.7861], [-10.5781], [ 33.0000], [ 17.2969], [-88.4375]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_ciou(Tensor self, Tensor gtboxes, bool trans=False, bool is_cross=True, int mode=0, bool atan_sub_flag=False) -> Tensor
Apply an NPU-based CIoU operation.
CIoU adds a penalty term on top of DIoU.
In the current version, the CIoU backward pass only supports trans==True, is_cross==False, mode==0 ('iou'). If you need backpropagation, make sure your parameters match these constraints.
>>> box1 = torch.randn(4, 32).npu()
>>> box1.requires_grad = True
>>> box2 = torch.randn(4, 32).npu()
>>> box2.requires_grad = True
>>> ciou = torch_npu.contrib.function.npu_ciou(box1, box2)
>>> l = ciou.sum()
>>> l.backward()
torch_npu.npu_diou(Tensor self, Tensor gtboxes, bool trans=False, bool is_cross=False, int mode=0) -> Tensor
Apply an NPU-based DIoU operation.
DIoU takes into account the distance between box centers as well as the overlap and scale, so regression for different targets or boundaries tends to be more stable.
In the current version, the DIoU backward pass only supports trans==True, is_cross==False, mode==0 ('iou'). If you need backpropagation, make sure your parameters match these constraints.
>>> box1 = torch.randn(4, 32).npu()
>>> box1.requires_grad = True
>>> box2 = torch.randn(4, 32).npu()
>>> box2.requires_grad = True
>>> diou = torch_npu.contrib.function.npu_diou(box1, box2)
>>> l = diou.sum()
>>> l.backward()
torch_npu.npu_sign_bits_pack(Tensor self, int size) -> Tensor
One-bit Adam packing of float values into uint8.
If the size of x is divisible by 8, the output size is (size of x) / 8; otherwise, the output size is (size of x // 8) + 1, and -1 float values are appended at the little-endian positions to pad to divisibility.
Input types float32 and float16 are supported in general, but the Ascend 310 AI Processor only supports float16 input.
>>> a = torch.tensor([5, 4, 3, 2, 0, -1, -2, 4, 3, 2, 1, 0, -1, -2], dtype=torch.float32).npu()
>>> b = torch_npu.npu_sign_bits_pack(a, 2)
>>> b
tensor([[159], [15]], device='npu:0')
(the binary form of 159 is 0b10011111, which corresponds to 4, -2, -1, 0, 2, 3, 4, 5 respectively)
torch_npu.npu_sign_bits_unpack(x, dtype, size) -> Tensor
One-bit Adam unpacking of uint8 values into float.
The output size is (size of x) * 8.
>>> a = torch.tensor([159, 15], dtype=torch.uint8).npu()
>>> b = torch_npu.npu_sign_bits_unpack(a, 0, 2)
>>> b
tensor([[1., 1., 1., 1., 1., -1., -1., 1.], [1., 1., 1., 1., -1., -1., -1., -1.]], device='npu:0')
(the binary form of 159 is 0b10011111 and of 15 is 0b00001111)