NPU Custom Operators
No. | Operator name |
---|---|
1 | torch_npu._npu_dropout |
2 | torch_npu._npu_dropout_inplace |
3 | torch_npu.copy_memory_ |
4 | torch_npu.empty_with_format |
5 | torch_npu.fast_gelu |
6 | torch_npu.npu_alloc_float_status |
7 | torch_npu.npu_anchor_response_flags |
8 | torch_npu.npu_apply_adam |
9 | torch_npu.npu_batch_nms |
10 | torch_npu.npu_bert_apply_adam |
11 | torch_npu.npu_bmmV2 |
12 | torch_npu.npu_bounding_box_decode |
13 | torch_npu.npu_bounding_box_encode |
14 | torch_npu.npu_broadcast |
15 | torch_npu.npu_ciou |
16 | torch_npu.npu_clear_float_status |
17 | torch_npu.npu_confusion_transpose |
18 | torch_npu.npu_conv_transpose2d |
19 | torch_npu.npu_conv2d |
20 | torch_npu.npu_conv3d |
21 | torch_npu.npu_convolution |
22 | torch_npu.npu_convolution_transpose |
23 | torch_npu.npu_deformable_conv2d |
24 | torch_npu.npu_diou |
25 | torch_npu.npu_dropoutV2 |
26 | torch_npu.npu_dtype_cast |
27 | torch_npu.npu_format_cast |
28 | torch_npu.npu_format_cast_ |
29 | torch_npu.npu_get_float_status |
30 | torch_npu.npu_giou |
31 | torch_npu.npu_grid_assign_positive |
32 | torch_npu.npu_gru |
33 | torch_npu.npu_ifmr |
34 | torch_npu.npu_indexing |
35 | torch_npu.npu_iou |
36 | torch_npu.npu_layer_norm_eval |
37 | torch_npu.npu_linear |
38 | torch_npu.npu_lstm |
39 | torch_npu.npu_masked_fill_range |
40 | torch_npu.npu_max |
41 | torch_npu.npu_mish |
42 | torch_npu.npu_nms_v4 |
43 | torch_npu.npu_nms_with_mask |
44 | torch_npu.npu_normalize_batch |
45 | torch_npu.npu_one_hot |
46 | torch_npu.npu_pad |
47 | torch_npu.npu_ps_roi_pooling |
48 | torch_npu.npu_ptiou |
49 | torch_npu.npu_random_choice_with_mask |
50 | torch_npu.npu_roi_align |
51 | torch_npu.npu_scatter |
52 | torch_npu.npu_sign_bits_pack |
53 | torch_npu.npu_sign_bits_unpack |
54 | torch_npu.npu_slice |
55 | torch_npu.npu_softmax_cross_entropy_with_logits |
56 | torch_npu.npu_sort_v2 |
57 | torch_npu.npu_stride_add |
58 | torch_npu.npu_transpose |
59 | torch_npu.npu_yolo_boxes_encode |
60 | torch_npu.one_ |
Mapping
Some parameters of the NPU custom operators take mapped integer values; refer to the table below.
Parameter | Mapped value | Description |
---|---|---|
ACL_FORMAT_UNDEFINED | -1 | Mapped value of the format parameter. |
ACL_FORMAT_NCHW | 0 | |
ACL_FORMAT_NHWC | 1 | |
ACL_FORMAT_ND | 2 | |
ACL_FORMAT_NC1HWC0 | 3 | |
ACL_FORMAT_FRACTAL_Z | 4 | |
ACL_FORMAT_NC1HWC0_C04 | 12 | |
ACL_FORMAT_HWCN | 16 | |
ACL_FORMAT_NDHWC | 27 | |
ACL_FORMAT_FRACTAL_NZ | 29 | |
ACL_FORMAT_NCDHW | 30 | |
ACL_FORMAT_NDC1HWC0 | 32 | |
ACL_FRACTAL_Z_3D | 33 | |
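The format codes in the table can be kept in one place instead of being passed around as bare integers. The following is a minimal sketch in plain Python; the `AclFormat` class and the `format_name` helper are hypothetical names introduced here for illustration and are not part of torch_npu.

```python
from enum import IntEnum


class AclFormat(IntEnum):
    """Integer codes copied from the mapping table (hypothetical helper class)."""
    ACL_FORMAT_UNDEFINED = -1
    ACL_FORMAT_NCHW = 0
    ACL_FORMAT_NHWC = 1
    ACL_FORMAT_ND = 2
    ACL_FORMAT_NC1HWC0 = 3
    ACL_FORMAT_FRACTAL_Z = 4
    ACL_FORMAT_NC1HWC0_C04 = 12
    ACL_FORMAT_HWCN = 16
    ACL_FORMAT_NDHWC = 27
    ACL_FORMAT_FRACTAL_NZ = 29
    ACL_FORMAT_NCDHW = 30
    ACL_FORMAT_NDC1HWC0 = 32
    ACL_FRACTAL_Z_3D = 33


def format_name(code: int) -> str:
    """Translate an integer format code back to its symbolic name."""
    return AclFormat(code).name
```

Because `IntEnum` members behave as integers, a member such as `AclFormat.ACL_FORMAT_FRACTAL_NZ` could be passed wherever an `acl_format` integer is expected, and `format_name` makes the value returned by a format query easier to read.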
Detailed Operator Interface Descriptions
torch_npu.npu_apply_adam(beta1_power, beta2_power, lr, beta1, beta2, epsilon, grad, use_locking, use_nesterov, out = (var, m, v))
Computes the Adam update result.
- Parameters:
- beta1_power (Scalar) - power of beta1
- beta2_power (Scalar) - power of beta2
- lr (Scalar) - learning rate
- beta1 (Scalar) - exponential decay rate for the 1st moment estimates
- beta2 (Scalar) - exponential decay rate for the 2nd moment estimates
- epsilon (Scalar) - term added to the denominator to improve numerical stability
- grad (Tensor) - the gradient
- use_locking (Bool) - If True, use locks for update operations
- use_nesterov (Bool) - If True, use the nesterov update
- var (Tensor) - variables to be optimized
- m (Tensor) - first-moment (mean) estimate of the gradients
- v (Tensor) - second-moment (uncentered variance) estimate of the gradients
- Constraints:
- Examples:
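The source gives no example for this operator. As a rough illustration of how the listed parameters interact, here is a pure-Python reference sketch of one Adam step. It assumes the operator follows the conventional ApplyAdam formulation, in which `beta1_power` and `beta2_power` are the running products of `beta1` and `beta2` used for bias correction; that assumption and the function name `apply_adam_reference` are not confirmed by this document.

```python
import math


def apply_adam_reference(var, m, v, grad, lr, beta1, beta2, epsilon,
                         beta1_power, beta2_power, use_nesterov=False):
    """One Adam step over flat lists of floats (assumed ApplyAdam semantics)."""
    # Bias-corrected step size derived from the running powers of beta1/beta2.
    lr_t = lr * math.sqrt(1.0 - beta2_power) / (1.0 - beta1_power)
    new_var, new_m, new_v = [], [], []
    for var_i, m_i, v_i, g_i in zip(var, m, v, grad):
        m_i = beta1 * m_i + (1.0 - beta1) * g_i        # first-moment update
        v_i = beta2 * v_i + (1.0 - beta2) * g_i * g_i  # second-moment update
        if use_nesterov:
            step = beta1 * m_i + (1.0 - beta1) * g_i
        else:
            step = m_i
        new_var.append(var_i - lr_t * step / (math.sqrt(v_i) + epsilon))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_var, new_m, new_v
```

The sketch returns new lists rather than updating in place, whereas the operator writes its results to `out = (var, m, v)`.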
torch_npu.npu_convolution_transpose(input, weight, bias, padding, output_padding, stride, dilation, groups) -> Tensor
Apply a 2D or 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
- Parameters:
- input (Tensor) - input tensor of shape(minibatch, in_channels, iH, iW) or (minibatch, in_channels, iT, iH, iW)
- weight (Tensor) - filters of shape(in_channels, out_channels/groups, kH, kW) or (in_channels, out_channels/groups, kT, kH, kW)
- bias (Tensor, optional) - bias of shape(out_channels)
- padding (ListInt) - controls the implicit zero-padding: dilation * (kernel_size - 1) - padding zeros are added to both sides of each dimension in the input
- output_padding (ListInt) - additional size added to one side of each dimension in the output shape
- stride (ListInt) - the stride of the convolving kernel
- dilation (ListInt) - the spacing between kernel elements
- groups (Int) - Split input into groups. In_channels should be divisible by the number of groups.
- Constraints:
- Examples:
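To see how `padding`, `output_padding`, `stride`, and `dilation` combine, the sketch below computes the spatial output size of a transposed convolution. It uses the output-size formula documented for PyTorch's transposed convolutions and assumes the NPU operator follows the same convention; `conv_transpose_output_size` is a hypothetical helper, not a torch_npu function.

```python
def conv_transpose_output_size(in_size, kernel_size, stride, padding,
                               output_padding, dilation):
    """Per-dimension output size of a transposed convolution.

    All arguments are equal-length sequences of ints, one entry per
    spatial dimension (2 entries for 2D, 3 entries for 3D).
    """
    return [
        (i - 1) * s - 2 * p + d * (k - 1) + op + 1
        for i, k, s, p, op, d in zip(in_size, kernel_size, stride,
                                     padding, output_padding, dilation)
    ]
```

For example, a 4x4 input with a 3x3 kernel, stride 2, padding 1, output_padding 1, and dilation 1 yields an 8x8 output under this formula.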
torch_npu.npu_conv_transpose2d(input, weight, bias, padding, output_padding, stride, dilation, groups) -> Tensor
Apply a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
- Parameters:
- input (Tensor) - input tensor of shape(minibatch, in_channels, iH, iW)
- weight (Tensor) - filters of shape(in_channels, out_channels/groups, kH, kW)
- bias (Tensor, optional) - bias of shape(out_channels)
- padding (ListInt) - controls the implicit zero-padding: dilation * (kernel_size - 1) - padding zeros are added to both sides of each dimension in the input
- output_padding (ListInt) - additional size added to one side of each dimension in the output shape
- stride (ListInt) - the stride of the convolving kernel
- dilation (ListInt) - the spacing between kernel elements
- groups (Int) - Split input into groups. In_channels should be divisible by the number of groups.
- Constraints:
- Examples:
torch_npu.npu_convolution(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 2D or 3D convolution over an input image composed of several input planes.
- Parameters:
- input (Tensor) - input tensor of shape(minibatch, in_channels, iH, iW) or (minibatch, in_channels, iT, iH, iW)
- weight (Tensor) - filters of shape(out_channels, in_channels/groups, kH, kW) or (out_channels, in_channels/groups, kT, kH, kW)
- bias (Tensor, optional) - bias of shape(out_channels)
- stride (ListInt) - the stride of the convolving kernel
- padding (ListInt) - implicit paddings on both sides of the input
- dilation (ListInt) - the spacing between kernel elements
- groups (Int) - Split input into groups. In_channels should be divisible by the number of groups.
- Constraints:
- Examples:
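The effect of `stride`, `padding`, and `dilation` on the result shape can be checked with a small helper. The sketch below applies the standard convolution output-size formula used by PyTorch, on the assumption that the NPU operator produces the same shapes; `conv_output_size` is a hypothetical name.

```python
def conv_output_size(in_size, kernel_size, stride, padding, dilation):
    """Per-dimension output size of a (non-transposed) convolution.

    All arguments are equal-length sequences of ints, one entry per
    spatial dimension.
    """
    return [
        (i + 2 * p - d * (k - 1) - 1) // s + 1
        for i, k, s, p, d in zip(in_size, kernel_size, stride, padding, dilation)
    ]
```

With a 3x3 kernel, stride 1, padding 1, and dilation 1, a 32x32 input keeps its size; with stride 2 and no padding, a 7x7 input shrinks to 3x3.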
torch_npu.npu_conv2d(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 2D convolution over an input image composed of several input planes.
- Parameters:
- input (Tensor) - input tensor of shape(minibatch, in_channels, iH, iW)
- weight (Tensor) - filters of shape(out_channels, in_channels/groups, kH, kW)
- bias (Tensor, optional) - bias of shape(out_channels)
- stride (ListInt) - the stride of the convolving kernel
- padding (ListInt) - implicit paddings on both sides of the input
- dilation (ListInt) - the spacing between kernel elements
- groups (Int) - Split input into groups. In_channels should be divisible by the number of groups.
- Constraints:
- Examples:
torch_npu.npu_conv3d(input, weight, bias, stride, padding, dilation, groups) -> Tensor
Apply a 3D convolution over an input image composed of several input planes.
- Parameters:
- input (Tensor) - input tensor of shape(minibatch, in_channels, iT, iH, iW)
- weight (Tensor) - filters of shape(out_channels, in_channels/groups, kT, kH, kW)
- bias (Tensor, optional) - bias of shape(out_channels)
- stride (ListInt) - the stride of the convolving kernel
- padding (ListInt) - implicit paddings on both sides of the input
- dilation (ListInt) - the spacing between kernel elements
- groups (Int) - Split input into groups. In_channels should be divisible by the number of groups.
- Constraints:
- Examples:
torch_npu.one_(self) -> Tensor
Fills self tensor with ones.
- Parameters:
- self (Tensor) - input tensor
- Constraints:
- Examples:
    >>> x = torch.rand(2, 3).npu()
    >>> x
    tensor([[0.6072, 0.9726, 0.3475],
            [0.3717, 0.6135, 0.6788]], device='npu:0')
    >>> x.one_()
    tensor([[1., 1., 1.],
            [1., 1., 1.]], device='npu:0')
torch_npu.npu_sort_v2(self, dim=-1, descending=False, out=None) -> Tensor
Sort the elements of the input tensor along a given dimension in ascending order by values without indices. If dim is not given, the last dimension of the input is chosen. If descending is True then the elements are sorted in descending order by value.
- Parameters:
- self (Tensor) - the input tensor
- dim (Int, optional) - the dimension to sort along
- descending (Bool, optional) - control the sorting order (ascending or descending)
- Constraints:
- Examples:
    >>> x = torch.randn(3, 4).npu()
    >>> x
    tensor([[-0.0067,  1.7790,  0.5031, -1.7217],
            [ 1.1685, -1.0486, -0.2938,  1.3241],
            [ 0.1880, -2.7447,  1.3976,  0.7380]], device='npu:0')
    >>> sorted_x = torch_npu.npu_sort_v2(x)
    >>> sorted_x
    tensor([[-1.7217, -0.0067,  0.5029,  1.7793],
            [-1.0488, -0.2937,  1.1689,  1.3242],
            [-2.7441,  0.1880,  0.7378,  1.3975]], device='npu:0')
torch_npu.npu_format_cast(self, acl_format) -> Tensor
Change the format of an npu tensor.
- Parameters:
- self (Tensor) - the input tensor
- acl_format (Int) - the target format to transform
- Constraints:
- Examples:
    >>> x = torch.rand(2, 3, 4, 5).npu()
    >>> torch_npu.get_npu_format(x)
    0
    >>> x1 = x.npu_format_cast(29)
    >>> torch_npu.get_npu_format(x1)
    29
torch_npu.npu_format_cast_(self, src) -> Tensor
In-place change the format of self, with the same format as src.
- Parameters:
- self (Tensor) - the input tensor
- src (Tensor) - the tensor whose format is used as the target format
- Constraints:
- Examples:
    >>> x = torch.rand(2, 3, 4, 5).npu()
    >>> torch_npu.get_npu_format(x)
    0
    >>> torch_npu.get_npu_format(x.npu_format_cast_(29))
    29
torch_npu.npu_transpose(self, perm, require_contiguous) -> Tensor
Return a view of the original tensor with its dimensions permuted, and make the result contiguous.
- Parameters:
- self (Tensor) - the input tensor
- perm (ListInt) - the desired ordering of dimensions
- require_contiguous (Bool)
- Constraints:
- Examples:
    >>> x = torch.randn(2, 3, 5).npu()
    >>> x.shape
    torch.Size([2, 3, 5])
    >>> x1 = torch_npu.npu_transpose(x, (2, 0, 1))
    >>> x1.shape
    torch.Size([5, 2, 3])
    >>> x2 = x.npu_transpose(2, 0, 1)
    >>> x2.shape
    torch.Size([5, 2, 3])
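The relationship between `perm` and the resulting shape can be worked out without a device. The following sketch is plain Python and only models the shape bookkeeping, not the data movement; `permuted_shape` is a hypothetical helper.

```python
def permuted_shape(shape, perm):
    """Return the shape obtained by reordering `shape` according to `perm`.

    Output dimension i takes its size from input dimension perm[i].
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the dimension indices")
    return tuple(shape[axis] for axis in perm)
```

Applied to the shape `(2, 3, 5)` with `perm = (2, 0, 1)`, this gives `(5, 2, 3)`, matching the shapes shown in the example above.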
torch_npu.npu_broadcast(self, size) -> Tensor
Return a new view of the self tensor with singleton dimensions expanded to a larger size, and make the result contiguous.
torch_npu.npu_dtype_cast(input, dtype) -> Tensor
torch_npu.empty_with_format(size, dtype, layout, device, pin_memory, acl_format) -> Tensor
- Parameters:
- size (ListInt) – A sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- dtype (torch.dtype, optional) – The desired data type of returned tensor. Default: None. If None, use a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – The desired layout of returned tensor. Default: torch.strided.
- device (torch.device, optional) – The desired device of returned tensor. Default: None.
- pin_memory (Bool, optional) – If set, the returned tensor will be allocated in the pinned memory. Default: False.
- acl_format (Int) – The desired memory format of returned tensor. Default: 2.
torch_npu.copy_memory_(dst, src, non_blocking=False) -> Tensor
- Parameters:
- dst (Tensor) - the destination tensor that receives the copied data
- src (Tensor) - the source tensor to copy from
- non_blocking (Bool) - If True and this copy is between CPU and NPU, the copy may occur asynchronously with respect to the host. In other cases, this argument has no effect.
- Constraints:
copy_memory_ only supports npu tensor. Input tensors of copy_memory_ should have the same dtype and device index.
- Examples:
    >>> a=torch.IntTensor([0, 0, -1]).npu()
    >>> b=torch.IntTensor([1, 1, 1]).npu()
    >>> a.copy_memory_(b)
    tensor([1, 1, 1], device='npu:0', dtype=torch.int32)
torch_npu.npu_one_hot(input, num_classes=-1, depth=1, on_value=1, off_value=0) -> Tensor
- Parameters:
- input (Tensor) - Class values of any shape.
- num_classes (Int) - The axis to fill. Default: "-1".
- depth (Int) - The depth of the one_hot dimension.
- on_value (Scalar) - The value to fill in output when indices[j] == i.
- off_value (Scalar) - The value to fill in output when indices[j] != i.
- Constraints:
- Examples:
    >>> a=torch.IntTensor([5, 3, 2, 1]).npu()
    >>> b=torch_npu.npu_one_hot(a, depth=5)
    >>> b
    tensor([[0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0.],
            [0., 0., 1., 0., 0.],
            [0., 1., 0., 0., 0.]], device='npu:0')
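The roles of `depth`, `on_value`, and `off_value` can be illustrated with a pure-Python reference. This is a sketch for a 1D list of indices only; it assumes, as the example output above suggests, that an index outside `[0, depth)` produces a row filled entirely with `off_value`. The name `one_hot_reference` is hypothetical.

```python
def one_hot_reference(indices, depth, on_value=1.0, off_value=0.0):
    """One-hot encode a flat list of integer class indices.

    Row j holds on_value at column indices[j] and off_value elsewhere;
    an out-of-range index yields a row of off_value only.
    """
    return [
        [on_value if idx == col else off_value for col in range(depth)]
        for idx in indices
    ]
```

Calling `one_hot_reference([5, 3, 2, 1], depth=5)` reproduces the rows printed in the example: the first row is all zeros because 5 is not a valid column for depth 5.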
torch_npu.npu_stride_add(x1, x2, offset1, offset2, c1_len) -> Tensor
- Parameters:
- x1 (Tensor) - A tensor in 5HD.
- x2 (Tensor) - A tensor of the same type as "x1", and the same shape as "x1", except for the C1 value.
- offset1 (Scalar) - A required int. Offset value of C1 in "x1".
- offset2 (Scalar) - A required int. Offset value of C1 in "x2".
- c1_len (Scalar) - A required int. C1 len of "y". The value must be less than the difference between C1 and offset in "x1" and "x2".
- Constraints:
- Examples:
    >>> a=torch.tensor([[[[[1.]]]]]).npu()
    >>> b=torch_npu.npu_stride_add(a, a, 0, 0, 1)
    >>> b
    tensor([[[[[2.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]],
             [[[0.]]]]], device='npu:0')
torch_npu.npu_softmax_cross_entropy_with_logits(features, labels) -> Tensor
Compute softmax cross entropy cost.
- Parameters:
- features (Tensor) - A tensor. A "batch_size * num_classes" matrix.
- labels (Tensor) - A tensor of the same type as "features". A "batch_size * num_classes" matrix.
- Constraints:
- Examples:
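The source provides no example for this operator. As an illustration of the quantity being computed, the sketch below evaluates the per-row softmax cross entropy in plain Python, with `features` and `labels` given as "batch_size * num_classes" nested lists. Returning one loss value per row is an assumption about the operator's output, and `softmax_cross_entropy_reference` is a hypothetical name.

```python
import math


def softmax_cross_entropy_reference(features, labels):
    """Per-row loss: -sum(labels * log_softmax(features))."""
    losses = []
    for row, target in zip(features, labels):
        # Subtract the row maximum before exponentiating for numerical stability.
        peak = max(row)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(-sum(t * (x - log_sum) for x, t in zip(row, target)))
    return losses
```

For two equal logits and a one-hot label, the loss is ln 2, since the softmax assigns probability 0.5 to the labeled class.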
torch_npu.npu_ps_roi_pooling(x, rois, spatial_scale, group_size, output_dim) -> Tensor
- Parameters:
- x (Tensor) - An NC1HWC0 tensor, describing the feature map, dimension C1 must be equal to (int(output_dim+15)/C0)) group_size.
- rois (Tensor) - A tensor with shape [batch, 5, rois_num], describing the ROIs. Each ROI consists of five elements: "batch_id", "x1", "y1", "x2", and "y2", where "batch_id" indicates the index of the input feature map and "x1", "y1", "x2", and "y2" must be greater than or equal to "0.0".
- spatial_scale (Float) - A required float32, scaling factor for mapping the input coordinates to the ROI coordinates.
- group_size (Int) - A required int32, specifying the number of groups to encode position-sensitive score maps. Must be within the range (0, 128).
- output_dim (Int) - A required int32, specifying the number of output channels. Must be greater than 0.
- Constraints:
- Examples:
    >>> roi = torch.tensor([[[1], [2], [3], [4], [5]], [[6], [7], [8], [9], [10]]], dtype = torch.float16).npu()
    >>> x = torch.tensor([[[[ 1]], [[ 2]], [[ 3]], [[ 4]], [[ 5]], [[ 6]], [[ 7]], [[ 8]]], [[[ 9]], [[10]], [[11]], [[12]], [[13]], [[14]], [[15]], [[16]]]], dtype = torch.float16).npu()
    >>> out = torch_npu.npu_ps_roi_pooling(x, roi, 0.5, 2, 2)
    >>> out
    tensor([[[[0., 0.],
              [0., 0.]],
             [[0., 0.],
              [0., 0.]]],
            [[[0., 0.],
              [0., 0.]],
             [[0., 0.],
              [0., 0.]]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_roi_align(features, rois, spatial_scale, pooled_height, pooled_width, sample_num, roi_end_mode) -> Tensor
Obtain the ROI feature matrix from the feature map. It is a customized FasterRcnn operator.
- Parameters:
- features (Tensor) - A tensor in 5HD.
- rois (Tensor) - ROI position. A 2D tensor with shape (N, 5). "N" indicates the number of ROIs, the value "5" indicates the indexes of images where the ROIs are located, "x0", "y0", "x1", and "y1".
- spatial_scale (Float) - A required attribute of type float32, specifying the scaling ratio of "features" to the original image.
- pooled_height (Int) - A required attribute of type int32, specifying the H dimension.
- pooled_width (Int) - A required attribute of type int32, specifying the W dimension.
- sample_num (Int) - An optional attribute of type int32, specifying the horizontal and vertical sampling frequency of each output. If this attribute is set to "0", the sampling frequency is equal to the rounded up value of "rois", which is a floating point number. Default: "2".
- roi_end_mode (Int) - An optional attribute of type int32. Default: "1".
- Constraints:
- Examples:
    >>> x = torch.FloatTensor([[[[1, 2, 3 , 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]]]]).npu()
    >>> rois = torch.tensor([[0, -2.0, -2.0, 22.0, 22.0]]).npu()
    >>> out = torch_npu.npu_roi_align(x, rois, 0.25, 3, 3, 2, 0)
    >>> out
    tensor([[[[ 4.5000,  6.5000,  8.5000],
              [16.5000, 18.5000, 20.5000],
              [28.5000, 30.5000, 32.5000]]]], device='npu:0')
torch_npu.npu_nms_v4(boxes, scores, max_output_size, iou_threshold, scores_threshold, pad_to_max_output_size=False) -> (Tensor, Tensor)
- Parameters:
- boxes (Tensor) - A 2D float tensor of shape [num_boxes, 4].
- scores (Tensor) - A 1D float tensor of shape [num_boxes] representing a single score corresponding to each box (each row of boxes).
- max_output_size (Scalar) - A scalar representing the maximum number of boxes to be selected by non max suppression.
- iou_threshold (Tensor) - A 0D float tensor representing the threshold for deciding whether boxes overlap too much with respect to IOU.
- scores_threshold (Tensor) - A 0D float tensor representing the threshold for deciding when to remove boxes based on score.
- pad_to_max_output_size (Bool) - If True, the output selected_indices is padded to be of length max_output_size. Default: False.
- Returns:
- selected_indices - A 1D integer tensor of shape [M] representing the selected indices from the boxes tensor, where M <= max_output_size.
- valid_outputs - A 0D integer tensor representing the number of valid elements in selected_indices, with the valid elements appearing first.
- Constraints:
- Examples:
    >>> boxes=torch.randn(100,4).npu()
    >>> scores=torch.randn(100).npu()
    >>> boxes.uniform_(0,100)
    >>> scores.uniform_(0,1)
    >>> max_output_size = 20
    >>> iou_threshold = torch.tensor(0.5).npu()
    >>> scores_threshold = torch.tensor(0.3).npu()
    >>> npu_output = torch_npu.npu_nms_v4(boxes, scores, max_output_size, iou_threshold, scores_threshold)
    >>> npu_output
    (tensor([57, 65, 25, 45, 43, 12, 52, 91, 23, 78, 53, 11, 24, 62, 22, 67, 9, 94, 54, 92], device='npu:0', dtype=torch.int32), tensor(20, device='npu:0', dtype=torch.int32))
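To make the roles of `max_output_size`, `iou_threshold`, and `scores_threshold` concrete, here is a pure-Python sketch of conventional greedy non-maximum suppression. It is not the NPU kernel: it assumes boxes are given as [x1, y1, x2, y2] corner coordinates and that a box is suppressed when its IoU with an already selected box exceeds the threshold. The names `box_iou` and `greedy_nms` are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] form (assumed layout)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold, scores_threshold):
    """Return (selected_indices, valid_outputs) from greedy NMS."""
    # Drop low-scoring boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > scores_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    selected = []
    for i in order:
        if len(selected) >= max_output_size:
            break
        # Keep the box only if it does not overlap too much with any kept box.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected, len(selected)
```

The second return value plays the role of `valid_outputs`: it counts how many entries of the index list are meaningful, which matters when the output is padded to `max_output_size`.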
torch_npu.npu_nms_rotated(dets, scores, iou_threshold, scores_threshold=0, max_output_size=-1, mode=0) -> (Tensor, Tensor)
- Parameters:
- dets (Tensor) - A 2D float tensor of shape [num_boxes, 5].
- scores (Tensor) - A 1D float tensor of shape [num_boxes] representing a single score corresponding to each box (each row of boxes).
- iou_threshold (Float) - A scalar representing the threshold for deciding whether boxes overlap too much with respect to IOU.
- scores_threshold (Float) - A scalar representing the threshold for deciding when to remove boxes based on score. Default: 0.
- max_output_size (Int) - A scalar integer tensor representing the maximum number of boxes to be selected by non max suppression. Default: -1, that is, no constraint is imposed.
- mode (Int) - This parameter specifies the layout type of the dets. If mode is set to 0, the input values of dets are x, y, w, h, and angle. If mode is set to 1, the input values of dets are x1, y1, x2, y2, and angle. Default: 0.
- Returns:
- selected_index - A 1D integer tensor of shape [M] representing the selected indices from the dets tensor, where M <= max_output_size.
- selected_num - A 0D integer tensor representing the number of valid elements in selected_index.
- Constraints:
- Examples:
    >>> dets=torch.randn(100,5).npu()
    >>> scores=torch.randn(100).npu()
    >>> dets.uniform_(0,100)
    >>> scores.uniform_(0,1)
    >>> output1, output2 = torch_npu.npu_nms_rotated(dets, scores, 0.2, 0, -1, 1)
    >>> output1
    tensor([76, 48, 15, 65, 91, 82, 21, 96, 62, 90, 13, 59, 0, 18, 47, 23, 8, 56, 55, 63, 72, 39, 97, 81, 16, 38, 17, 25, 74, 33, 79, 44, 36, 88, 83, 37, 64, 45, 54, 41, 22, 28, 98, 40, 30, 20, 1, 86, 69, 57, 43, 9, 42, 27, 71, 46, 19, 26, 78, 66, 3, 52], device='npu:0', dtype=torch.int32)
    >>> output2
    tensor([62], device='npu:0', dtype=torch.int32)
torch_npu.npu_lstm(x, weight, bias, seqMask, h, c, has_biases, num_layers, dropout, train, bidirectional, batch_first, flag_seq, direction)
DynamicRNN calculation.
- Parameters:
- x (Tensor) - A required 4D tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- weight (Tensor) - A required 4D tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_ZN_LSTM.
- bias (Tensor) - A required 1D tensor. Must be one of the following types: float16, float32. The format must be ND.
- seqMask (Tensor) - A tensor. Only float16 in FRACTAL_NZ and int32 in ND are supported.
- h (Tensor) - A 4D tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- c (Tensor) - A 4D tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- has_biases (Bool) - If True, bias exists.
- num_layers (Int) - Number of recurrent layers. Only a single layer is supported currently.
- dropout (Float) - If non-zero, introduces a dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Not supported currently.
- train (Bool) - Whether the op runs in training mode. Default: True.
- bidirectional (Bool) - If True, becomes a bidirectional LSTM. Not supported currently.
- batch_first (Bool) - If True, the input and output tensors are provided as (batch, seq, feature). Not supported currently.
- flag_seq (Bool) - If True, the input is a PackSequence. Not supported currently.
- direction (Bool) - If True, the direction is "REDIRECTIONAL"; otherwise it is "UNIDIRECTIONAL".
- Returns:
- y - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- output_h - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- output_c - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- i - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- j - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- f - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- o - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- tanhct - A 4D Tensor. Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- Constraints:
- Examples:
torch_npu.npu_iou(bboxes, gtboxes, mode=0) -> Tensor
torch_npu.npu_ptiou(bboxes, gtboxes, mode=0) -> Tensor
- Parameters:
- bboxes (Tensor) - the input tensor
- gtboxes (Tensor) - the input tensor
- mode (Int) - selects the overlap measure: 0 for IoU, 1 for IoF.
- Constraints:
- Examples:
    >>> bboxes = torch.tensor([[0, 0, 10, 10], [10, 10, 20, 20], [32, 32, 38, 42]], dtype=torch.float16).to("npu")
    >>> gtboxes = torch.tensor([[0, 0, 10, 20], [0, 10, 10, 10], [10, 10, 20, 20]], dtype=torch.float16).to("npu")
    >>> output_iou = torch_npu.npu_iou(bboxes, gtboxes, 0)
    >>> output_iou
    tensor([[0.4985, 0.0000, 0.0000],
            [0.0000, 0.0000, 0.0000],
            [0.0000, 0.9961, 0.0000]], device='npu:0', dtype=torch.float16)
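The two modes can be compared with a small pure-Python reference. This sketch assumes boxes are [x1, y1, x2, y2] corners, and it assumes that IoF divides the intersection by the area of the ground-truth box; the document does not say which box counts as the foreground, so treat that choice as a guess. The result is laid out with one row per ground-truth box and one column per bbox, which is the layout the example output above is consistent with. `overlap_reference` and `overlap_matrix` are hypothetical names.

```python
def overlap_reference(bbox, gtbox, mode=0):
    """IoU (mode 0) or IoF (mode 1) of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(bbox[2], gtbox[2]) - max(bbox[0], gtbox[0]))
    inter_h = max(0.0, min(bbox[3], gtbox[3]) - max(bbox[1], gtbox[1]))
    inter = inter_w * inter_h
    area_b = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    area_g = (gtbox[2] - gtbox[0]) * (gtbox[3] - gtbox[1])
    if mode == 0:
        denom = area_b + area_g - inter  # union of the two boxes
    else:
        denom = area_g  # assumption: the ground-truth box is the foreground
    return inter / denom if denom > 0 else 0.0


def overlap_matrix(bboxes, gtboxes, mode=0):
    """Rows index gtboxes, columns index bboxes."""
    return [[overlap_reference(b, g, mode) for b in bboxes] for g in gtboxes]
```

On the boxes from the example, this yields exactly 0.5 and 1.0 where the device output shows 0.4985 and 0.9961; the small differences presumably come from float16 arithmetic or a stabilizing term inside the kernel.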
torch_npu.npu_pad(input, paddings) -> Tensor
- Parameters:
- input (Tensor) - the input tensor
- paddings (ListInt) - type int32 or int64
- Constraints:
- Examples:
    >>> input = torch.tensor([[20, 20, 10, 10]], dtype=torch.float16).to("npu")
    >>> paddings = [1, 1, 1, 1]
    >>> output = torch_npu.npu_pad(input, paddings)
    >>> output
    tensor([[ 0.,  0.,  0.,  0.,  0.,  0.],
            [ 0., 20., 20., 10., 10.,  0.],
            [ 0.,  0.,  0.,  0.,  0.,  0.]], device='npu:0', dtype=torch.float16)
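The padding behavior shown above can be mimicked for the 2D case in plain Python. This sketch assumes `paddings` is ordered as [before_dim0, after_dim0, before_dim1, after_dim1] and that the fill value is zero; since the example uses 1 for every entry, it cannot confirm that ordering. `pad_2d_reference` is a hypothetical name.

```python
def pad_2d_reference(rows, paddings, fill=0.0):
    """Zero-pad a 2D nested list.

    paddings = [top, bottom, left, right] (assumed ordering).
    """
    top, bottom, left, right = paddings
    width = len(rows[0]) + left + right
    padded = [[fill] * width for _ in range(top)]
    for row in rows:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * width for _ in range(bottom))
    return padded
```

With `paddings = [1, 1, 1, 1]`, a 1x4 input becomes a 3x6 result, as in the example.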
torch_npu.npu_nms_with_mask(input, iou_threshold) -> (Tensor, Tensor, Tensor)
- Parameters:
- input (Tensor) - the input tensor
- iou_threshold (Scalar) - Threshold. If the value exceeds this threshold, the value is 1. Otherwise, the value is 0.
- Returns:
- selected_boxes - 2D tensor with shape of [N,5], representing filtered boxes including proposal boxes and corresponding confidence scores.
- selected_idx - 1D tensor with shape of [N], representing the index of input proposal boxes.
- selected_mask - 1D tensor with shape of [N], a flag indicating whether each output proposal box is valid.
- Constraints:
- Examples:
    >>> input = torch.tensor([[0.0, 1.0, 2.0, 3.0, 0.6], [6.0, 7.0, 8.0, 9.0, 0.4]], dtype=torch.float16).to("npu")
    >>> iou_threshold = 0.5
    >>> output1, output2, output3, = torch_npu.npu_nms_with_mask(input, iou_threshold)
    >>> output1
    tensor([[0.0000, 1.0000, 2.0000, 3.0000, 0.6001],
            [6.0000, 7.0000, 8.0000, 9.0000, 0.3999]], device='npu:0', dtype=torch.float16)
    >>> output2
    tensor([0, 1], device='npu:0', dtype=torch.int32)
    >>> output3
    tensor([1, 1], device='npu:0', dtype=torch.uint8)
torch_npu.npu_bounding_box_encode(anchor_box, ground_truth_box, means0, means1, means2, means3, stds0, stds1, stds2, stds3) -> Tensor
- Parameters:
- anchor_box (Tensor) - The input tensor. Anchor boxes. A 2D Tensor of float32 with shape (N, 4). "N" indicates the number of bounding boxes, and the value "4" refers to "x0", "x1", "y0", and "y1".
- ground_truth_box (Tensor) - The input tensor. Ground truth boxes. A 2D Tensor of float32 with shape (N, 4). "N" indicates the number of bounding boxes, and the value "4" refers to "x0", "x1", "y0", and "y1".
- means0 (Float) - An index of type float
- means1 (Float) - An index of type float
- means2 (Float) - An index of type float
- means3 (Float) - An index of type float. Default: [0,0,0,0]. "deltas" = "deltas" x "stds" + "means".
- stds0 (Float) - An index of type float
- stds1 (Float) - An index of type float
- stds2 (Float) - An index of type float
- stds3 (Float) - An index of type float. Default: [1.0,1.0,1.0,1.0]. "deltas" = "deltas" x "stds" + "means".
- Constraints:
- Examples:
    >>> anchor_box = torch.tensor([[1., 2., 3., 4.], [3.,4., 5., 6.]], dtype = torch.float32).to("npu")
    >>> ground_truth_box = torch.tensor([[5., 6., 7., 8.], [7.,8., 9., 6.]], dtype = torch.float32).to("npu")
    >>> output = torch_npu.npu_bounding_box_encode(anchor_box, ground_truth_box, 0, 0, 0, 0, 0.1, 0.1, 0.2, 0.2)
    >>> output
    tensor([[13.3281, 13.3281,  0.0000,  0.0000],
            [13.3281,  6.6641,  0.0000, -5.4922]], device='npu:0')
torch_npu.npu_bounding_box_decode(rois, deltas, means0, means1, means2, means3, stds0, stds1, stds2, stds3, max_shape, wh_ratio_clip) -> Tensor
- Parameters:
- rois (Tensor) - Region of interests (ROIs) generated by the region proposal network (RPN). A 2D Tensor of type float32 or float16 with shape (N, 4). "N" indicates the number of ROIs, and the value "4" refers to "x0", "x1", "y0", and "y1".
- deltas (Tensor) - Absolute variation between the ROIs generated by the RPN and ground truth boxes. A 2D Tensor of type float32 or float16 with shape (N, 4). "N" indicates the number of errors, and 4 indicates "dx", "dy", "dw", and "dh".
- means0 (Float) - An index of type float
- means1 (Float) - An index of type float
- means2 (Float) - An index of type float
- means3 (Float) - An index of type float. Default: [0,0,0,0]. "deltas" = "deltas" x "stds" + "means".
- stds0 (Float) - An index of type float
- stds1 (Float) - An index of type float
- stds2 (Float) - An index of type float
- stds3 (Float) - An index of type float. Default: [1.0,1.0,1.0,1.0]. "deltas" = "deltas" x "stds" + "means".
- max_shape (ListInt of length 2) - Shape [h, w], specifying the size of the image transferred to the network. It is used to ensure that the bbox shape after conversion does not exceed "max_shape".
- wh_ratio_clip (Float) - The values of "dw" and "dh" fall within (-wh_ratio_clip, wh_ratio_clip).
- Constraints:
- Examples:
    >>> rois = torch.tensor([[1., 2., 3., 4.], [3.,4., 5., 6.]], dtype = torch.float32).to("npu")
    >>> deltas = torch.tensor([[5., 6., 7., 8.], [7.,8., 9., 6.]], dtype = torch.float32).to("npu")
    >>> output = torch_npu.npu_bounding_box_decode(rois, deltas, 0, 0, 0, 0, 1, 1, 1, 1, (10, 10), 0.1)
    >>> output
    tensor([[2.5000, 6.5000, 9.0000, 9.0000],
            [9.0000, 9.0000, 9.0000, 9.0000]], device='npu:0')
torch_npu.npu_gru(input, hx, weight_input, weight_hidden, bias_input, bias_hidden, seq_length, has_biases, num_layers, dropout, train, bidirectional, batch_first) -> (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor)
DynamicGRUV2 calculation.
- Parameters:
- input (Tensor) - Must be one of the following types: float16. The format must be FRACTAL_NZ.
- hx (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- weight_input (Tensor) - Must be one of the following types: float16. The format must be FRACTAL_Z.
- weight_hidden (Tensor) - Must be one of the following types: float16. The format must be FRACTAL_Z.
- bias_input (Tensor) - Must be one of the following types: float16, float32. The format must be ND.
- bias_hidden (Tensor) - Must be one of the following types: float16, float32. The format must be ND.
- seq_length (Tensor) - Must be one of the following types: int32. The format must be ND.
- has_biases (Bool) - Default: True.
- num_layers (Int)
- dropout (Float)
- train (Bool) - A bool identifying whether it is training in the op. Default: True.
- bidirectional (Bool) - Default: True.
- batch_first (Bool) - Default: True.
- Returns:
- y (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- output_h (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- update (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- reset (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- new (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- hidden_new (Tensor) - Must be one of the following types: float16, float32. The format must be FRACTAL_NZ.
- Constraints:
- Examples:
torch_npu.npu_random_choice_with_mask(x, count=256, seed=0, seed2=0) -> (Tensor, Tensor)
- Parameters:
- x (Tensor) - the input tensor.
- count (Int) - The count of output. If 0, out all non-zero elements.
- seed (Int) - type int32 or int64
- seed2 (Int) - type int32 or int64
- Returns:
- y - 2D tensor, non-zero element index.
- mask - 1D tensor, whether the corresponding index is valid.
- Constraints:
- Examples:
    >>> x = torch.tensor([1, 0, 1, 0], dtype=torch.bool).to("npu")
    >>> result, mask = torch_npu.npu_random_choice_with_mask(x, 2, 1, 0)
    >>> result
    tensor([[0],
            [2]], device='npu:0', dtype=torch.int32)
    >>> mask
    tensor([True, True], device='npu:0')
torch_npu.npu_batch_nms(self, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size, change_coordinate_frame=False, transpose_box=False) -> (Tensor, Tensor, Tensor, Tensor)
- Parameters:
- self (Tensor) - the input tensor
- scores (Tensor) - the input tensor
- score_threshold (Float) - A required attribute of type float32, specifying the score threshold used to filter boxes.
- iou_threshold (Float) - A required attribute of type float32, specifying the IoU threshold used for NMS.
- max_size_per_class (Int) - A required attribute of type int, specifying the number of NMS outputs per class.
- max_total_size (Int) - A required attribute of type int, specifying the number of NMS outputs per batch.
- change_coordinate_frame (Bool) - An attribute of type bool, specifying whether to normalize coordinates after clipping. Default: False.
- transpose_box (Bool) - An attribute of type bool, specifying whether to insert a transpose before this op. Must be "False".
- Returns:
- nmsed_boxes (Tensor) - A 3D tensor of type float16 with shape (batch, max_total_size, 4), specifying the output NMS boxes per batch.
- nmsed_scores (Tensor) - A 2D tensor of type float16 with shape (batch, max_total_size), specifying the output NMS score per batch.
- nmsed_classes (Tensor) - A 2D tensor of type float16 with shape (batch, max_total_size), specifying the output NMS class per batch.
- nmsed_num (Tensor) - A 1D tensor of type int32 with shape (batch), specifying the valid number of nmsed_boxes.
- Constraints:
- Examples:
    >>> boxes = torch.randn(8, 2, 4, 4, dtype = torch.float32).to("npu")
    >>> scores = torch.randn(3, 2, 4, dtype = torch.float32).to("npu")
    >>> nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = torch_npu.npu_batch_nms(boxes, scores, 0.3, 0.5, 3, 4)
    >>> nmsed_boxes
    >>> nmsed_scores
    >>> nmsed_classes
    >>> nmsed_num
torch_npu.npu_slice(self, offsets, size) -> Tensor
Extract a slice from a tensor.
- Parameters:
- self (Tensor) - the input tensor
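- offsets (ListInt) - the starting index of the slice in each dimension; elements are of type int32 or int64
- size (ListInt) - the number of elements to take in each dimension; elements are of type int32 or int64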
- Constraints:
- Examples:
>>> input = torch.tensor([[1,2,3,4,5], [6,7,8,9,10]], dtype=torch.float16).to("npu")
>>> offsets = [0, 0]
>>> size = [2, 2]
>>> output = torch_npu.npu_slice(input, offsets, size)
>>> output
tensor([[1., 2.], [6., 7.]], device='npu:0', dtype=torch.float16)
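The example above suggests that offsets gives the start index and size the number of elements taken in each dimension. A minimal pure-Python sketch of that reading for a 2D input follows; `slice_2d` is a hypothetical helper and the semantics are inferred from the example, not from the operator's implementation.

```python
def slice_2d(rows, offsets, size):
    """Take size[d] elements starting at offsets[d] along each of two dims."""
    r0, c0 = offsets
    nr, nc = size
    return [row[c0:c0 + nc] for row in rows[r0:r0 + nr]]


data = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(slice_2d(data, [0, 0], [2, 2]))  # [[1, 2], [6, 7]]
print(slice_2d(data, [1, 2], [1, 3]))  # [[8, 9, 10]]
```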
torch_npu.npu_dropoutV2(self, seed, p) -> (Tensor, Tensor, Tensor(a!))
Compute the dropout result with a seed.
- Parameters:
- self (Tensor) - the input tensor
- seed (Tensor) - the input tensor
- p (Float) - dropout probability
- Returns:
- y - A tensor with the same shape and type as "self".
- mask - A tensor with the same shape and type as "self".
- new_seed - A tensor with the same shape and type as "seed".
- Constraints:
- Examples:
>>> input = torch.tensor([1.,2.,3.,4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> seed = torch.rand((32,),dtype=torch.float32).npu()
>>> seed
tensor([0.4368, 0.7351, 0.8459, 0.4657, 0.6783, 0.8914, 0.8995, 0.4401, 0.4408, 0.4453, 0.2404, 0.9680, 0.0999, 0.8665, 0.2993, 0.5787, 0.0251, 0.6783, 0.7411, 0.0670, 0.9430, 0.9165, 0.3983, 0.5849, 0.7722, 0.4659, 0.0486, 0.2693, 0.6451, 0.2734, 0.3176, 0.0176], device='npu:0')
>>> prob = 0.3
>>> output, mask, out_seed = torch_npu.npu_dropoutV2(input, seed, prob)
>>> output
tensor([0.4408, 0.4453, 0.2404, 0.9680], device='npu:0')
>>> mask
tensor([0., 0., 0., 0.], device='npu:0')
>>> out_seed
tensor([0.4408, 0.4453, 0.2404, 0.9680, 0.0999, 0.8665, 0.2993, 0.5787, 0.0251, 0.6783, 0.7411, 0.0670, 0.9430, 0.9165, 0.3983, 0.5849, 0.7722, 0.4659, 0.0486, 0.2693, 0.6451, 0.2734, 0.3176, 0.0176, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], device='npu:0')
torch_npu._npu_dropout(self, p) -> (Tensor, Tensor)
Compute the dropout result without a seed. Similar to torch.dropout, with an implementation optimized for the NPU device.
- Parameters:
- self (Tensor) - the input tensor
- p (Float) - dropout probability
- Constraints:
- Examples:
>>> input = torch.tensor([1.,2.,3.,4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> prob = 0.3
>>> output, mask = torch_npu._npu_dropout(input, prob)
>>> output
tensor([0.0000, 2.8571, 0.0000, 0.0000], device='npu:0')
>>> mask
tensor([ 98, 255, 188, 186, 120, 157, 175, 159, 77, 223, 127, 79, 247, 151, 253, 255], device='npu:0', dtype=torch.uint8)
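The surviving value 2.8571 in the output above is consistent with the usual inverted-dropout scaling, 2 / (1 - 0.3). A minimal sketch, assuming that scaling rule; the keep pattern is hard-coded rather than sampled, and `scale_kept` is a hypothetical helper, not part of torch_npu.

```python
def scale_kept(values, keep, p):
    """Inverted dropout: zero dropped entries, scale kept ones by 1 / (1 - p)."""
    scale = 1.0 / (1.0 - p)
    return [v * scale if k else 0.0 for v, k in zip(values, keep)]


out = scale_kept([1.0, 2.0, 3.0, 4.0], [False, True, False, False], 0.3)
print([round(v, 4) for v in out])  # [0.0, 2.8571, 0.0, 0.0]
```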
torch_npu._npu_dropout_inplace(result, p) -> (Tensor(a!), Tensor)
Similar to torch.dropout_, with an implementation optimized for the NPU device.
- Parameters:
- result (Tensor) - the tensor dropout inplace
- p (Float) - dropout probability
- Constraints:
- Examples:
>>> input = torch.tensor([1.,2.,3.,4.]).npu()
>>> input
tensor([1., 2., 3., 4.], device='npu:0')
>>> prob = 0.3
>>> output, mask = torch_npu._npu_dropout_inplace(input, prob)
>>> output
tensor([0.0000, 2.8571, 0.0000, 0.0000], device='npu:0')
>>> input
tensor([0.0000, 2.8571, 4.2857, 5.7143], device='npu:0')
>>> mask
tensor([ 98, 255, 188, 186, 120, 157, 175, 159, 77, 223, 127, 79, 247, 151, 253, 255], device='npu:0', dtype=torch.uint8)
torch_npu.npu_indexing(self, begin, end, strides, begin_mask=0, end_mask=0, ellipsis_mask=0, new_axis_mask=0, shrink_axis_mask=0) -> Tensor
Compute the indexing result from the begin, end, and strides arrays.
- Parameters:
- self (Tensor) - an input tensor
- begin (ListInt) - the index of the first value to select
- end (ListInt) - the index at which selection stops (exclusive, as the example shows)
- strides (ListInt) - the index increment
- begin_mask (Int) - A bitmask where a bit "i" being "1" means to ignore the begin value and instead use the largest interval possible.
- end_mask (Int) - analogous to "begin_mask"
- ellipsis_mask (Int) - A bitmask where bit "i" being "1" means the "i"th position is actually an ellipsis.
- new_axis_mask (Int) - A bitmask where bit "i" being "1" means the "i"th specification creates a new shape 1 dimension.
- shrink_axis_mask (Int) - A bitmask where bit "i" implies that the "i"th specification should shrink the dimensionality.
- Constraints:
- Examples:
>>> input = torch.tensor([[1, 2, 3, 4],[5, 6, 7, 8]], dtype=torch.int32).to("npu")
>>> input
tensor([[1, 2, 3, 4], [5, 6, 7, 8]], device='npu:0', dtype=torch.int32)
>>> output = torch_npu.npu_indexing(input, [0, 0], [2, 2], [1, 1])
>>> output
tensor([[1, 2], [5, 6]], device='npu:0', dtype=torch.int32)
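The begin/end/strides arguments and the begin_mask/end_mask bitmasks can be illustrated with a small pure-Python sketch for a 2D input. This is an approximation under stated assumptions: positive strides only, a set mask bit simply replaces the bound with the full extent, and the other three masks are not modelled. `indexing_2d` is a hypothetical helper.

```python
def indexing_2d(rows, begin, end, strides, begin_mask=0, end_mask=0):
    """Strided slice of a 2D nested list; positive strides only."""
    shape = (len(rows), len(rows[0]))
    ranges = []
    for i in range(2):
        # A set mask bit means "ignore the given bound, use the full extent".
        lo = 0 if (begin_mask >> i) & 1 else begin[i]
        hi = shape[i] if (end_mask >> i) & 1 else end[i]
        ranges.append(range(lo, min(hi, shape[i]), strides[i]))
    return [[rows[r][c] for c in ranges[1]] for r in ranges[0]]


data = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(indexing_2d(data, [0, 0], [2, 2], [1, 1]))              # [[1, 2], [5, 6]]
print(indexing_2d(data, [0, 0], [2, 2], [1, 2], end_mask=2))  # [[1, 3], [5, 7]]
```

The first call reproduces the documented example; the second sets bit 1 of end_mask so the end bound of the second dimension is ignored.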
torch_npu.npu_ifmr(Tensor data, Tensor data_min, Tensor data_max, Tensor cumsum, float min_percentile, float max_percentile, float search_start, float search_end, float search_step, bool with_offset) -> (Tensor, Tensor)
Compute the IFMR (Input Feature Map Reconstruction) result.
- Parameters:
- data (Tensor) - a tensor of feature map
- data_min (Tensor) - a tensor of min value of feature map
- data_max (Tensor) - a tensor of max value of feature map
- cumsum (Tensor) - a tensor of cumsum bin of data
- min_percentile (Float) - min init percentile
- max_percentile (Float) - max init percentile
- search_start (Float) - search start
- search_end (Float) - search end
- search_step (Float) - step size of searching
- with_offset (Bool) - whether use offset
- Returns:
- scale - optimal scale
- offset - optimal offset
- Constraints:
- Examples:
>>> input = torch.rand((2,2,3,4),dtype=torch.float32).npu()
>>> input
tensor([[[[0.4508, 0.6513, 0.4734, 0.1924], [0.0402, 0.5502, 0.0694, 0.9032], [0.4844, 0.5361, 0.9369, 0.7874]], [[0.5157, 0.1863, 0.4574, 0.8033], [0.5986, 0.8090, 0.7605, 0.8252], [0.4264, 0.8952, 0.2279, 0.9746]]], [[[0.0803, 0.7114, 0.8773, 0.2341], [0.6497, 0.0423, 0.8407, 0.9515], [0.1821, 0.5931, 0.7160, 0.4968]], [[0.7977, 0.0899, 0.9572, 0.0146], [0.2804, 0.8569, 0.2292, 0.1118], [0.5747, 0.4064, 0.8370, 0.1611]]]], device='npu:0')
>>> min_value = torch.min(input)
>>> min_value
tensor(0.0146, device='npu:0')
>>> max_value = torch.max(input)
>>> max_value
tensor(0.9746, device='npu:0')
>>> hist = torch.histc(input.to('cpu'), bins=128, min=min_value.to('cpu'), max=max_value.to('cpu'))
>>> hist
tensor([1., 0., 0., 2., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 2., 1., 0., 0., 0., 0., 2., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 2., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 2., 0., 0., 1., 1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 1.])
>>> cdf = torch.cumsum(hist,dim=0).int().npu()
>>> cdf
tensor([ 1, 1, 1, 3, 3, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 10, 11, 11, 11, 11, 11, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 17, 17, 17, 17, 18, 19, 19, 20, 21, 21, 22, 22, 23, 23, 23, 24, 24, 25, 25, 25, 26, 26, 26, 28, 28, 28, 28, 28, 28, 28, 30, 30, 30, 30, 30, 30, 30, 30, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 35, 37, 37, 37, 38, 39, 40, 40, 41, 41, 41, 42, 42, 43, 44, 44, 44, 44, 45, 45, 46, 47, 47, 48], device='npu:0', dtype=torch.int32)
>>> scale, offset = torch_npu.npu_ifmr(input, min_value, max_value, cdf, min_percentile=0.999999, max_percentile=0.999999, search_start=0.7, search_end=1.3, search_step=0.01, with_offset=False)
>>> scale
tensor(0.0080, device='npu:0')
>>> offset
tensor(0., device='npu:0')
torch_npu.npu_max(self, dim, keepdim=False) -> (Tensor, Tensor)
Compute the max result along dim. Similar to torch.max, with an implementation optimized for the NPU device.
- Parameters:
- self (Tensor) – the input tensor
- dim (Int) – the dimension to reduce
- keepdim (Bool) – whether the output tensor has dim retained or not
- Returns:
- values - max values in the input tensor
- indices - index of max values in the input tensor
- Constraints:
- Examples:
>>> input = torch.randn(2, 2, 2, 2, dtype = torch.float32).npu()
>>> input
tensor([[[[-1.8135, 0.2078], [-0.6678, 0.7846]], [[ 0.6458, -0.0923], [-0.2124, -1.9112]]], [[[-0.5800, -0.4979], [ 0.2580, 1.1335]], [[ 0.6669, 0.1876], [ 0.1160, -0.1061]]]], device='npu:0')
>>> outputs, indices = torch_npu.npu_max(input, 2)
>>> outputs
tensor([[[-0.6678, 0.7846], [ 0.6458, -0.0923]], [[ 0.2580, 1.1335], [ 0.6669, 0.1876]]], device='npu:0')
>>> indices
tensor([[[1, 1], [0, 0]], [[1, 1], [0, 0]]], device='npu:0', dtype=torch.int32)
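To show how values and indices relate, here is a minimal pure-Python sketch that reduces a 2D block over its first dimension. It uses the sub-block input[0][0] from the example above, for which that dimension corresponds to dim=2 of the full 4D tensor; `max_dim0` is a hypothetical helper.

```python
def max_dim0(rows):
    """Reduce a 2D nested list over its first dimension.

    Returns (values, indices), with one entry per column.
    """
    values, indices = [], []
    for col in zip(*rows):
        best = max(range(len(col)), key=lambda i: col[i])
        values.append(col[best])
        indices.append(best)
    return values, indices


# input[0][0] from the documented example.
block = [[-1.8135, 0.2078], [-0.6678, 0.7846]]
print(max_dim0(block))  # ([-0.6678, 0.7846], [1, 1])
```

The result matches outputs[0][0] and indices[0][0] in the documented output.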
torch_npu.npu_min(self, dim, keepdim=False) -> (Tensor values, Tensor indices)
Similar to torch.min, with an implementation optimized for the NPU device.
- Parameters:
- self (Tensor) – the input tensor
- dim (Int) – the dimension to reduce
- keepdim (Bool) – whether the output tensor has dim retained or not
- Returns:
- values - min values in the input tensor
- indices - index of min values in the input tensor
- Constraints:
- Examples:
>>> input = torch.randn(2, 2, 2, 2, dtype = torch.float32).npu()
>>> input
tensor([[[[-0.9909, -0.2369], [-0.9569, -0.6223]], [[ 0.1157, -0.3147], [-0.7761, 0.1344]]], [[[ 1.6292, 0.5953], [ 0.6940, -0.6367]], [[-1.2335, 0.2131], [ 1.0748, -0.7046]]]], device='npu:0')
>>> outputs, indices = torch_npu.npu_min(input, 2)
>>> outputs
tensor([[[-0.9909, -0.6223], [-0.7761, -0.3147]], [[ 0.6940, -0.6367], [-1.2335, -0.7046]]], device='npu:0')
>>> indices
tensor([[[0, 1], [1, 0]], [[1, 1], [0, 1]]], device='npu:0', dtype=torch.int32)
torch_npu.npu_scatter(self, indices, updates, dim) -> Tensor
Compute the scatter result along dim. Similar to torch.scatter, with an implementation optimized for the NPU device.
- Parameters:
- self (Tensor) - the input tensor
- indices (Tensor) – The indices of elements to scatter, can be either empty or of the same dimensionality as updates. When empty, the operation returns self unchanged.
- updates (Tensor) – the source element(s) to scatter
- dim (Int) – the axis along which to index
- Constraints:
- Examples:
>>> input = torch.tensor([[1.6279, 0.1226], [0.9041, 1.0980]]).npu()
>>> input
tensor([[1.6279, 0.1226], [0.9041, 1.0980]], device='npu:0')
>>> indices = torch.tensor([0, 1],dtype=torch.int32).npu()
>>> indices
tensor([0, 1], device='npu:0', dtype=torch.int32)
>>> updates = torch.tensor([-1.1993, -1.5247]).npu()
>>> updates
tensor([-1.1993, -1.5247], device='npu:0')
>>> dim = 0
>>> output = torch_npu.npu_scatter(input, indices, updates, dim)
>>> output
tensor([[-1.1993, 0.1226], [ 0.9041, -1.5247]], device='npu:0')
torch_npu.npu_layer_norm_eval(input, normalized_shape, weight=None, bias=None, eps=1e-05) -> Tensor
The same as torch.nn.functional.layer_norm, with an implementation optimized for the NPU device.
- Parameters:
- input (Tensor) - the input tensor
- normalized_shape (ListInt) – input shape from an expected input of size
- weight (Tensor) - the gamma tensor
- bias (Tensor) - the beta tensor
- eps (Float) – The epsilon value added to the denominator for numerical stability. Default: 1e-5.
- Constraints:
- Examples:
>>> input = torch.rand((6, 4), dtype=torch.float32).npu()
>>> input
tensor([[0.1863, 0.3755, 0.1115, 0.7308], [0.6004, 0.6832, 0.8951, 0.2087], [0.8548, 0.0176, 0.8498, 0.3703], [0.5609, 0.0114, 0.5021, 0.1242], [0.3966, 0.3022, 0.2323, 0.3914], [0.1554, 0.0149, 0.1718, 0.4972]], device='npu:0')
>>> normalized_shape = input.size()[1:]
>>> normalized_shape
torch.Size([4])
>>> weight = torch.Tensor(*normalized_shape).npu()
>>> weight
tensor([ nan, 6.1223e-41, -8.3159e-20, 9.1834e-41], device='npu:0')
>>> bias = torch.Tensor(*normalized_shape).npu()
>>> bias
tensor([5.6033e-39, 6.1224e-41, 6.1757e-39, 6.1224e-41], device='npu:0')
>>> output = torch_npu.npu_layer_norm_eval(input, normalized_shape, weight, bias, 1e-5)
>>> output
tensor([[ nan, 6.7474e-41, 8.3182e-20, 2.0687e-40], [ nan, 8.2494e-41, -9.9784e-20, -8.2186e-41], [ nan, -2.6695e-41, -7.7173e-20, 2.1353e-41], [ nan, -1.3497e-41, -7.1281e-20, -6.9827e-42], [ nan, 3.5663e-41, 1.2002e-19, 1.4314e-40], [ nan, -6.2792e-42, 1.7902e-20, 2.1050e-40]], device='npu:0')
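The example above builds weight and bias with torch.Tensor(*normalized_shape), which leaves them uninitialized, so its output contains nan and tiny values. For a more readable picture of the computation, the following pure-Python sketch applies the standard layer-norm formula to the first input row with unit weight and zero bias. It assumes the operator matches torch.nn.functional.layer_norm, as stated; `layer_norm_row` is a hypothetical helper.

```python
import math


def layer_norm_row(row, weight, bias, eps=1e-5):
    """Normalize one row over its last dimension, then scale and shift."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(row, weight, bias)]


row = [0.1863, 0.3755, 0.1115, 0.7308]  # first row of the input above
out = layer_norm_row(row, weight=[1.0] * 4, bias=[0.0] * 4)
print([round(v, 4) for v in out])
```

With unit weight and zero bias, the normalized row has a mean of approximately 0 and a variance of approximately 1.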
torch_npu.npu_alloc_float_status(self) -> Tensor
Produce eight numbers with a value of zero.
- Parameters:
- self (Tensor) - any tensor
- Constraints:
- Examples:
>>> input = torch.randn([1,2,3]).npu()
>>> output = torch_npu.npu_alloc_float_status(input)
>>> input
tensor([[[ 2.2324, 0.2478, -0.1056], [ 1.1273, -0.2573, 1.0558]]], device='npu:0')
>>> output
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_get_float_status(self) -> Tensor
Compute npu_get_float_status operator function.
- Parameters:
- self (Tensor) - A tensor of data memory address. Must be float32.
- Constraints:
- Examples:
>>> x = torch.rand(2).npu()
>>> torch_npu.npu_get_float_status(x)
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_clear_float_status(self) -> Tensor
Set the value of address 0x40000 to 0 in each core.
- Parameters:
- self (Tensor) - A tensor of type float32.
- Constraints:
- Examples:
>>> x = torch.rand(2).npu()
>>> torch_npu.npu_clear_float_status(x)
tensor([0., 0., 0., 0., 0., 0., 0., 0.], device='npu:0')
torch_npu.npu_confusion_transpose(self, perm, shape, transpose_first) -> Tensor
Perform a fused reshape and transpose.
- Parameters:
- self (Tensor) - A tensor. Must be one of the following types: float16, float32, int8, int16, int32, int64, uint8, uint16, uint32, uint64.
- perm (ListInt) - a permutation of the dimensions of "self" (or of the reshaped tensor when transpose_first is False)
- shape (ListInt) - The target shape of the reshape.
- transpose_first (Bool) - If True, the transpose is done first, otherwise the reshape is done first.
- Constraints:
- Examples:
>>> x = torch.rand(2, 3, 4, 6).npu()
>>> x.shape
torch.Size([2, 3, 4, 6])
>>> y = torch_npu.npu_confusion_transpose(x, (0, 2, 1, 3), (2, 4, 18), True)
>>> y.shape
torch.Size([2, 4, 18])
>>> y2 = torch_npu.npu_confusion_transpose(x, (0, 2, 1), (2, 12, 6), False)
>>> y2.shape
torch.Size([2, 6, 12])
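The two result shapes in the example follow from the order in which the transpose and the reshape are applied. A minimal sketch of that shape bookkeeping, inferred from the example; it tracks shapes only, not data, and `confusion_transpose_shape` is a hypothetical helper.

```python
from math import prod


def confusion_transpose_shape(in_shape, perm, shape, transpose_first):
    """Output shape of a fused transpose/reshape, in the order selected."""
    if transpose_first:
        # Transpose the input, then reshape the result to `shape`.
        transposed = tuple(in_shape[p] for p in perm)
        assert prod(transposed) == prod(shape), "element count must match"
        return tuple(shape)
    # Reshape the input to `shape`, then permute the reshaped dimensions.
    assert prod(in_shape) == prod(shape), "element count must match"
    return tuple(shape[p] for p in perm)


print(confusion_transpose_shape((2, 3, 4, 6), (0, 2, 1, 3), (2, 4, 18), True))  # (2, 4, 18)
print(confusion_transpose_shape((2, 3, 4, 6), (0, 2, 1), (2, 12, 6), False))    # (2, 6, 12)
```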
torch_npu.npu_bmmV2(self, mat2, output_sizes) -> Tensor
Multiply matrix "self" by matrix "mat2", producing "self * mat2".
- Parameters:
- self (Tensor) - A matrix tensor. Must be one of the following types: float16, float32, int32. 2D or higher. Has the format [ND, NHWC, FRACTAL_NZ].
- mat2 (Tensor) - A matrix tensor. Must be one of the following types: float16, float32, int32. 2D or higher. Has the format [ND, NHWC, FRACTAL_NZ].
- output_sizes (ListInt) - Output's shape, used in matmul's backpropagation. Default: [].
- Constraints:
- Examples:
>>> mat1 = torch.randn(10, 3, 4).npu()
>>> mat2 = torch.randn(10, 4, 5).npu()
>>> res = torch_npu.npu_bmmV2(mat1, mat2, [])
>>> res.shape
torch.Size([10, 3, 5])
torch_npu.fast_gelu(self) -> Tensor
Compute the fast_gelu of "self".
- Parameters:
- self (Tensor) - A tensor. Must be one of the following types: float16, float32.
- Constraints:
- Examples:
>>> x = torch.rand(2).npu()
>>> x
tensor([0.5991, 0.4094], device='npu:0')
>>> torch_npu.fast_gelu(x)
tensor([0.4403, 0.2733], device='npu:0')
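The documented outputs are consistent with the common sigmoid-based GELU approximation x * sigmoid(1.702 * x). The sketch below assumes that form; it is not taken from the operator's source and is only checked against the two values in the example.

```python
import math


def fast_gelu(x):
    """Sigmoid-based GELU approximation: x * sigmoid(1.702 * x) (assumed form)."""
    return x / (1.0 + math.exp(-1.702 * x))


print(round(fast_gelu(0.5991), 4))  # 0.4403
print(round(fast_gelu(0.4094), 4))  # 0.2733
```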
torch_npu.npu_deformable_conv2d(self, weight, offset, bias, kernel_size, stride, padding, dilation=[1,1,1,1], groups=1, deformable_groups=1, modulated=True) -> (Tensor, Tensor)
- Parameters:
- self (Tensor) - A 4D tensor of input image. With the format "NHWC", the data is stored in the order of: [batch, in_height, in_width, in_channels].
- weight (Tensor) - A 4D tensor of learnable filters. Must have the same type as "x". With the format "HWCN" , the data is stored in the order of: [filter_height, filter_width, in_channels / groups, out_channels].
- offset (Tensor) - A 4D tensor of x-y coordinates offset and mask. With the format "NHWC", the data is stored in the order of: [batch, out_height, out_width, deformable_groups * filter_height * filter_width * 3].
- bias (Tensor) - An optional 1D tensor of additive biases to the filter outputs. The data is stored in the order of: [out_channels].
- kernel_size (ListInt of length 2) - A tuple/list of 2 integers specifying the kernel size.
- stride (ListInt) - Required. A list of 4 integers. The stride of the sliding window for each dimension of input. The dimension order is interpreted according to the data format of "x". The N and C dimensions must be set to 1.
- padding (ListInt) - Required. A list of 4 integers. The number of pixels to add to each (top, bottom, left, right) side of the input.
- dilation (ListInt) - Optional. A list of 4 integers. The dilation factor for each dimension of input. The dimension order is interpreted according to the data format of "x". The N and C dimensions must be set to 1. Default: [1, 1, 1, 1].
- groups (Int) - Optional. An integer of type int32. The number of blocked connections from input channels to output channels. In_channels and out_channels must both be divisible by "groups". Default: 1.
- deformable_groups (Int) - Optional. An integer of type int32. The number of deformable group partitions. In_channels must be divisible by "deformable_groups". Defaults to 1.
- modulated (Bool) - Optional. Specifies the version of DeformableConv2D: True means v2, False means v1. Currently only v2 is supported.
- Constraints:
- Examples:
>>> x = torch.rand(16, 32, 32, 32).npu()
>>> weight = torch.rand(32, 32, 5, 5).npu()
>>> offset = torch.rand(16, 75, 32, 32).npu()
>>> output, _ = torch_npu.npu_deformable_conv2d(x, weight, offset, None, kernel_size=[5, 5], stride = [1, 1, 1, 1], padding = [2, 2, 2, 2])
>>> output.shape
torch.Size([16, 32, 32, 32])
torch_npu.npu_mish(self) -> Tensor
Compute the Mish activation of "self" element-wise.
- Parameters:
- self (Tensor) - A tensor. Must be one of the following types: float16, float32.
- Constraints:
- Examples:
>>> x = torch.rand(10, 30, 10).npu()
>>> y = torch_npu.npu_mish(x)
>>> y.shape
torch.Size([10, 30, 10])
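For reference, the Mish activation is usually defined as x * tanh(softplus(x)). A scalar pure-Python sketch of that definition follows; it assumes npu_mish implements the standard formula and does not reproduce NPU numerics.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x)), softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))


print(mish(0.0))            # 0.0
print(round(mish(1.0), 4))  # 0.8651
```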
torch_npu.npu_anchor_response_flags(self, featmap_size, stride, num_base_anchors) -> Tensor
Generate the responsible flags of anchor in a single feature map.
- Parameters:
- self (Tensor) - ground truth box, 2D tensor with shape [batch, 4]
- featmap_size (ListInt of length 2) - the size of feature maps
- stride (ListInt of length 2) - stride of current level
- num_base_anchors (Int) - the number of base anchors
- Constraints:
- Examples:
>>> x = torch.rand(100, 4).npu()
>>> y = torch_npu.npu_anchor_response_flags(x, [60, 60], [2, 2], 9)
>>> y.shape
torch.Size([32400])
torch_npu.npu_yolo_boxes_encode(self, gt_bboxes, stride, performance_mode=False) -> Tensor
Generate bounding boxes based on yolo's "anchor" and "ground-truth" boxes. It is a customized mmdetection operator.
- Parameters:
- self (Tensor) - Anchor boxes generated by the yolo training set. A 2D tensor of type float32 or float16 with shape (N, 4). "N" indicates the number of ROIs, and the value "4" refers to (tx, ty, tw, th).
- gt_bboxes (Tensor) - Target of the transformation, e.g ground-truth boxes. A 2D tensor of type float32 or float16 with shape (N, 4). "N" indicates the number of ROIs, and the value "4" refers to "dx", "dy", "dw", and "dh".
- stride (Tensor) - Scale for each box. A 1D tensor of type int32 with shape (N,). "N" indicates the number of ROIs.
- performance_mode (Bool) - Selects the performance mode, "high_precision" or "high_performance". True selects "high_performance"; False selects "high_precision". With float32 input, "high_precision" gives an output error smaller than 0.0001, while "high_performance" gives the best performance but an output error only smaller than 0.005.
- Constraints:
- Examples:
>>> anchor_boxes = torch.rand(2, 4).npu()
>>> gt_bboxes = torch.rand(2, 4).npu()
>>> stride = torch.tensor([2, 2], dtype=torch.int32).npu()
>>> output = torch_npu.npu_yolo_boxes_encode(anchor_boxes, gt_bboxes, stride, False)
>>> output.shape
torch.Size([2, 4])
torch_npu.npu_grid_assign_positive(self, overlaps, box_responsible_flags, max_overlaps, argmax_overlaps, gt_max_overlaps, gt_argmax_overlaps, num_gts, pos_iou_thr, min_pos_iou, gt_max_assign_all) -> Tensor
Perform Position Sensitive PS ROI Pooling Grad.
- Parameters:
- self (Tensor) - the assigned_gt_inds tensor, of type float16 or float32, shape (n, )
- overlaps (Tensor) - A tensor. Datatype is same as assigned_gt_inds. IOU between gt_bboxes and bboxes. shape(k, n)
- box_responsible_flags (Tensor) - A tensor. Support uint8. Flag to indicate whether box is responsible.
- max_overlaps (Tensor) - A tensor. Datatype is the same as assigned_gt_inds. overlaps.max(axis=0).
- argmax_overlaps (Tensor) - A tensor. Support int32. overlaps.argmax(axis=0).
- gt_max_overlaps (Tensor) - A tensor. Datatype is same as assigned_gt_inds. overlaps.max(axis=1).
- gt_argmax_overlaps (Tensor) - A tensor. Support int32. overlaps.argmax(axis=1).
- num_gts (Tensor) - A tensor. Support int32. real k. shape (1, )
- pos_iou_thr (Float) - IOU threshold for positive bboxes
- min_pos_iou (Float) - Minimum IOU for a bbox to be considered as a positive bbox
- gt_max_assign_all (Bool) - Whether to assign all bboxes with the same highest overlap with some gt to that gt.
- Constraints:
- Examples:
>>> assigned_gt_inds = torch.rand(4).npu()
>>> overlaps = torch.rand(2,4).npu()
>>> box_responsible_flags = torch.tensor([1, 1, 1, 0], dtype=torch.uint8).npu()
>>> max_overlap = torch.rand(4).npu()
>>> argmax_overlap = torch.tensor([1, 0, 1, 0], dtype=torch.int32).npu()
>>> gt_max_overlaps = torch.rand(2).npu()
>>> gt_argmax_overlaps = torch.tensor([1, 0],dtype=torch.int32).npu()
>>> output = torch_npu.npu_grid_assign_positive(assigned_gt_inds, overlaps, box_responsible_flags, max_overlap, argmax_overlap, gt_max_overlaps, gt_argmax_overlaps, 128, 0.5, 0., True)
>>> output.shape
torch.Size([4])
torch_npu.npu_normalize_batch(self, seq_len, normalize_type=0) -> Tensor
Perform batch normalization.
- Parameters:
- self (Tensor) - A tensor. Support float32. shape (n, c, d).
- seq_len (Tensor) - A tensor. The number of elements to normalize in each batch. Support Int32. Shape (n, ).
- normalize_type (Int) - Support "per_feature" or "all_features". 0 means "per_feature"; 1 means "all_features".
- Constraints:
- Examples:
>>> import numpy as np
>>> a=np.random.uniform(1,10,(2,3,6)).astype(np.float32)
>>> b=np.random.uniform(3,6,(2)).astype(np.int32)
>>> x=torch.from_numpy(a).to("npu")
>>> seqlen=torch.from_numpy(b).to("npu")
>>> out = torch_npu.npu_normalize_batch(x, seqlen, 0)
>>> out
tensor([[[ 1.1496, -0.6685, -0.4812, 1.7611, -0.5187, 0.7571], [ 1.1445, -0.4393, -0.7051, 1.0474, -0.2646, -0.1582], [ 0.1477, 0.9179, -1.0656, -6.8692, -6.7437, 2.8621]], [[-0.6880, 0.1337, 1.3623, -0.8081, -1.2291, -0.9410], [ 0.3070, 0.5489, -1.4858, 0.6300, 0.6428, 0.0433], [-0.5387, 0.8204, -1.1401, 0.8584, -0.3686, 0.8444]]], device='npu:0')
torch_npu.npu_masked_fill_range(self, start, end, value, axis=-1) -> Tensor
Fill a tensor with given values over ranges along one axis. It is a customized masked-fill-range operator.
- Parameters:
- self (Tensor) - input tensor. An ND tensor of float32/float16/int32/int8 with shapes 1D (D,), 2D(N, D), 3D(N, C, D).
- start (Tensor) - masked fill start pos. A 2D tensor of int32 with shape (num, N).
- end (Tensor) - masked fill end pos. A 2D tensor of int32 with shape (num, N).
- value (Tensor) - masked fill value. A 1D tensor of float32/float16/int32/int8 with shape (num,).
- axis (Int) - axis with masked fill of int32. Default: -1.
- Constraints:
- Examples:
>>> a=torch.rand(4,4).npu()
>>> a
tensor([[0.9419, 0.4919, 0.2874, 0.6560], [0.6691, 0.6668, 0.0330, 0.1006], [0.3888, 0.7011, 0.7141, 0.7878], [0.0366, 0.9738, 0.4689, 0.0979]], device='npu:0')
>>> start = torch.tensor([[0,1,2]], dtype=torch.int32).npu()
>>> end = torch.tensor([[1,2,3]], dtype=torch.int32).npu()
>>> value = torch.tensor([1], dtype=torch.float).npu()
>>> out = torch_npu.npu_masked_fill_range(a, start, end, value, 1)
>>> out
tensor([[1.0000, 0.4919, 0.2874, 0.6560], [0.6691, 1.0000, 0.0330, 0.1006], [0.3888, 0.7011, 1.0000, 0.7878], [0.0366, 0.9738, 0.4689, 0.0979]], device='npu:0')
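The documented output can be reproduced with a simple reading of the arguments: for each fill index i and each row n, the half-open range start[i][n]:end[i][n] along the last axis is set to value[i]. The sketch below implements that reading for a 2D input with axis=1. The semantics are inferred from the example only, and `masked_fill_range_2d` is a hypothetical helper.

```python
def masked_fill_range_2d(rows, start, end, value):
    """Fill rows[n][start[i][n]:end[i][n]] with value[i] (2D input, axis=1)."""
    out = [list(r) for r in rows]
    for i, fill in enumerate(value):
        for n, (lo, hi) in enumerate(zip(start[i], end[i])):
            for c in range(lo, hi):
                out[n][c] = fill
    return out


a = [[0.9419, 0.4919, 0.2874, 0.6560],
     [0.6691, 0.6668, 0.0330, 0.1006],
     [0.3888, 0.7011, 0.7141, 0.7878],
     [0.0366, 0.9738, 0.4689, 0.0979]]
out = masked_fill_range_2d(a, [[0, 1, 2]], [[1, 2, 3]], [1.0])
for row in out:
    print(row)
```

As in the documented result, the first three rows each get one element set to 1.0 and the last row is unchanged, because start and end carry only three entries.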
torch_npu.npu_linear(input, weight, bias=None) -> Tensor
Multiply matrix "input" by the transpose of matrix "weight" and, if given, add "bias".
- Parameters:
- input (Tensor) - A matrix tensor. 2D. Must be one of the following types: float32, float16, int32, int8. Has the format [ND, NHWC, FRACTAL_NZ].
- weight (Tensor) - A matrix tensor. 2D. Must be one of the following types: float32, float16, int32, int8. Has the format [ND, NHWC, FRACTAL_NZ].
- bias (Tensor) - A 1D Tensor. Must be one of the following types: float32, float16, int32. Has format [ND, NHWC].
- Constraints:
- Examples:
>>> x=torch.rand(2,16).npu()
>>> w=torch.rand(4,16).npu()
>>> b=torch.rand(4).npu()
>>> output = torch_npu.npu_linear(x, w, b)
>>> output
tensor([[3.6335, 4.3713, 2.4440, 2.0081], [5.3273, 6.3089, 3.9601, 3.2410]], device='npu:0')
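The shapes in the example, (2, 16) for the input, (4, 16) for the weight and (2, 4) for the output, match the usual linear-layer convention y = x @ w.T + b. A minimal pure-Python sketch of that convention, with small hand-checkable inputs; `linear` here is a hypothetical helper, not the NPU operator.

```python
def linear(x, w, b=None):
    """y[i][j] = sum_k x[i][k] * w[j][k] (+ b[j]); w has shape (out, in)."""
    out = []
    for row in x:
        y = [sum(xk * wk for xk, wk in zip(row, w_row)) for w_row in w]
        if b is not None:
            y = [yj + bj for yj, bj in zip(y, b)]
        out.append(y)
    return out


x = [[1.0, 2.0], [3.0, 4.0]]               # shape (2, 2)
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # shape (3, 2)
b = [0.5, 0.5, 0.5]
print(linear(x, w, b))  # [[1.5, 2.5, 3.5], [3.5, 4.5, 7.5]]
```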
torch_npu.npu_bert_apply_adam(lr, beta1, beta2, epsilon, grad, max_grad_norm, global_grad_norm, weight_decay, step_size=None, adam_mode=0, *, out=(var,m,v))
Compute the Adam result.
- Parameters:
- var (Tensor) - A tensor. Support float16/float32.
- m (Tensor) - A tensor. Datatype and shape are the same as exp_avg.
- v (Tensor) - A tensor. Datatype and shape are the same as exp_avg.
- lr (Scalar) - Datatype is the same as exp_avg.
- beta1 (Scalar) - Datatype is the same as exp_avg.
- beta2 (Scalar) - Datatype is the same as exp_avg.
- epsilon (Scalar) - Datatype is the same as exp_avg.
- grad (Tensor) - A tensor. Datatype and shape are the same as exp_avg.
- max_grad_norm (Scalar) - Datatype is the same as exp_avg.
- global_grad_norm (Scalar) - Datatype is the same as exp_avg.
- weight_decay (Scalar) - Datatype is the same as exp_avg.
- Keyword Arguments:
- out - A tuple of tensors (var, m, v), optional. The output tensors.
- Constraints:
- Examples:
>>> var_in = torch.rand(321538).uniform_(-32., 21.).npu()
>>> m_in = torch.zeros(321538).npu()
>>> v_in = torch.zeros(321538).npu()
>>> grad = torch.rand(321538).uniform_(-0.05, 0.03).npu()
>>> max_grad_norm = -1.
>>> beta1 = 0.9
>>> beta2 = 0.99
>>> weight_decay = 0.
>>> lr = 0.
>>> epsilon = 1e-06
>>> global_grad_norm = 0.
>>> var_out, m_out, v_out = torch_npu.npu_bert_apply_adam(lr, beta1, beta2, epsilon, grad, max_grad_norm, global_grad_norm, weight_decay, out=(var_in, m_in, v_in))
>>> var_out
tensor([ 14.7733, -30.1218, -1.3647, ..., -16.6840, 7.1518, 8.4872], device='npu:0')
torch_npu.npu_giou(self, gtboxes, trans=False, is_cross=False, mode=0) -> Tensor
First compute the IoU and the smallest enclosing area of the two boxes, then the proportion of the enclosing area that belongs to neither box, and finally subtract this proportion from the IoU to obtain the GIoU.
- Parameters:
- self (Tensor) - Bounding boxes, a 2D tensor of type float16 or float32 with shape (N, 4). "N" indicates the number of bounding boxes, and the value "4" refers to [x1, y1, x2, y2] or [x, y, w, h].
- gtboxes (Tensor) - Ground-truth boxes, a 2D tensor of type float16 or float32 with shape (M, 4). "M" indicates the number of ground truth boxes, and the value "4" refers to [x1, y1, x2, y2] or [x, y, w, h].
- trans (Bool) - An optional bool, True for 'xywh', False for 'xyxy'.
- is_cross (Bool) - An optional bool, control whether the output shape is [M, N] or [1, N]. If it is True, the output shape is [M,N]. If it is False, the output shape is [1,N].
- mode (Number) - Computation mode with the value of 0 or 1. 0 means iou, 1 means iof.
- Constraints:
- Examples:
>>> import numpy as np
>>> a=np.random.uniform(0,1,(4,10)).astype(np.float16)
>>> b=np.random.uniform(0,1,(4,10)).astype(np.float16)
>>> box1=torch.from_numpy(a).to("npu")
>>> box2=torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_giou(box1, box2, trans=True, is_cross=False, mode=0)
>>> output
tensor([[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]], device='npu:0', dtype=torch.float16)
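The steps in the description can be sketched for a single pair of boxes in pure Python. The sketch assumes the [x1, y1, x2, y2] layout (trans=False) and mode 0 (IoU); `giou_xyxy` is a hypothetical helper. Identical boxes give 1.0, which matches the example above, where both inputs are built from the same array.

```python
def giou_xyxy(a, b):
    """GIoU of two boxes in [x1, y1, x2, y2] format."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    # IoU minus the share of the enclosing box covered by neither input.
    return inter / union - (enclose - union) / enclose


print(giou_xyxy([0, 0, 2, 2], [0, 0, 2, 2]))            # 1.0
print(round(giou_xyxy([0, 0, 2, 2], [1, 1, 3, 3]), 4))  # -0.0794
```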
torch_npu.npu_silu(self) -> Tensor
Compute the Swish (SiLU) of "self".
- Parameters:
- self (Tensor) - A Tensor. Must be one of the following types: float16, float32.
- Constraints:
- Examples:
>>> a=torch.rand(2,8).npu()
>>> output = torch_npu.npu_silu(a)
>>> output
tensor([[0.4397, 0.7178, 0.5190, 0.2654, 0.2230, 0.2674, 0.6051, 0.3522], [0.4679, 0.1764, 0.6650, 0.3175, 0.0530, 0.4787, 0.5621, 0.4026]], device='npu:0')
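Swish with unit slope, also known as SiLU, is x * sigmoid(x). A scalar pure-Python sketch of that definition follows, assuming npu_silu implements the standard formula. For inputs in [0, 1) the result stays below silu(1), which is consistent with the range of the outputs above.

```python
import math


def silu(x):
    """Swish / SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


print(silu(0.0))            # 0.0
print(round(silu(1.0), 4))  # 0.7311
```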
torch_npu.npu_reshape(self, shape, can_refresh=False) -> Tensor
Reshape a tensor. Only the tensor shape is changed and its data is not changed.
- Parameters:
- self (Tensor) - A Tensor.
- shape (ListInt) - Define the shape of the output tensor.
- can_refresh (Bool) - Used to specify whether reshape can be refreshed in place.
- Constraints:
This operator cannot be directly called by the acllopExecute API.
- Examples:
>>> a=torch.rand(2,8).npu()
>>> out=torch_npu.npu_reshape(a,(4,4))
>>> out
tensor([[0.6657, 0.9857, 0.7614, 0.4368], [0.3761, 0.4397, 0.8609, 0.5544], [0.7002, 0.3063, 0.9279, 0.5085], [0.1009, 0.7133, 0.8118, 0.6193]], device='npu:0')
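"Only the shape is changed" means the elements keep their order and are simply regrouped. A minimal pure-Python sketch of a (2, 8) to (4, 4) reshape, assuming row-major element order as in torch.reshape; `reshape_2d` is a hypothetical helper.

```python
def reshape_2d(rows, shape):
    """Regroup elements in row-major order into shape = (r, c)."""
    flat = [v for row in rows for v in row]
    r, c = shape
    assert r * c == len(flat), "element count must be unchanged"
    return [flat[i * c:(i + 1) * c] for i in range(r)]


a = [list(range(8)), list(range(8, 16))]  # shape (2, 8)
print(reshape_2d(a, (4, 4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
```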
torch_npu.npu_rotated_overlaps(self, query_boxes, trans=False) -> Tensor
Calculate the overlapping area of the rotated box.
- Parameters:
- self (Tensor) - Data of grad increment, a 3D Tensor of type float32 with shape (B, 5, N).
- query_boxes (Tensor) - Bounding boxes, a 3D Tensor of type float32 with shape (B, 5, K).
- trans (Bool) - An optional attr, True for 'xyxyt', False for 'xywht'.
- Constraints:
- Examples:
>>> import numpy as np
>>> a=np.random.uniform(0,1,(1,3,5)).astype(np.float16)
>>> b=np.random.uniform(0,1,(1,2,5)).astype(np.float16)
>>> box1=torch.from_numpy(a).to("npu")
>>> box2=torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_rotated_overlaps(box1, box2, trans=False)
>>> output
tensor([[[0.0000, 0.1562, 0.0000], [0.1562, 0.3713, 0.0611], [0.0000, 0.0611, 0.0000]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_iou(self, query_boxes, trans=False, mode=0, is_cross=True) -> Tensor
- Parameters:
- self (Tensor) - Data of grad increment, a 3D Tensor of type float32 with shape (B, 5, N).
- query_boxes (Tensor) - Bounding boxes, a 3D Tensor of type float32 with shape (B, 5, K).
- trans (Bool) - An optional attr, True for 'xyxyt', False for 'xywht'.
- is_cross (Bool) - Cross calculation when it is True, and one-to-one calculation when it is False.
- mode (Int) - Computation mode with the value of 0 or 1. 0 means iou, 1 means iof.
- Constraints:
- Examples:
>>> import numpy as np
>>> a=np.random.uniform(0,1,(2,2,5)).astype(np.float16)
>>> b=np.random.uniform(0,1,(2,3,5)).astype(np.float16)
>>> box1=torch.from_numpy(a).to("npu")
>>> box2=torch.from_numpy(a).to("npu")
>>> output = torch_npu.npu_rotated_iou(box1, box2, trans=False, mode=0, is_cross=True)
>>> output
tensor([[[3.3325e-01, 1.0162e-01], [1.0162e-01, 1.0000e+00]], [[0.0000e+00, 0.0000e+00], [0.0000e+00, 5.9605e-08]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_box_encode(anchor_box, gt_bboxes, weight) -> Tensor
- Parameters:
- anchor_box (Tensor) - Anchor boxes. A 3D Tensor with shape (B, 5, N). "B" indicates the batch size, "N" indicates the number of bounding boxes, and the value "5" refers to "x0", "x1", "y0", "y1" and "angle".
- gt_bboxes (Tensor) - A 3D Tensor of float32 (float16) with shape (B, 5, N).
- weight (Tensor) - A float list for "x0", "x1", "y0", "y1" and "angle". Default: [1.0, 1.0, 1.0, 1.0, 1.0].
- Constraints:
- Examples:
>>> anchor_boxes = torch.tensor([[[30.69], [32.6], [45.94], [59.88], [-44.53]]], dtype=torch.float16).to("npu")
>>> gt_bboxes = torch.tensor([[[30.44], [18.72], [33.22], [45.56], [8.5]]], dtype=torch.float16).to("npu")
>>> weight = torch.tensor([1., 1., 1., 1., 1.], dtype=torch.float16).npu()
>>> out = torch_npu.npu_rotated_box_encode(anchor_boxes, gt_bboxes, weight)
>>> out
tensor([[[-0.4253], [-0.5166], [-1.7021], [-0.0162], [ 1.1328]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_rotated_box_decode(anchor_boxes, deltas, weight) -> Tensor
Rotated bounding box decoding.
- Parameters:
- anchor_boxes (Tensor) - Anchor boxes. A 3D Tensor with shape (B, 5, N). "B" indicates the batch size, "N" indicates the number of bounding boxes, and the value "5" refers to "x0", "x1", "y0", "y1" and "angle".
- deltas (Tensor) - A 3D Tensor of float32 (float16) with shape (B, 5, N).
- weight (Tensor) - A float list for "x0", "x1", "y0", "y1" and "angle". Default: [1.0, 1.0, 1.0, 1.0, 1.0].
- Constraints:
- Examples:
>>> anchor_boxes = torch.tensor([[[4.137],[33.72],[29.4], [54.06], [41.28]]], dtype=torch.float16).to("npu")
>>> deltas = torch.tensor([[[0.0244], [-1.992], [0.2109], [0.315], [-37.25]]], dtype=torch.float16).to("npu")
>>> weight = torch.tensor([1., 1., 1., 1., 1.], dtype=torch.float16).npu()
>>> out = torch_npu.npu_rotated_box_decode(anchor_boxes, deltas, weight)
>>> out
tensor([[[  1.7861],
         [-10.5781],
         [ 33.0000],
         [ 17.2969],
         [-88.4375]]], device='npu:0', dtype=torch.float16)
torch_npu.npu_ciou(Tensor self, Tensor gtboxes, bool trans=False, bool is_cross=True, int mode=0, bool atan_sub_flag=False) -> Tensor
Applies an NPU-based CIoU operation.
CIoU extends DIoU with an additional penalty term.
- Notes:
In the current version, CIoU backward only supports trans==True, is_cross==False, and mode==0 ('iou'). If you need backpropagation, make sure these parameters are set accordingly.
- Args:
- boxes1 (Tensor): Predicted bboxes of format xywh, shape (4, n).
- boxes2 (Tensor): Corresponding gt bboxes, shape (4, n).
- trans (Bool): Whether there is an offset.
- is_cross (Bool): Whether there is a cross operation between boxes1 and boxes2.
- mode (Int): Select the calculation mode of ciou.
- atan_sub_flag (Bool): Whether to pass the second output of the forward pass to the backward pass.
- Returns:
- Examples:
>>> box1 = torch.randn(4, 32).npu()
>>> box1.requires_grad = True
>>> box2 = torch.randn(4, 32).npu()
>>> box2.requires_grad = True
>>> ciou = torch_npu.contrib.function.npu_ciou(box1, box2)
>>> l = ciou.sum()
>>> l.backward()
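To make the description above concrete, here is a minimal pure-Python sketch of CIoU for a single pair of boxes. It follows the commonly published definition: IoU minus a normalized center-distance term minus an aspect-ratio term. It is not the NPU kernel. The function name `ciou_xywh` is hypothetical, the sketch assumes that xywh means (center x, center y, width, height) with positive width and height, and it does not model `trans`, `is_cross`, `mode`, or `atan_sub_flag`.

```python
import math


def ciou_xywh(box, gt):
    """CIoU of two boxes given as (cx, cy, w, h); illustrative reference only."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box, gt
    # Convert center/size to corner coordinates (assumed box layout).
    l1, r1, t1, b1 = x1 - w1 / 2, x1 + w1 / 2, y1 - h1 / 2, y1 + h1 / 2
    l2, r2, t2, b2 = x2 - w2 / 2, x2 + w2 / 2, y2 - h2 / 2, y2 + h2 / 2
    # Plain IoU.
    inter_w = max(0.0, min(r1, r2) - max(l1, l2))
    inter_h = max(0.0, min(b1, b2) - max(t1, t2))
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    # DIoU term: squared center distance over squared diagonal
    # of the smallest box enclosing both inputs.
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    c2 = (max(r1, r2) - min(l1, l2)) ** 2 + (max(b1, b2) - min(t1, t2)) ** 2
    # Extra CIoU penalty: aspect-ratio consistency.
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - d2 / c2 - alpha * v


print(ciou_xywh((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
```

Identical boxes score 1.0, and boxes that share a center and an aspect ratio reduce to plain IoU, because both penalty terms vanish.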
torch_npu.npu_diou(Tensor self, Tensor gtboxes, bool trans=False, bool is_cross=False, int mode=0) -> Tensor
Applies an NPU-based DIoU operation.
DIoU takes into account the distance between the targets, the overlap rate, and the range, so that regression for different targets or boundaries tends to be more stable.
- Notes:
In the current version, DIoU backward only supports trans==True, is_cross==False, and mode==0 ('iou'). If you need backpropagation, make sure these parameters are set accordingly.
- Args:
- boxes1 (Tensor): Predicted bboxes of format xywh, shape (4, n).
- boxes2 (Tensor): Corresponding gt bboxes, shape (4, n).
- trans (Bool): Whether there is an offset.
- is_cross (Bool): Whether there is a cross operation between boxes1 and boxes2.
- mode (Int): Select the calculation mode of diou.
- Returns:
- Examples:
>>> box1 = torch.randn(4, 32).npu()
>>> box1.requires_grad = True
>>> box2 = torch.randn(4, 32).npu()
>>> box2.requires_grad = True
>>> diou = torch_npu.contrib.function.npu_diou(box1, box2)
>>> l = diou.sum()
>>> l.backward()
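As an illustration of the distance term mentioned above, the following pure-Python sketch computes DIoU for a single pair of boxes using the commonly published definition: IoU minus the squared center distance, normalized by the squared diagonal of the smallest enclosing box. It is not the NPU kernel. The function name `diou_xywh` is hypothetical, the sketch assumes that xywh means (center x, center y, width, height) with positive width and height, and it does not model `trans`, `is_cross`, or `mode`.

```python
def diou_xywh(box, gt):
    """DIoU of two boxes given as (cx, cy, w, h); illustrative reference only."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box, gt
    # Convert center/size to corner coordinates (assumed box layout).
    l1, r1, t1, b1 = x1 - w1 / 2, x1 + w1 / 2, y1 - h1 / 2, y1 + h1 / 2
    l2, r2, t2, b2 = x2 - w2 / 2, x2 + w2 / 2, y2 - h2 / 2, y2 + h2 / 2
    # Plain IoU.
    inter_w = max(0.0, min(r1, r2) - max(l1, l2))
    inter_h = max(0.0, min(b1, b2) - max(t1, t2))
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    # Squared distance between the two box centers.
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(r1, r2) - min(l1, l2)) ** 2 + (max(b1, b2) - min(t1, t2)) ** 2
    return iou - d2 / c2


print(diou_xywh((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(diou_xywh((0, 0, 2, 2), (4, 0, 2, 2)))  # -0.4
```

Unlike plain IoU, the value keeps changing once the boxes stop overlapping: the second call has zero overlap but still returns a negative score that depends on how far apart the centers are.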
torch_npu.npu_sign_bits_pack(Tensor self, int size) -> Tensor
Packs the sign bits of a float tensor into uint8 for 1-bit Adam.
- Args:
- x(Tensor) - A float Tensor in 1D.
- size(Int) - A required int. First dimension of output tensor when reshaping.
- Constraints:
The number of packed uint8 values must be divisible by size. If the size of x is divisible by 8, the output holds (size of x) / 8 values; otherwise it holds (size of x // 8) + 1 values, and -1 float values are appended as padding so that the length becomes divisible by 8. Bits are packed in little-endian order.
The following AI Processors support input types float32 and float16:
- Ascend 310P AI Processor
- Ascend 910 AI Processor
The Ascend 310 AI Processor only supports input type float16.
- Examples:
>>> a = torch.tensor([5,4,3,2,0,-1,-2, 4,3,2,1,0,-1,-2],dtype=torch.float32).npu()
>>> b = torch_npu.npu_sign_bits_pack(a, 2)
>>> b
tensor([[159],[15]], device='npu:0')
(binary form of 159 is 0b10011111, corresponds to 4, -2, -1, 0, 2, 3, 4, 5 respectively)
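The packing rule described in the constraints can be sketched in pure Python as follows. This is only a reference model for reasoning about the bit layout, not the NPU implementation. The helper name `sign_bits_pack_reference` is hypothetical, and treating 0 as non-negative (bit set) is inferred from the example above, where the input value 0 maps to a 1 bit.

```python
def sign_bits_pack_reference(values, size):
    """Pack signs of floats into bytes, 8 per byte; illustrative reference only."""
    # Pad with -1.0 so that the length becomes a multiple of 8.
    padded = list(values) + [-1.0] * (-len(values) % 8)
    packed = []
    for start in range(0, len(padded), 8):
        byte = 0
        for bit, value in enumerate(padded[start:start + 8]):
            # Non-negative values set the bit; the first value of each
            # group lands in the least significant bit (little-endian).
            if value >= 0:
                byte |= 1 << bit
        packed.append(byte)
    if len(packed) % size != 0:
        raise ValueError("number of packed bytes must be divisible by size")
    # Reshape the flat byte list into `size` rows.
    cols = len(packed) // size
    return [packed[r * cols:(r + 1) * cols] for r in range(size)]


data = [5, 4, 3, 2, 0, -1, -2, 4, 3, 2, 1, 0, -1, -2]
print(sign_bits_pack_reference(data, 2))  # [[159], [15]]
```

The 14 input values are padded to 16, which yields two bytes; with size=2 they are returned as two rows of one byte each, matching the operator output shown in the example.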
torch_npu.npu_sign_bits_unpack(x, dtype, size) -> Tensor
Unpacks uint8 values into float sign values for 1-bit Adam.
- Args:
- x(Tensor) - A uint8 Tensor in 1D.
- dtype(Number) - A required int. 1 sets float16 as output, 0 sets float32 as output.
- size(Int) - A required int. First dimension of output tensor when reshaping.
- Constraints:
The number of unpacked values must be divisible by size. The output holds (size of x) * 8 values.
- Examples:
>>> a = torch.tensor([159, 15], dtype=torch.uint8).npu()
>>> b = torch_npu.npu_sign_bits_unpack(a, 0, 2)
>>> b
tensor([[1., 1., 1., 1., 1., -1., -1., 1.],
        [1., 1., 1., 1., -1., -1., -1., -1.]], device='npu:0')
(binary form of 159 is 0b10011111; binary form of 15 is 0b00001111)
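The unpacking rule can be modeled in pure Python in the same spirit. This is a reference sketch under the assumption that bits are read from least significant to most significant, which is consistent with the example above. The helper name `sign_bits_unpack_reference` is hypothetical, and the `dtype` argument of the operator is not modeled.

```python
def sign_bits_unpack_reference(packed, size):
    """Expand each byte into 8 sign values (+1.0 / -1.0); illustrative reference only."""
    signs = []
    for byte in packed:
        for bit in range(8):
            # Read bits starting from the least significant one.
            signs.append(1.0 if (byte >> bit) & 1 else -1.0)
    if len(signs) % size != 0:
        raise ValueError("number of unpacked values must be divisible by size")
    # Reshape the flat sign list into `size` rows.
    cols = len(signs) // size
    return [signs[r * cols:(r + 1) * cols] for r in range(size)]


for row in sign_bits_unpack_reference([159, 15], 2):
    print(row)
```

For the input [159, 15] this produces the rows [1, 1, 1, 1, 1, -1, -1, 1] and [1, 1, 1, 1, -1, -1, -1, -1] (as floats), which is the same pattern as the operator output in the example.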