pooling3d

Description

Downsamples tensor_in by applying the selected pooling mode to each region covered by the sliding window.

The pooling mode can be MAX or AVG.

MAX: max pooling for 3D data. Outputs the maximum values of each patch of the feature map.
AVG: avg pooling for 3D data. Outputs the average values of each patch of the feature map.

The following example uses pooling_mode = MAX and padding_mode = SAME with these settings:

input_d = 4, input_h = 4, input_w = 4

stride_d = 2, stride_h = 2, stride_w = 2

kernel_d = 2, kernel_h = 2, kernel_w = 2

where

input_d: depth of tensor_in
input_h: height of tensor_in
input_w: width of tensor_in
kernel_d: depth of window
kernel_h: height of window
kernel_w: width of window
stride_d: depth of stride
stride_h: height of stride
stride_w: width of stride
pad_top: top padding along the D dimension of tensor_in, which is 0 in this example.
pad_bottom: bottom padding along the D dimension of tensor_in, which is 0 in this example.
pad_front: front padding along the H dimension of tensor_in, which is 0 in this example.
pad_back: back padding along the H dimension of tensor_in, which is 0 in this example.
pad_left: left padding along the W dimension of tensor_in, which is 0 in this example.
pad_right: right padding along the W dimension of tensor_in, which is 0 in this example.
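The example above can be reproduced on the host with a minimal pure-Python sketch. The helper pool3d_reference is hypothetical and not part of the TBE DSL; it works on a plain (D, H, W) nested list rather than an NDC1HWC0 tensor, and it applies no padding, which matches this example because all six pad values are 0.

```python
def pool3d_reference(x, kernel, stride, mode="MAX"):
    """Pool a nested list of shape (D, H, W) without padding (reference sketch)."""
    kd, kh, kw = kernel
    sd, sh, sw = stride
    d, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for i in range(0, d - kd + 1, sd):          # window start along depth
        plane = []
        for j in range(0, h - kh + 1, sh):      # window start along height
            row = []
            for k in range(0, w - kw + 1, sw):  # window start along width
                patch = [x[a][b][c]
                         for a in range(i, i + kd)
                         for b in range(j, j + kh)
                         for c in range(k, k + kw)]
                if mode == "MAX":
                    row.append(max(patch))
                else:  # AVG
                    row.append(sum(patch) / len(patch))
            plane.append(row)
        out.append(plane)
    return out


# 4 x 4 x 4 input whose value encodes its position: 16*d + 4*h + w
x = [[[16 * a + 4 * b + c for c in range(4)] for b in range(4)] for a in range(4)]
max_out = pool3d_reference(x, (2, 2, 2), (2, 2, 2), "MAX")
avg_out = pool3d_reference(x, (2, 2, 2), (2, 2, 2), "AVG")
print(max_out[0][0][0], avg_out[0][0][0])  # 21 10.5
```

With a 2 x 2 x 2 window and a stride of 2 in each direction, the 4 x 4 x 4 input is reduced to a 2 x 2 x 2 output.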

This API supports basic pooling functions but does not support the output quantization function.

Prototype

pooling3d(tensor_in, window, stride, padding_mode="SAME", pads=(0, 0, 0, 0, 0, 0), pooling_mode="MAX", dilation=(1, 1, 1), ceil_mode=0)

Parameters

tensor_in: a tvm.tensor of type float16, for the feature map. Has a 6D format of NDC1HWC0.
window: a list or tuple for the sizes of the input slider.
window is a 3D list or tuple of positive integers within the range of [1, 8].

window[0] indicates the depth of the input window. window[1] indicates the width of the input window. window[2] indicates the height of the input window.
stride: a list or tuple for the strides of the input slider.
stride is a 3D list or tuple of positive integers. The width and height strides are within the range of [1, 8].

stride[0] indicates the depth stride of the window for the feature map. stride[1] indicates the width stride of the window for the feature map. stride[2] indicates the height stride of the window for the feature map.
pooling_mode: the pooling mode, either MAX or AVG.
MAX: max pooling. Outputs the maximum values of each patch of the feature map.
AVG: avg pooling. Outputs the average values of each patch of the feature map.
pads: (optional) a list or tuple for the padding sizes, provided for compatibility with Caffe pooling.
pads is a 6D list or tuple of integers whose values are greater than or equal to 0.

pads[0], pads[1], pads[2], pads[3], pads[4], and pads[5] indicate the padding size in the top, bottom, front, back, left, and right side, respectively. Defaults to (0,0,0,0,0,0).
padding_mode: padding mode, either VALID (padding disabled) or SAME (padding enabled).
- In VALID mode:
  When the window movement along the W or H direction can cover only some parts of the feature map, the data that does not cover a complete window is discarded. That is, the data in the feature map is not involved in the computation.
- In SAME mode:
  When the window movement along the W or H direction can cover only some parts of the feature map, pad 0 to ensure that a complete window can be covered. That is, the data in the feature map is involved in the computation.
dilation: (optional) a list or tuple for the dilation factors.
dilation[0] and dilation[1] indicate the dilation factors of the window in terms of height and width. Defaults to (1,1,1). Currently, this parameter is reserved and does not take effect.
ceil_mode: equivalent of round_mode in Caffe. 0 (default): ceiling; 1: floor.
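The value ranges listed above can be checked before the operator is called. The following is a minimal sketch, assuming a hypothetical helper named check_pooling3d_args that is not part of the TBE DSL; it encodes only the constraints stated in this section and says nothing about checks the library itself may perform.

```python
def check_pooling3d_args(window, stride, pads=(0, 0, 0, 0, 0, 0),
                         pooling_mode="MAX", padding_mode="SAME"):
    """Return a list of constraint violations (empty if none were found)."""
    errors = []
    # window: 3 positive integers, each within [1, 8]
    if len(window) != 3 or not all(isinstance(v, int) and 1 <= v <= 8 for v in window):
        errors.append("window must hold 3 integers in [1, 8]")
    # stride: 3 positive integers; stride[1] and stride[2] within [1, 8]
    if len(stride) != 3 or not all(isinstance(v, int) and v >= 1 for v in stride):
        errors.append("stride must hold 3 positive integers")
    elif not all(v <= 8 for v in stride[1:]):
        errors.append("stride[1] and stride[2] must be in [1, 8]")
    # pads: 6 integers, each greater than or equal to 0
    if len(pads) != 6 or not all(isinstance(v, int) and v >= 0 for v in pads):
        errors.append("pads must hold 6 integers >= 0")
    if pooling_mode not in ("MAX", "AVG"):
        errors.append("pooling_mode must be MAX or AVG")
    if padding_mode not in ("VALID", "SAME"):
        errors.append("padding_mode must be VALID or SAME")
    return errors


print(check_pooling3d_args((3, 3, 3), (2, 2, 2)))  # []
print(check_pooling3d_args((9, 3, 3), (2, 2, 2)))  # one violation for window
```

Returning a list instead of raising on the first problem lets a caller report every violation at once.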

Returns

res_tensor: a tvm.tensor for the result tensor. Has a 6D format of NDC1HWC0.

The shape of tensor_in is [N, D, C1, H, W, C0=16]; the shape of window is [F, F, F]; and the shape of stride is [S, S, S].

In VALID mode and SAME mode of MAX pooling and AVG pooling, the shape of the output tensor is computed as follows:

In VALID mode:
- The N and C dimensions remain unchanged.
- The dimensions of Dout, Hout, and Wout are as follows:
  new_depth = new_height = new_width = CEIL((W - F + 1)/S)
In SAME mode:
- The N and C dimensions remain unchanged.
- The dimensions of Dout, Hout, and Wout are as follows:
  new_depth=new_height=new_width = CEIL(W/S)
  
  W is the input size along the given dimension; F is the filter size; S is the stride; and CEIL denotes rounding up.
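The output shape can be computed ahead of time with a short sketch. It reads the VALID formula as CEIL((W - F + 1)/S) and the SAME formula as CEIL(W/S), applied independently to D, H, and W. The helper names pooled_size and pooled_shape are hypothetical and not part of the TBE DSL.

```python
def pooled_size(in_size, window, stride, padding_mode):
    """Output length along one dimension, rounded up."""
    if padding_mode == "VALID":
        numerator = in_size - window + 1
    elif padding_mode == "SAME":
        numerator = in_size
    else:
        raise ValueError("padding_mode must be VALID or SAME")
    # Integer ceiling division without floating point
    return -(-numerator // stride)


def pooled_shape(shape, window, stride, padding_mode):
    """Output shape for an NDC1HWC0 input; N, C1, and C0 stay unchanged."""
    n, d, c1, h, w, c0 = shape
    out_d, out_h, out_w = (
        pooled_size(size, f, s, padding_mode)
        for size, f, s in zip((d, h, w), window, stride)
    )
    return (n, out_d, c1, out_h, out_w, c0)


shape = (1, 416, 2, 416, 416, 16)
print(pooled_shape(shape, (3, 3, 3), (2, 2, 2), "SAME"))   # (1, 208, 2, 208, 208, 16)
print(pooled_shape(shape, (3, 3, 3), (2, 2, 2), "VALID"))  # (1, 207, 2, 207, 207, 16)
```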

Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

When pooling_mode is set to MAX or AVG, tensor_in, pads, window, and stride must meet the following conditions:

stride_d <= in_size_d + pad_top + pad_bottom - window_d
stride_h <= in_size_h + pad_front + pad_back - window_h
stride_w <= in_size_w + pad_left + pad_right - window_w
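The three stride conditions can be verified with a small pre-check. This is a sketch only: stride_within_limits is a hypothetical helper, not a TBE DSL function, and it assumes pads is ordered as (top, bottom, front, back, left, right), as described for the pads parameter.

```python
def stride_within_limits(in_size, window, stride, pads=(0, 0, 0, 0, 0, 0)):
    """Check stride <= in_size + padding - window along D, H, and W."""
    in_d, in_h, in_w = in_size
    window_d, window_h, window_w = window
    stride_d, stride_h, stride_w = stride
    pad_top, pad_bottom, pad_front, pad_back, pad_left, pad_right = pads
    return (
        stride_d <= in_d + pad_top + pad_bottom - window_d
        and stride_h <= in_h + pad_front + pad_back - window_h
        and stride_w <= in_w + pad_left + pad_right - window_w
    )


print(stride_within_limits((416, 416, 416), (3, 3, 3), (2, 2, 2)))  # True
print(stride_within_limits((2, 2, 2), (2, 2, 2), (1, 1, 1)))        # False
```

The second call fails because a 2 x 2 x 2 window already spans the whole unpadded input, which leaves no room for the window to move.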

ub_size indicates the available size of Unified Buffer.
out_w indicates the width of the output tensor.
window_h indicates the height of window.
window_w indicates the width of window.
C0 indicates the size of C0 of tensor_in.
SIZE_OF_FP16 indicates the size of float16 data.

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Example

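from tbe import tvm
from tbe import dsl

# Feature map in NDC1HWC0 format: (N, D, C1, H, W, C0)
shape = (1, 416, 2, 416, 416, 16)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
# Pass the modes by keyword: in the prototype, the fourth positional
# argument is padding_mode and the fifth is pads.
res = dsl.pooling3d(data, (3, 3, 3), (2, 2, 2), padding_mode="SAME", pooling_mode="AVG")
# res.shape = (1, 208, 2, 208, 208, 16)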
