pooling3d
Description
Samples tensor_in in the area where the kernel convolves in different pooling modes.
The pooling mode can be MAX or AVG.
- MAX: max pooling for 3D data. Outputs the maximum values of each patch of the feature map.
- AVG: avg pooling for 3D data. Outputs the average values of each patch of the feature map.
When pooling_mode = MAX and padding_mode = SAME, consider pooling tensor_in with the following sizes:
input_d = 4, input_h = 4, input_w = 4
stride_d = 2, stride_h = 2, stride_w = 2
kernel_d = 2, kernel_h = 2, kernel_w = 2

where
- input_d: depth of tensor_in
- input_h: height of tensor_in
- input_w: width of tensor_in
- kernel_d: depth of window
- kernel_h: height of window
- kernel_w: width of window
- stride_d: depth of stride
- stride_h: height of stride
- stride_w: width of stride
- pad_top: top padding along the D dimension of tensor_in, which is 0 in this example.
- pad_bottom: bottom padding along the D dimension of tensor_in, which is 0 in this example.
- pad_front: front padding along the H dimension of tensor_in, which is 0 in this example.
- pad_back: back padding along the H dimension of tensor_in, which is 0 in this example.
- pad_left: left padding along the W dimension of tensor_in, which is 0 in this example.
- pad_right: right padding along the W dimension of tensor_in, which is 0 in this example.
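The example sizes above can be traced with a small pure-Python sketch. It is only an illustration of max pooling over a plain (D, H, W) nested list, not the TBE implementation; the helper name max_pool3d and the test data are assumptions, and the sketch handles only the case where no padding is required, as in this example.

```python
def max_pool3d(x, kernel, stride):
    """Max-pool a nested list x[d][h][w]; assumes no padding is required."""
    kd, kh, kw = kernel
    sd, sh, sw = stride
    in_d, in_h, in_w = len(x), len(x[0]), len(x[0][0])
    # Number of complete windows that fit along each axis.
    out_d = (in_d - kd) // sd + 1
    out_h = (in_h - kh) // sh + 1
    out_w = (in_w - kw) // sw + 1
    return [[[max(x[i * sd + a][j * sh + b][k * sw + c]
                  for a in range(kd) for b in range(kh) for c in range(kw))
              for k in range(out_w)]
             for j in range(out_h)]
            for i in range(out_d)]


# 4 x 4 x 4 input whose element at (d, h, w) is 16*d + 4*h + w.
x = [[[16 * d + 4 * h + w for w in range(4)] for h in range(4)] for d in range(4)]
y = max_pool3d(x, (2, 2, 2), (2, 2, 2))
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0][0][0], y[1][1][1])           # 21 63
```

Each output element is the maximum of one 2 x 2 x 2 patch, so the 4 x 4 x 4 input shrinks to 2 x 2 x 2.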
This API supports basic pooling functions but does not support the output quantization function.
Prototype
pooling3d(tensor_in, window, stride, padding_mode="SAME", pads=(0, 0, 0, 0, 0, 0), pooling_mode="MAX", dilation=(1, 1, 1), ceil_mode=0)
Parameters
- tensor_in: a tvm.tensor of type float16, for the feature map. Has a 6D format of NDC1HWC0.
- window: a list or tuple for the sizes of the input slider.
window is a 3D list or tuple of positive integers within the range of [1, 8].
window[0] indicates the depth of the input window. window[1] indicates the width of the input window. window[2] indicates the height of the input window.
- stride: a list or tuple for the strides of the input slider.
stride is a 3D list or tuple of positive integers. The width and height strides must be within the range of [1, 8].
stride[0] indicates the depth stride of the window for the feature map. stride[1] indicates the width stride of the window for the feature map. stride[2] indicates the height stride of the window for the feature map.
- pooling_mode: the pooling mode, either MAX or AVG.
- MAX: max pooling. Outputs the maximum values of each patch of the feature map.
- AVG: avg pooling. Outputs the average values of each patch of the feature map.
- pads: (optional) a list or tuple for the padding sizes, provided for compatibility with Caffe pooling.
pads is a 6D list or tuple of integers whose values are greater than or equal to 0.
pads[0], pads[1], pads[2], pads[3], pads[4], and pads[5] indicate the padding size in the top, bottom, front, back, left, and right side, respectively. Defaults to (0,0,0,0,0,0).
- padding_mode: padding mode, either VALID (padding disabled) or SAME (padding enabled).
- In VALID mode:
When the window, moving along the W or H direction, can cover only part of the remaining feature map, the data that does not fill a complete window is discarded. That is, this data is not involved in the computation.
- In SAME mode:
When the window, moving along the W or H direction, can cover only part of the remaining feature map, the feature map is padded with 0s so that a complete window can be covered. That is, all data in the feature map is involved in the computation.
- dilation: (optional) a list or tuple for the dilation factors.
dilation[0] and dilation[1] indicate the dilation factors of the window in terms of height and width. Defaults to (1,1,1). Currently, this parameter is reserved and does not take effect.
- ceil_mode: equivalent of round_mode in Caffe. 0 (default): ceiling; 1: floor.
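The value ranges listed above can be checked before calling the API. The following is a minimal sketch under the constraints stated in this section; check_pooling3d_args is a hypothetical helper, not part of TBE, and it checks ranges and lengths only (not integer types, dilation, or ceil_mode).

```python
def check_pooling3d_args(window, stride, pads=(0, 0, 0, 0, 0, 0),
                         padding_mode="SAME", pooling_mode="MAX"):
    """Hypothetical pre-check; returns a list of problems (empty if none)."""
    problems = []
    if len(window) != 3 or any(not 1 <= v <= 8 for v in window):
        problems.append("window must hold 3 values in [1, 8]")
    if len(stride) != 3 or any(v < 1 for v in stride):
        problems.append("stride must hold 3 positive values")
    elif any(not 1 <= v <= 8 for v in stride[1:]):
        # stride[1] and stride[2] are the width and height strides.
        problems.append("stride width and height must be in [1, 8]")
    if len(pads) != 6 or any(v < 0 for v in pads):
        problems.append("pads must hold 6 values >= 0")
    if padding_mode not in ("VALID", "SAME"):
        problems.append("padding_mode must be VALID or SAME")
    if pooling_mode not in ("MAX", "AVG"):
        problems.append("pooling_mode must be MAX or AVG")
    return problems


print(check_pooling3d_args((3, 3, 3), (2, 2, 2)))        # []
print(check_pooling3d_args((9, 3, 3), (2, 2, 2)))        # one window problem
```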
Returns
res_tensor: a tvm.tensor for the result tensor. Has a 6D format of NDC1HWC0.
The shape of tensor_in is [N, D, C1, H, W, C0=16]; the shape of window is [F, F, F]; and the shape of stride is [S, S, S].
In VALID mode and SAME mode of MAX pooling and AVG pooling, the shape of the output tensor is computed as follows:
- In VALID mode:
- The N and C dimensions remain unchanged.
- The dimensions of Dout, Hout, and Wout are as follows:
new_depth = new_height = new_width = CEIL((W - F + 1)/S)
- In SAME mode:
- The N and C dimensions remain unchanged.
- The dimensions of Dout, Hout, and Wout are as follows:
new_depth = new_height = new_width = CEIL(W/S)
W is the input size; F is the filter size; S is the stride; and CEIL denotes rounding up.
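The two formulas above can be evaluated with a short sketch. The function name pooled_size is an assumption made for illustration; the sketch applies the same formula independently to the D, H, and W axes and leaves N, C1, and C0 unchanged.

```python
def pooled_size(in_size, window, stride, padding_mode):
    """Output size along one axis, following the VALID and SAME formulas above."""
    if padding_mode == "VALID":
        numerator = in_size - window + 1
    elif padding_mode == "SAME":
        numerator = in_size
    else:
        raise ValueError("padding_mode must be VALID or SAME")
    return -(-numerator // stride)  # integer ceiling division


# Shape in NDC1HWC0 order, cubic window F = 3, stride S = 2.
n, d, c1, h, w, c0 = (1, 416, 2, 416, 416, 16)
same = (n, pooled_size(d, 3, 2, "SAME"), c1,
        pooled_size(h, 3, 2, "SAME"), pooled_size(w, 3, 2, "SAME"), c0)
valid = (n, pooled_size(d, 3, 2, "VALID"), c1,
         pooled_size(h, 3, 2, "VALID"), pooled_size(w, 3, 2, "VALID"), c0)
print(same)   # (1, 208, 2, 208, 208, 16)
print(valid)  # (1, 207, 2, 207, 207, 16)
```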
Restrictions
This API cannot be used in conjunction with other TBE DSL APIs.
When pooling_mode is set to MAX or AVG, tensor_in, pads, and window must meet the following conditions:
stride_d <= in_size_d + pad_top + pad_bottom - window_d
stride_h <= in_size_h + pad_front + pad_back - window_h
stride_w <= in_size_w + pad_left + pad_right - window_w
- ub_size indicates the available size of Unified Buffer.
- out_w indicates the width of the output tensor.
- window_h indicates the height of window.
- window_w indicates the width of window.
- C0 indicates the size of C0 of tensor_in.
- SIZE_OF_FP16 indicates the size of float16 data.
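The three stride conditions in this section can be checked with a sketch like the following. The helper name strides_ok is hypothetical, and the sketch assumes that in_size, window, and stride are passed in (d, h, w) order and that pads is ordered (top, bottom, front, back, left, right), with top/bottom applied to D, front/back to H, and left/right to W as described in this document.

```python
def strides_ok(in_size, window, stride, pads=(0, 0, 0, 0, 0, 0)):
    """Return True if every axis satisfies stride <= in_size + pads - window.

    in_size, window, and stride are (d, h, w) tuples (an assumption of this
    sketch); pads is (top, bottom, front, back, left, right).
    """
    for axis in range(3):
        pad_before = pads[2 * axis]
        pad_after = pads[2 * axis + 1]
        limit = in_size[axis] + pad_before + pad_after - window[axis]
        if stride[axis] > limit:
            return False
    return True


print(strides_ok((4, 4, 4), (2, 2, 2), (2, 2, 2)))                       # True
print(strides_ok((2, 4, 4), (2, 2, 2), (2, 2, 2)))                       # False
print(strides_ok((2, 4, 4), (2, 2, 2), (2, 2, 2), (1, 1, 0, 0, 0, 0)))   # True
```

In the second call the depth axis fails because 2 > 2 + 0 + 0 - 2; adding one unit of padding on each side of D in the third call satisfies the condition.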
Applicability
Example
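from tbe import tvm
from tbe import dsl

# Feature map in NDC1HWC0 format
shape = (1, 416, 2, 416, 416, 16)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
# Keyword arguments are used because padding_mode precedes pooling_mode in the prototype
res = dsl.pooling3d(data, (3, 3, 3), (2, 2, 2), padding_mode="SAME", pooling_mode="AVG")
# res.shape = (1, 208, 2, 208, 208, 16)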