Understanding Pooling Layer with Numpy
Pooling Layer
Pooling layers reduce the spatial dimensions of the feature maps. This lowers the amount of computation performed in the network and the number of parameters the following layers have to learn. A pooling layer summarizes the features present in each region of the feature map generated by a convolution layer.

The pooling layer is a downsampling operation, typically applied after a convolution layer, that gives the network some degree of spatial invariance. Max pooling and average pooling are the two most common kinds, taking the maximum and the average value of each region, respectively.
- Max pooling: Each pooling operation selects the maximum value of the current view. Max pooling is the most commonly used.
- Average pooling: Each pooling operation averages the values of the current view.
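To make the two definitions concrete, here is a minimal NumPy sketch. The 4x4 feature map is a made-up example, and the reshape trick assumes non-overlapping 2x2 views (stride equal to the window size), so it is an illustration rather than a general implementation.

```python
import numpy as np

# Hypothetical 4x4 feature map, chosen only for illustration.
x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)

# Split into non-overlapping 2x2 views.
# Axes: (row block, row inside block, column block, column inside block).
views = x.reshape(2, 2, 2, 2)

max_pooled = views.max(axis=(1, 3))   # maximum of each view: [[6, 8], [3, 4]]
avg_pooled = views.mean(axis=(1, 3))  # average of each view: [[3.75, 5.25], [2, 2]]

print(max_pooled)
print(avg_pooled)
```

Both results are 2x2: each output value summarizes one 2x2 view of the input.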
Pooling Layer vs Convolution Layer
The implementation of the pooling layer is similar to that of the convolutional layer. Both use im2col to expand the input data.
Let's go through the notation first and then look at how the two layers differ.
- N: the number of images (or mini batch size)
- H: the height of images
- W: the width of images
- FN: the number of filters
- FH: the height of filters
- FW: the width of filters
- C: the number of channels (depth)
- OH: the height of outputs
- OW: the width of outputs

In the case of convolution, the layer has (FH × FW × C + 1) × FN learnable parameters, which includes one bias parameter per filter. In most cases, the stride (S) is smaller than FW and FH. A common choice for FN is 2C.
In the case of pooling, the operation is applied to each channel independently, which is different from the convolutional layer, so the number of channels does not change. Pooling has no learnable parameters. In most cases, the stride (S) equals FW and FH.
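The differences above can be checked with a little arithmetic. The sketch below uses the standard output-size formula OH = (H + 2P - FH) / S + 1 (P is the padding); the helper names `output_size` and `conv_param_count` and the layer sizes are assumptions made for this example.

```python
def output_size(in_size, field, stride=1, pad=0):
    """Output height or width for a window of size `field` (filter or pooling)."""
    return (in_size + 2 * pad - field) // stride + 1


def conv_param_count(C, FN, FH, FW):
    """Weights plus one bias per filter in a convolution layer."""
    return (FH * FW * C + 1) * FN


# Hypothetical convolution: 28x28 input, C=3, FN=6 filters of size 5x5, S=1.
conv_oh = output_size(28, 5, stride=1)      # 24
conv_params = conv_param_count(3, 6, 5, 5)  # (5*5*3 + 1) * 6 = 456

# Hypothetical pooling on that output: 2x2 window, S=2 (stride equals window size).
pool_oh = output_size(conv_oh, 2, stride=2)  # 12
pool_params = 0                              # pooling has nothing to learn

print(conv_oh, conv_params, pool_oh, pool_params)
```

The convolution changes the channel count from C to FN, while the pooling step only shrinks the height and width.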
Forward and Backward Pass for Max Pooling
A max pooling layer consists of a forward pass that uses im2col and a backward pass that uses col2im.

Forward Pass

In the forward pass, we select the maximum value in each receptive field of the input, save its index, and produce a summarized output volume.
Before looking for the max index, we reshape the input so that every channel becomes a separate single-channel image. im2col can then arrange each receptive field into its own column.
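The forward pass described above could be sketched as follows. This is an illustrative version, not necessarily the code this post is based on: `im2col` is a small helper written for this example (it is not a NumPy function), padding is assumed to be 0, and each receptive field is stored as a row rather than a column, which is just the transposed layout.

```python
import numpy as np


def im2col(x, ph, pw, stride=1):
    """Unroll every ph x pw window of x (N, C, H, W) into one row (no padding)."""
    N, C, H, W = x.shape
    oh = (H - ph) // stride + 1
    ow = (W - pw) // stride + 1
    col = np.zeros((N, C, ph, pw, oh, ow), dtype=x.dtype)
    for i in range(ph):
        i_max = i + stride * oh
        for j in range(pw):
            j_max = j + stride * ow
            # All windows' (i, j) element at once.
            col[:, :, i, j, :, :] = x[:, :, i:i_max:stride, j:j_max:stride]
    # Rows are ordered by (image, out row, out col); columns by (channel, i, j).
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * oh * ow, -1)


def max_pool_forward(x, ph, pw, stride=1):
    N, C, H, W = x.shape
    oh = (H - ph) // stride + 1
    ow = (W - pw) // stride + 1
    # Treat every channel as a separate single-channel image.
    x_split = x.reshape(N * C, 1, H, W)
    col = im2col(x_split, ph, pw, stride)  # (N*C*oh*ow, ph*pw)
    arg_max = np.argmax(col, axis=1)       # saved for the backward pass
    out = col[np.arange(col.shape[0]), arg_max]
    return out.reshape(N, C, oh, ow), arg_max


# Made-up input: one 4x4 image with a single channel.
x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
out, arg_max = max_pool_forward(x, ph=2, pw=2, stride=2)
print(out[0, 0])  # [[5, 7], [13, 15]]
print(arg_max)    # [3, 3, 3, 3]: the max is the last element of every window
```

Because the channels were split before calling `im2col`, each row holds exactly one window of one channel, so a single `argmax` along the rows is enough.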
Backward Pass

In the backward pass, we first flatten dout so that it can be indexed with the max indices saved in the forward pass. After creating a matrix of zeros, we write each gradient value at the position of its max index.
The col2im method is needed to convert this stretched matrix back into the shape of the original input.
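A possible sketch of these backward steps is shown below. As with any illustration of this kind, the details are assumptions: `col2im` is a helper written for this example, padding is 0, windows are stored as rows, and `arg_max` is expected in the order produced by a forward pass that splits channels into separate images. The `arg_max` values in the usage example are filled in by hand for a made-up input.

```python
import numpy as np


def col2im(col, x_shape, ph, pw, stride=1):
    """Fold rows of unrolled ph x pw windows back into an (N, C, H, W) array."""
    N, C, H, W = x_shape
    oh = (H - ph) // stride + 1
    ow = (W - pw) // stride + 1
    col = col.reshape(N, oh, ow, C, ph, pw).transpose(0, 3, 4, 5, 1, 2)
    img = np.zeros((N, C, H, W), dtype=col.dtype)
    for i in range(ph):
        i_max = i + stride * oh
        for j in range(pw):
            j_max = j + stride * ow
            # Overlapping windows add their contributions together.
            img[:, :, i:i_max:stride, j:j_max:stride] += col[:, :, i, j, :, :]
    return img


def max_pool_backward(dout, arg_max, x_shape, ph, pw, stride=1):
    N, C, H, W = x_shape
    dout_flat = dout.ravel()  # one gradient value per window
    # Matrix of zeros with one row per window.
    dcol = np.zeros((dout_flat.size, ph * pw), dtype=dout.dtype)
    # Only the position that held the max receives the gradient.
    dcol[np.arange(dout_flat.size), arg_max] = dout_flat
    dx = col2im(dcol, (N * C, 1, H, W), ph, pw, stride)
    return dx.reshape(N, C, H, W)


# Made-up case: 3x3 input [[1, 2, 3], [4, 9, 5], [6, 7, 8]], 2x2 window, stride 1.
# The center value 9 is the max of all four windows, at these in-window indices:
arg_max = np.array([3, 2, 1, 0])
dout = np.array([[[[1.0, 2.0], [3.0, 4.0]]]])
dx = max_pool_backward(dout, arg_max, (1, 1, 3, 3), ph=2, pw=2, stride=1)
print(dx[0, 0])  # only the center is non-zero; it collects 1 + 2 + 3 + 4 = 10
```

With a stride smaller than the window size, one input element can be the max of several windows, which is why `col2im` accumulates with `+=` instead of overwriting.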
Application
This is the pooling layer, with the forward and backward passes, that we studied.
Find the output of the forward pass and dx for the input x and dout values below, where the pooling height PH = 2, the pooling width PW = 2, padding = 0, and stride = 1.

After reading this post, I hope you learned the intuition on how the pooling layer works. Thank you for reading!