
Suppose I have three batches of feature maps, each of size $180 \times 100 \times 100$. I want to concatenate all these feature maps channel-wise, and then resize them into a single feature map. The batch size is equal to 10.

Consider the following code in PyTorch:

import torch
from torch import nn

x1 = torch.randn(10, 180, 100, 100)
x2 = torch.randn(10, 180, 100, 100)
x3 = torch.randn(10, 180, 100, 100)


pool1 = nn.AvgPool3d(kernel_size=(361, 1, 1), stride=1)
pool2 = nn.AvgPool3d(kernel_size=1, stride=(3, 1, 1))

final_1_x = pool1(torch.cat((x1, x2, x3), 1))
final_2_x = pool2(torch.cat((x1, x2, x3), 1))

print(final_1_x.shape)
print(final_2_x.shape)

and its output is

torch.Size([10, 180, 100, 100])
torch.Size([10, 180, 100, 100])

You can observe that both types of pooling I did give a feature map of the desired size. But the first one takes a large amount of time and gives unsatisfactory results, and the second one ignores many values in the input feature maps. I don't know whether it is okay to ignore those values or not.

I want to know the recommended way to perform pooling in order to get feature maps of the desired size. Is there such a recommended way?

hanugm
    Why do you want to pool over channels? That's an unusual choice, because channels do not usually have any sequence or metric that would make the operation correspond to something useful – Neil Slater Oct 12 '21 at 07:46
    @NeilSlater That is the reason for asking [this](https://ai.stackexchange.com/questions/32013/what-are-the-recommended-ways-to-change-shape-of-feature-maps-channel-wise-other) question. I want to achieve that without some parametric model, but need to capture information. – hanugm Oct 12 '21 at 07:53

1 Answer


This one is a bit crazy:

pool1 = nn.AvgPool3d(kernel_size = (361, 1, 1), stride= 1)

because it averages 361 of the 540 channels at once to produce each output channel. Very little information about individual features will remain after doing that.

The most obvious one you have not tried is this:

pool3 = nn.AvgPool3d(kernel_size = (3, 1, 1), stride= (3, 1, 1))

which includes all the feature data, and does not try to average over large amounts of it at a time. I expect pool3 to perform better than pool1 in terms of speed, and better than pool2 in terms of metrics for the trained CNN.
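A quick shape check (sizes taken from the question, the numeric check is my own addition) shows that pool3 covers every input channel exactly once: with kernel and stride of 3 along the channel axis, each output channel is the mean of one non-overlapping triple of input channels, taking 540 channels down to 180.

```python
import torch
from torch import nn

# Concatenate three batches channel-wise, as in the question: (10, 540, 100, 100)
x = torch.cat((torch.randn(10, 180, 100, 100),) * 3, dim=1)

# Average each non-overlapping triple of channels
pool3 = nn.AvgPool3d(kernel_size=(3, 1, 1), stride=(3, 1, 1))
out = pool3(x)

print(out.shape)  # torch.Size([10, 180, 100, 100])

# Output channel 0 is exactly the mean of input channels 0, 1 and 2
print(torch.allclose(out[:, 0], x[:, 0:3].mean(dim=1), atol=1e-5))  # True
```

Note that, as in the question's code, AvgPool3d here treats the 4D tensor as unbatched (C, D, H, W), so the pooling runs along the channel axis.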

If your goal is to reduce the number of feature maps from 540 to 180, then pooling is not usually a good choice of operation. The motivation behind pooling assumes that there is some consistent metric space being pooled over, and that space is what justifies the choice of kernel size and stride. The sequence of channels will not usually have such a metric; it is usually an arbitrary result of learning.

Instead, the usual way to reduce the number of channels between layers is to add a new convolution layer with the desired number of output channels. In this scenario it is also common to use a kernel size of 1. This adds learnable parameters to the CNN, which will learn an optimal compression of the channels for your problem.
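For the sizes in the question, that learned alternative could be sketched like this (the layer name is illustrative, not from the post):

```python
import torch
from torch import nn

# A 1x1 convolution mapping 540 concatenated channels down to 180.
# Unlike pooling, the per-channel weights are learned during training.
reduce_channels = nn.Conv2d(in_channels=540, out_channels=180, kernel_size=1)

x = torch.randn(10, 540, 100, 100)  # the three feature maps, concatenated
print(reduce_channels(x).shape)  # torch.Size([10, 180, 100, 100])
```

Each output channel is then a learned weighted combination of all 540 input channels, rather than a fixed average of an arbitrary subset.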

Neil Slater