Why does the number of feature maps increase in the VGG model?

Question

I found the image below, which shows how a CNN works.

But I don't really understand it. I think I do understand CNNs, but I find this diagram very confusing.

My simplified understanding:

Features are selected.
Convolution is carried out to see where these features fit (repeated for every feature, in every position).
Pooling is used to shrink large images down (keeping the best-fitting feature).
ReLU is used to remove negative values.
Fully-connected layers contribute weighted votes towards deciding which class the image belongs to.
These votes are added together, giving the percentage chance that the image belongs to each class.
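The simplified pipeline above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not how VGG is implemented: a single 6×6 grayscale "image", one hand-picked 3×3 vertical-edge filter, and hypothetical fully-connected weights for two classes. Real CNNs learn their filters and weights during training.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    # Replace negative responses with zero
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2, stride=2):
    # Keep the strongest response in each size x size window
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

def softmax(scores):
    # Turn raw class scores into probabilities that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 6x6 image: dark on the left, bright on the right (a vertical edge)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge detector (assumption for illustration only)
kernel = [[-1, 0, 1] for _ in range(3)]

fmap = max_pool(relu(conv2d(image, kernel)))   # 6x6 -> 4x4 -> 2x2
flat = [v for row in fmap for v in row]        # flatten to 4 values

# Hypothetical fully-connected weights: class 0 = "edge", class 1 = "no edge"
weights = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
scores = [sum(w * x for w, x in zip(ws, flat)) for ws in weights]
probs = softmax(scores)
print(probs)
```

Almost all of the probability ends up on class 0, because the edge filter fires strongly on this image.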

Confusing points of this image to me:

Why are we going from one image of $224 \times 224 \times 3$ to two images of $224 \times 224 \times 64$? Why does this halving continue? What is this meant to represent?
It continues on to $56 \times 56 \times 256$. Why does this number keep halving while the number at the end, the $256$, keeps doubling?

score 3 · Answer 1 · edited Apr 12 '19 at 18:52

Why are we going from one image of $224 \times 224 \times 3$ to two images of $224 \times 224 \times 64$?

They apply a convolution with $3 \times 3$ kernels (stride 1, padding 1) using $64$ filters. That way the output keeps the same spatial size as the input ($224 \times 224$), but the number of channels changes to $64$, one per filter.

And these are not two images, but two consecutive convolutional layers!
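The size bookkeeping of a convolution layer can be checked with the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below assumes VGG16's first two layers use $3 \times 3$ kernels with stride 1 and padding 1; the helper names are hypothetical.

```python
def conv_output_shape(h, w, c_in, num_filters, kernel=3, stride=1, padding=1):
    """Output (height, width, channels) of a 2-D convolution layer.

    Each filter spans all c_in input channels and yields exactly one output
    channel, so the depth of the output equals num_filters.
    """
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return out_h, out_w, num_filters

def conv_params(c_in, num_filters, kernel=3):
    # kernel x kernel x c_in weights per filter, plus one bias per filter
    return kernel * kernel * c_in * num_filters + num_filters

shape1 = conv_output_shape(224, 224, 3, 64)   # first conv: 224x224x3 -> 224x224x64
shape2 = conv_output_shape(*shape1, 64)       # second conv: 224x224x64 -> 224x224x64
print(shape1, shape2)                         # (224, 224, 64) (224, 224, 64)
print(conv_params(3, 64), conv_params(64, 64))  # 1792 36928
```

A $1 \times 1$ kernel with no padding (`kernel=1, padding=0`) would also preserve the $224 \times 224$ size; in both cases the depth is set by the number of filters, not by the input.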

Why does this halving continue? What is this meant to represent?

This is the max-pooling operation (the layer shown in red; see the legend). In general, max pooling is applied with a $2 \times 2$ window and a stride of 2 (rarely 3, and not more, because you lose too much information). That way the height and width are cut in half, the activation maps become smaller, and computation is faster.
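Here is a small sketch of $2 \times 2$ max pooling with stride 2 on a single $4 \times 4$ feature map (the values are made up for illustration), followed by the spatial sizes you get by applying it five times starting from $224$. Pooling works on each channel independently, so it never changes the number of channels.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over one feature map given as a list of rows."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 6, 1, 1]]
pooled = max_pool2d(fmap)
print(pooled)   # [[4, 5], [6, 7]]

# Spatial size after each of five pooling layers
size, sizes = 224, [224]
for _ in range(5):
    size = (size - 2) // 2 + 1   # output-size formula for a 2x2 window, stride 2
    sizes.append(size)
print(sizes)    # [224, 112, 56, 28, 14, 7]
```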

It continues on to $56 \times 56 \times 256$. Why does this number continue to halve, and the number, at the end, the $256$, continues to double?

Still max pooling. There is no fundamental reason to double the number of filters, though. It is just a common trend; you can use whatever number of filters you want.
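To see where each number in the diagram comes from, the sketch below traces tensor shapes through a VGG16-style feature extractor. It assumes the commonly cited VGG16 layout ("configuration D"): $3 \times 3$ convolutions with stride 1 and padding 1, and $2 \times 2$ max pooling with stride 2. Note that the channel count stops doubling at $512$, which fits the point that doubling is a design choice rather than a rule.

```python
# Numbers are 3x3 conv layers with that many filters; "M" is 2x2 max pooling, stride 2
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]

def trace_shapes(config, h=224, w=224, c=3):
    """Return the (height, width, channels) shape after every layer."""
    shapes = [(h, w, c)]
    for layer in config:
        if layer == "M":
            h, w = h // 2, w // 2   # pooling halves height and width, keeps depth
        else:
            c = layer               # padded conv keeps height/width; depth = number of filters
        shapes.append((h, w, c))
    return shapes

shapes = trace_shapes(VGG16)
for layer, (h, w, c) in zip(["input"] + VGG16, shapes):
    print(f"{str(layer):>5} -> {h}x{w}x{c}")
```

The halving comes only from the pooling layers, and the doubling comes only from the first convolution of each new block.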

score 1 · Answer 2 · answered Jan 17 '23 at 23:31

The number of feature maps increases because the number of filters applied in each layer increases as we go deeper into the network.

Here is the reason why:

Filters in the first layers find simple features like lines and edges. Subsequent layers take those features (in other words, patterns) and combine them into much bigger patterns.
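One way to make "much bigger patterns" concrete is the receptive field: how many input pixels a single unit can see. The sketch below uses the standard recurrence $r_{out} = r_{in} + (k - 1)\,j_{in}$, $j_{out} = j_{in} \cdot s$ (where $j$ is the spacing between neighbouring units in input pixels) and assumes the same VGG16-style layout of $3 \times 3$ stride-1 convolutions and $2 \times 2$ stride-2 pooling.

```python
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]

def receptive_fields(config):
    """Receptive field (in input pixels) of one unit after each layer."""
    r, j = 1, 1   # a single input pixel; adjacent units are 1 pixel apart
    fields = []
    for layer in config:
        k, s = (2, 2) if layer == "M" else (3, 1)   # (kernel size, stride)
        r += (k - 1) * j
        j *= s
        fields.append(r)
    return fields

rf = receptive_fields(VGG16)
print(rf)
# [3, 5, 6, 10, 14, 16, 24, 32, 40, 44, 60, 76, 92, 100, 132, 164, 196, 212]
```

A first-layer unit sees only a $3 \times 3$ patch (room for an edge), while a unit after the last pooling layer sees a $212 \times 212$ region, almost the whole $224 \times 224$ image.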

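Therefore we should increase the number of filters to keep up with the increasing complexity of the features. Each filter produces exactly one feature map, so as the number of filters increases, the number of feature maps increases with it. (The filter size itself stays at $3 \times 3$ throughout VGG.)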
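The one-filter, one-feature-map relationship can be checked directly. In this sketch (random weights, a hypothetical 3-channel $6 \times 6$ input, and "valid" padding instead of VGG's padding 1 to keep it short), every filter spans all input channels and produces a single 2-D map. The kernel size is $3 \times 3$ in both layers; only the number of filters changes, and the number of feature maps follows it.

```python
import random

def conv_layer(inputs, filters):
    """One convolutional layer with 'valid' padding and no bias.

    inputs:  list of C channels, each an H x W list of rows
    filters: list of K filters, each a list of C kernels of size k x k
    returns: list of K feature maps, one per filter
    """
    num_channels = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    feature_maps = []
    for f in filters:
        assert len(f) == num_channels, "each filter must span every input channel"
        k = len(f[0])
        fmap = [[sum(inputs[c][i + a][j + b] * f[c][a][b]
                     for c in range(num_channels)
                     for a in range(k) for b in range(k))
                 for j in range(width - k + 1)]
                for i in range(height - k + 1)]
        feature_maps.append(fmap)
    return feature_maps

def random_filters(num_filters, num_channels, k=3):
    # num_filters filters, each made of num_channels kernels of size k x k
    return [[[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(num_channels)]
            for _ in range(num_filters)]

random.seed(0)
image = [[[random.random() for _ in range(6)] for _ in range(6)] for _ in range(3)]  # 3 x 6 x 6

maps_a = conv_layer(image, random_filters(8, 3))     # 8 filters  -> 8 maps of 4x4
maps_b = conv_layer(maps_a, random_filters(16, 8))   # 16 filters -> 16 maps of 2x2
print(len(maps_a), len(maps_b))   # 8 16
```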

Also, keep in mind that filters at the deeper layers tend to find more complicated features.
