According to page 4 of the original paper, a $224 \times 224 \times 3$ image is reduced to $112 \times 112 \times 64$ by a convolution with a $7 \times 7$ filter and stride $2$.

  • $n \times n = 224 \times 224$
  • $f \times f = 7 \times 7$
  • stride: $s = 2$
  • padding: $p = 0$

The output size of the convolution is $(((n+2p-f)/s)+1)$ (according to this), so we have $(n+2p-f)=(224+0-7)=217$. Dividing by the stride gives $217/2=108.5$, which rounds down to $108$, and adding 1 gives $108+1=109$.
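The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch of the formula: the function name `conv_output_size` is illustrative and not taken from the paper or any library.

```python
def conv_output_size(n: int, f: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    # Integer division implements the "take the lower value" step.
    return (n + 2 * p - f) // s + 1


# Values assumed in the question: n = 224, f = 7, s = 2, p = 0
print(conv_output_size(224, 7, s=2, p=0))  # 109
```

With these values the result is $109$, not $112$, which is the discrepancy being asked about.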

So how do we get an output size of $112$?

nbro

1 Answer

The padding is not zero* in the Inception CNN layers. In fact, the padding is deliberately chosen so that the convolution by itself produces an output the same size as its input, i.e. $p=(f-1)/2$. Some libraries call this "same" padding.

So, for $f=7$, $p=(7-1)/2=3$.
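As a quick check of the "same" padding rule, here is a small sketch. It assumes an odd filter size and stride 1, and the helper name `same_padding` is made up for illustration.

```python
def same_padding(f: int) -> int:
    """Padding p = (f - 1) / 2 that preserves size at stride 1 (odd f only)."""
    if f % 2 == 0:
        raise ValueError("this simple rule assumes an odd filter size")
    return (f - 1) // 2


p = same_padding(7)
print(p)  # 3

# At stride 1 the output size (n + 2p - f) + 1 equals the input size n.
for n in (32, 224, 299):
    assert (n + 2 * p - 7) + 1 == n
```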

The stride is not 2. It is $s=1$ for the convolution. The Inception CNN does not use strided convolutions. Instead the stride of 2 is associated with a later max-pooling layer.

Therefore, using $(((n+2p-f)/s)+1)$ with the correct values: $((224 + 6 - 7)/1)+1 = 224$.

Then apply max-pooling with stride 2: $224/2 = 112$.
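The two steps described here can be sketched as follows, using the values given in this answer. The pooling window size is not stated above, so the sketch assumes a $2 \times 2$ window with no padding, which halves the size exactly. The name `output_size` is illustrative.

```python
def output_size(n: int, f: int, s: int, p: int = 0) -> int:
    """Output size of a convolution or pooling window: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


# Step 1: 7x7 convolution with "same" padding (p = 3) and stride 1
conv = output_size(224, f=7, s=1, p=3)

# Step 2: max-pooling with stride 2 (assumed 2x2 window, no padding)
pooled = output_size(conv, f=2, s=2, p=0)

print(conv, pooled)  # 224 112
```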


* Not to be confused with "zero padding", which means using $0$ as the value inserted into the new area. So you can have "zero padding with $p=3$".

Neil Slater
  • Thank you for your opinion. Even with `p`=`1`, which obviously isn't the case, it still doesn't explain `112x112`. – Santhosh Dhaipule Chandrakanth Oct 09 '18 at 16:11
  • @SanthoshDhaipuleChandrakanth: 112 is just half of 224 . . . easy to explain after using "same" padding. Stride is 1 for the convolution; the stride of 2 belongs to a max-pooling layer, it is not a strided convolution done in one step. I have updated the answer. – Neil Slater Oct 09 '18 at 16:17
  • @santhosh: What Neil presents here is not an opinion. It is a fact-based answer to your question. The important difference is that fact-based answers can be correct or wrong, in contrast to opinions, which are inherently vague, subject to discussion, and cannot be right or wrong. – Martin Thoma Nov 08 '18 at 18:02