
I'm implementing a neural network framework from scratch in C++ as a learning exercise. There is one concept I don't see explained anywhere clearly:

How do you go from your last convolutional or pooling layer, which is 3 dimensional, to your first fully connected layer in the network?

Many sources say that you should flatten the data. Does this mean that you should simply create a $1D$ vector of size $N \cdot M \cdot D$ (where $N \times M$ is the size of the last convolutional layer and $D$ is the number of activation maps in that layer) and put the numbers into it one by one, in some arbitrary order?
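For concreteness, here is a sketch of what I imagine (the nested-vector representation and the names are just placeholders I made up):

```cpp
#include <vector>
#include <cstddef>

// Sketch: flatten a D x N x M stack of activation maps into a 1D vector,
// visiting the entries in a fixed (map, row, column) order.
std::vector<float> flatten(const std::vector<std::vector<std::vector<float>>>& maps,
                           int D, int N, int M) {
    std::vector<float> flat;
    flat.reserve(static_cast<std::size_t>(D) * N * M);
    for (int d = 0; d < D; ++d)          // activation map
        for (int i = 0; i < N; ++i)      // row
            for (int j = 0; j < M; ++j)  // column
                flat.push_back(maps[d][i][j]);
    return flat;
}
```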

If this is the case, I understand how to propagate further down the line, but how does backpropagation work here? Do I just put the values back into the activation maps, reversing the flattening?

I also read that you can do this "flattening" as a tensor contraction. How does that work exactly?

Á. Márton
  • Can you link where you read 'tensor contraction' to replace flattening? –  May 25 '19 at 13:04

1 Answer


Yes, you are correct (and it is quite easy to implement in C++ with pointers). The order can be arbitrary, but it has to be maintained: a fully connected layer has no notion of spatial structure, so the only requirement is that the mapping from 3D positions to vector indices stays fixed. If pixel $(1,5,6)$ is supplied to node $38$, i.e. indexed as $37$ in the input vector of the fully connected layer, then from then on that must stay the same (you cannot later put, say, pixel $(1,6,5)$ into node $38$).
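For instance, one possible fixed mapping looks like this (the particular order is an arbitrary choice; what matters is that the same function is used every time the layer is evaluated):

```cpp
// One fixed mapping from a position (d, i, j) in a D x N x M volume
// (map d, row i, column j) to an index in the flattened input vector
// of the fully connected layer. Any bijection works, as long as the
// forward pass and the backward pass both use this same mapping.
inline int flat_index(int d, int i, int j, int N, int M) {
    return d * N * M + i * M + j;
}
```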

Backpropagation works the same way it always does; it is hard to explain verbally, so I will give you this picture:

[image illustrating backpropagation through the flatten step]

So, basically, if you visualise it like this, you can see how the differentiation propagates: the "flattening" only reshapes the value lookup table; it does not change the way the values affect the final loss. So you take the gradient with respect to each value, convert the gradients back into a $3D$ map in the same order as before, and then propagate them as you were doing in the previous layers.
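As a sketch (assuming the fully connected layer hands back the gradient of the loss with respect to its input as a flat vector, and using the same nested-vector representation as in the question), the backward pass of the flatten step is just the inverse reshape:

```cpp
#include <vector>
#include <cstddef>

// Sketch: backward pass of the flatten step. The gradient w.r.t. the
// flattened vector is copied back into a D x N x M volume using the same
// index order as the forward pass; no value is modified, only rearranged.
std::vector<std::vector<std::vector<float>>> unflatten_grad(
        const std::vector<float>& grad_flat, int D, int N, int M) {
    std::vector<std::vector<std::vector<float>>> grad_maps(
        D, std::vector<std::vector<float>>(N, std::vector<float>(M)));
    for (int d = 0; d < D; ++d)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < M; ++j)
                grad_maps[d][i][j] =
                    grad_flat[static_cast<std::size_t>(d) * N * M + i * M + j];
    return grad_maps;
}
```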

  • "So, basically if you visualise like this you understand you have to differentiation will propagate" I dont understand this sentence. In the last paragraph you mean that this flattening operation does not do any "computation" just spatial rearrangement, so the backprogation is just the inversion of this rearrangement? – Á. Márton May 25 '19 at 13:21
  • @Á.Márton this is how differentiation works, I'll link you to the original video. The main thing I am trying to say is that reshaping something does not change anything except the dimensions, so backpropagation through the nodes occurs in the same way; you just have to make sure the backpropagated gradients reach the right place. –  May 25 '19 at 13:24
  • @Á.Márton No, backpropagation has nothing to do with spatial rearrangement. Wait, I'll link you to a source; ask me again after watching it. –  May 25 '19 at 13:25
  • @Á.Márton https://www.youtube.com/watch?v=d14TUNcbn1k&list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk&index=4 –  May 25 '19 at 13:26
  • I'm using cs231n as a reference, so I know what backpropagation is in general. I'm asking about this particular case, with the flattening layer: backpropagation here simply means that you rearrange your numbers back into the spatial structure they had before the flattening? – Á. Márton May 25 '19 at 13:37
  • @Á.Márton What does backprop have to do with spatial rearrangement? Try to see it like this: there is a sieve in front of you through which coloured water is flowing; through each cavity a different colour flows, once forward (forward prop) and once backward (backprop). Now you can arrange the cavities of the sieve any way you like; here the arrangement is linear, i.e. flattening, but the forward flow and the backward flow remain the same. –  May 25 '19 at 13:45