16

These types of questions may be problem-dependent, but I have tried to find research that addresses whether the number of hidden layers and their size (the number of neurons in each layer) really matters.

So my question is: does it really matter if we, for example, have 1 large hidden layer of 1000 neurons vs. 10 hidden layers with 100 neurons each?

Stephen Johnson

4 Answers

14

Basically, having multiple layers (i.e. a deep network) makes your network better able to recognize certain aspects of the input data. For example, suppose you have the details of a house (size, lawn size, location, etc.) as input and want to predict the price. The first layer may learn associations such as:

  • Large area, higher price
  • Few bedrooms, lower price

The second layer might conclude:

  • Large area + few bedrooms = large bedrooms = a mixed effect on the price

Yes, a single layer can also 'detect' these features, but it will require more neurons, since each neuron cannot rely on other neurons to do parts of the total computation required to detect that feature.
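As a rough illustration of the two alternatives from the question, here is a minimal sketch (assuming PyTorch; the feature names and sizes are illustrative, not from the answer):

```python
import torch.nn as nn

n_features = 4  # e.g. size, lawn size, bedrooms, location score

# Alternative 1: one wide hidden layer of 1000 neurons
wide_net = nn.Sequential(
    nn.Linear(n_features, 1000),
    nn.ReLU(),
    nn.Linear(1000, 1),  # predicted price
)

# Alternative 2: ten hidden layers of 100 neurons each
layers = [nn.Linear(n_features, 100), nn.ReLU()]
for _ in range(9):
    layers += [nn.Linear(100, 100), nn.ReLU()]
layers.append(nn.Linear(100, 1))
deep_net = nn.Sequential(*layers)

# In the deep net, each layer can build on features computed by the
# one before it ("large area", "few bedrooms"), instead of every
# neuron having to compute everything from the raw inputs.
```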

Check out this answer

Thomas Wagenaar
  • Thank you so much for your answer. Just to clarify, when you write "it makes your network [...]", are you referring to the case when I have many hidden layers with fewer neurons each rather than having more neurons in fewer layers? – Stephen Johnson May 04 '17 at 19:31
  • @StephenJohnson oops, I edited the question. I'm referring to the deep network (multiple layers). – Thomas Wagenaar May 04 '17 at 19:36
  • Nice answer, thanks again. Maybe I should continue this in another thread, but do you think the same kind of reasoning applies to recurrent neural networks such as GRU or LSTM? – Stephen Johnson May 04 '17 at 20:18
  • @StephenJohnson do you mean one-layer recurrent networks vs. multi-layer recurrent networks, or do you mean because of their recurrent connections? – Thomas Wagenaar May 04 '17 at 21:20
  • I mean generally: because they have recurrent connections that allow them to map contexts over longer distances, do such networks benefit from being deep in the same way a regular feedforward network would? Maybe they can't be compared like that, since recurrent networks are typically used with sequential data, such as audio. – Stephen Johnson May 04 '17 at 22:22
  • @StephenJohnson I'm not sure. I just googled "deep RNNs" and nothing interesting popped up. A reason for this is that RNNs tend to come with predefined architectures (LSTM, GRU, NARX, etc.). And exactly as you say: they are focused on sequential data. But there is one thing: some LSTMs, for example, have one layer of 5 blocks and another layer of 5 blocks. So deep RNNs can find abstract relationships between **memory blocks** better than single-layered RNNs :) – Thomas Wagenaar May 05 '17 at 08:33
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/58243/discussion-between-stephen-johnson-and-thomas-w). – Stephen Johnson May 05 '17 at 10:04
  • AFAIK in DNNs, the interesting fact is that a hierarchy of knowledge is built through the layers. The bottom layers learn basic concepts and the upper ones higher-level concepts. With one layer of 1K neurons, you will mainly construct a kind of lookup table. With a DNN, you get a hierarchy of concepts, which is much more powerful than the previous NN. But this relies on the assumption that the world is like that: high-level concepts are built upon basic ones. – jcm69 Jun 16 '17 at 00:31
4

There are several aspects to consider.

1. Training: Training deep nets is hard due to the vanishing (rarely exploding) gradient problem, so building a 10x100 fully connected net is not recommended.

2. Trained network performance:

  • Information loss: The classical use of neural nets is classification, meaning we want to extract some well-defined information from the data (e.g. whether or not there is a face in the picture). So a classification problem usually has many inputs and few outputs, and the sizes of the hidden layers descend from input to output. However, we lose information by using fewer neurons layer by layer (e.g. we cannot reproduce the original image from the single fact of whether there is a face in it). So be aware that you lose information by using 100 neurons if the size of the input is, say, 1000.
  • Information complexity: However, deeper nets (as Thomas W mentioned) can extract more complex information from the input data. In spite of this, it is not recommended to use 10 fully connected layers; it is better to use convolutional/ReLU/max-pooling or other types of layers (see the sketch after this answer). The first layers can compress some essential part of the input (e.g. whether there is a line in a specific part of the picture); later layers can say "there is a specific shape in this place in the picture", and so on.

So deeper nets are more "clever", but a 10x100 net structure is not a good choice.
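Here is a minimal sketch (assuming PyTorch; the input shape of 1-channel 28x28 images and the channel counts are illustrative, not from the answer) of the conv/ReLU/max-pooling stacking this answer recommends over 10 fully connected layers:

```python
import torch.nn as nn

cnn = nn.Sequential(
    # First layers compress essential parts of the input
    # (e.g. "is there a line in this region of the picture?").
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),            # 28x28 -> 14x14
    # Deeper layers combine those into shapes
    # ("there is a specific shape in this place").
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),            # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 2),   # e.g. face / no face
)
```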

betontalpfa
1

If the problem you are solving is linearly separable, one layer of 1000 neurons can do a better job than 10 layers of 100 neurons each. If the problem is non-linear and non-convex, then you need deep neural nets.
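To illustrate the first point, here is a minimal sketch (assuming PyTorch; the synthetic data is my own illustration, not from the answer): on a linearly separable problem, even a single linear layer already suffices, so extra depth buys nothing.

```python
import torch
import torch.nn as nn

# Linearly separable 2-D data: label = 1 if x + y > 0
X = torch.randn(500, 2)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)

# A single linear layer (no hidden layers at all)
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")  # approaches 0 without any depth
```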

0

I think you have some confusion about the basics of neural networks: every layer has its own activation function and its own input/output connection weights.

The output of the first hidden layer is multiplied by a weight matrix and processed by the activation function of the next layer, and so on. Single-layer neural networks are limited to simple tasks; deeper NNs can perform far better than a single layer.

However, do not use more than one layer unless your application is fairly complex. In conclusion, a 100-neuron layer does not necessarily mean a better neural network than 10 layers of 10 neurons, but 10 layers are unrealistic unless you are doing deep learning. Start with 10 neurons in the hidden layer and try adding layers, or adding more neurons to the same layer, to see the difference. Learning with more layers will be easier, but more training time is required.
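To illustrate the point about per-layer weights and activations, here is a minimal NumPy sketch (my own illustration; the layer sizes are arbitrary) of a forward pass through two hidden layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

x = rng.normal(size=4)            # input vector

W1 = rng.normal(size=(10, 4))     # weights of hidden layer 1
b1 = np.zeros(10)
W2 = rng.normal(size=(10, 10))    # separate weights of hidden layer 2
b2 = np.zeros(10)
W3 = rng.normal(size=(1, 10))     # output layer weights
b3 = np.zeros(1)

h1 = relu(W1 @ x + b1)            # first hidden layer's output...
h2 = relu(W2 @ h1 + b2)           # ...is weighted and activated by the next layer
y = W3 @ h2 + b3
print(y)
```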

quintumnia