
All else being equal, including total neuron count, I give the following definitions:

  • wide is a parallel ensemble: most neurons share the same inputs, but each produces a different output.
  • deep is a series ensemble: most neurons take the outputs of other neurons as their inputs, and few inputs are shared.

For CART ensembles, the parallel (wide) ensemble is a random forest, while the series (deep) ensemble is a gradient boosted machine (GBM). For several years the GBM was the "winningest" approach on Kaggle.
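As a concrete illustration of the CART case, here is a minimal scikit-learn sketch (the tree count and synthetic data are arbitrary): both ensembles get the same budget of trees, but the forest grows them independently in parallel, while the GBM fits each tree to the errors of the trees before it.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
n_trees = 200  # the same "budget" of trees for both ensembles

# Parallel (wide) ensemble: trees are grown independently on bootstrap
# samples of the data and their predictions are averaged.
rf = RandomForestRegressor(n_estimators=n_trees, random_state=0).fit(X, y)

# Series (deep) ensemble: each tree is fit to the residual error of the
# ensemble built so far, so every tree depends on the ones before it.
gbm = GradientBoostingRegressor(n_estimators=n_trees, random_state=0).fit(X, y)
```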

Is there a parallel to that for neural networks? Is there some reasonable measure that indicates whether deep outperforms wide for neural networks? If I had the same count of weights to throw at a tough problem, all else being equal, should they be applied more in parallel or in series?
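To make the neural-network version of the question concrete, here is a minimal PyTorch sketch; the input size (100), output size (1), and layer widths are arbitrary, chosen only so that the "wide" and "deep" models land on roughly the same number of weights.

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Wide: one large hidden layer; all hidden units see the same shared input
# in parallel and produce different outputs.
wide = nn.Sequential(
    nn.Linear(100, 1380), nn.ReLU(),
    nn.Linear(1380, 1),
)

# Deep: the same rough weight budget spent on a series of smaller layers,
# so most units take the outputs of other units as their inputs.
deep = nn.Sequential(
    nn.Linear(100, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 1),
)

print(n_params(wide), n_params(deep))  # both roughly 141k weights
```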

EngrStudent
  • Deep performs better because of multi-level feature extraction. At least the prevailing thought is that a hierarchy of features is implicitly extracted. – FourierFlux Oct 21 '20 at 18:02

1 Answer


I am not sure exactly what you are looking for, but I will leave this paper here, since it provides some intuition in that direction. It compares the performance of a deep learning model scaled along three dimensions: resolution, width, and depth, as depicted in the paper's definitions (figure from the paper showing a baseline network scaled separately in width, depth, and resolution, and then with compound scaling).

If you go to Section 3.2, you will see how scaling each dimension independently (resolution, width, and depth) affects performance, and how compound scaling of all three together maximizes the model's performance, so the dimensions are closely related. It is a very thorough ablation study. For me, this was the paper where I finally understood how the width, depth, and resolution parameters come together.
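Roughly, the compound scaling rule from Section 3.2 can be written in a few lines of Python; the alpha, beta, gamma values below are the ones I recall the paper reporting from its grid search on the B0 baseline, so treat this as a sketch and check the paper for the exact setup.

```python
# Base coefficients found once by grid search on the small baseline model;
# a single exponent phi then scales depth, width, and resolution together.
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi):
    depth_mult = alpha ** phi        # multiplier on the number of layers
    width_mult = beta ** phi         # multiplier on channels per layer
    resolution_mult = gamma ** phi   # multiplier on the input resolution
    return depth_mult, width_mult, resolution_mult

# The constraint alpha * beta**2 * gamma**2 ≈ 2 means total FLOPS grow by
# roughly 2**phi, so phi controls the compute budget.
print(compound_scale(1))  # one "doubling" step of the compute budget
```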

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks: https://arxiv.org/abs/1905.11946

JVGD