
All else being equal, including total neuron count, I give the following definitions:

  • wide is a parallel ensemble: most neurons share the same inputs, but each produces a different output.
  • deep is a series ensemble: most neurons take the outputs of other neurons as their inputs, and few inputs are shared.

For CART ensembles, the parallel (wide) ensemble is a random forest, while the series (deep) ensemble is a gradient boosted machine (GBM). For several years the GBM was the "winningest" approach on Kaggle.
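As a concrete illustration of the CART case, here is a minimal scikit-learn sketch (the tree count and synthetic data are arbitrary): both ensembles get the same budget of trees, but the forest grows them independently in parallel, while the GBM fits each tree to the errors of the trees before it.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
n_trees = 200  # the same "budget" of trees for both ensembles

# Parallel (wide) ensemble: trees are grown independently on bootstrap
# samples of the data and their predictions are averaged.
rf = RandomForestRegressor(n_estimators=n_trees, random_state=0).fit(X, y)

# Series (deep) ensemble: each tree is fit to the residual error of the
# ensemble built so far, so every tree depends on the ones before it.
gbm = GradientBoostingRegressor(n_estimators=n_trees, random_state=0).fit(X, y)
```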

Is there a parallel to that for neural networks? Is there some reasonable measure that indicates whether deep outperforms wide for neural networks? If I had the same count of weights to throw at a tough problem, all else being equal, should they be applied more in parallel or in series?
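To make the neural-network version of the question concrete, here is a minimal PyTorch sketch; the input size (100), output size (1), and layer widths are arbitrary, chosen only so that the "wide" and "deep" models land on roughly the same number of weights.

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Wide: one large hidden layer; all hidden units see the same shared input
# in parallel and produce different outputs.
wide = nn.Sequential(
    nn.Linear(100, 1380), nn.ReLU(),
    nn.Linear(1380, 1),
)

# Deep: the same rough weight budget spent on a series of smaller layers,
# so most units take the outputs of other units as their inputs.
deep = nn.Sequential(
    nn.Linear(100, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 1),
)

print(n_params(wide), n_params(deep))  # both roughly 141k weights
```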

EngrStudent
  • Deep performs better because of multi-level feature extraction. At least the prevailing thought is that a hierarchy of features is implicitly extracted. – FourierFlux Oct 21 '20 at 18:02

1 Answer


I am not sure exactly what you are looking for, but I will leave this paper here, since it provides some intuition in that direction. It compares the performance of a deep learning model scaled along three dimensions: resolution, width, and depth, as depicted in the paper's definitions (figure from the paper showing a baseline network scaled separately in width, depth, and resolution, and then with compound scaling).

If you go to Section 3.2, you will see how scaling each dimension independently (resolution, width, and depth) affects performance, and how compound scaling of all three together maximizes the model's performance, so the dimensions are closely related. It is a very thorough ablation study. For me, this was the paper where I finally understood how the width, depth, and resolution parameters come together.
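Roughly, the compound scaling rule from Section 3.2 can be written in a few lines of Python; the alpha, beta, gamma values below are the ones I recall the paper reporting from its grid search on the B0 baseline, so treat this as a sketch and check the paper for the exact setup.

```python
# Base coefficients found once by grid search on the small baseline model;
# a single exponent phi then scales depth, width, and resolution together.
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi):
    depth_mult = alpha ** phi        # multiplier on the number of layers
    width_mult = beta ** phi         # multiplier on channels per layer
    resolution_mult = gamma ** phi   # multiplier on the input resolution
    return depth_mult, width_mult, resolution_mult

# The constraint alpha * beta**2 * gamma**2 ≈ 2 means total FLOPS grow by
# roughly 2**phi, so phi controls the compute budget.
print(compound_scale(1))  # one "doubling" step of the compute budget
```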

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks: https://arxiv.org/abs/1905.11946

JVGD