
What are "bottlenecks" in the context of neural networks?

This term is mentioned, for example, in this TensorFlow article, which also uses the term "bottleneck values". How does one calculate bottleneck values? How do these values help image classification?

Please explain in simple words.

nbro
Anurag Singh

2 Answers


A bottleneck in a neural network is just a layer with fewer neurons than the layer below or above it. Having such a layer encourages the network to compress feature representations (of the features salient to the target variable) to best fit in the available space. As with all weight updates, improvements to this compression are driven by the goal of reducing the cost function during training.
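As a minimal numpy sketch of the idea (the layer widths 128 and 32 are illustrative, not taken from any particular network): the middle layer has fewer units than those around it, so any information the later layers use must pass through a compressed 32-dimensional representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide layer -> narrow bottleneck -> wide layer (illustrative sizes).
W1 = rng.standard_normal((128, 32))   # 128 units down to 32 (the bottleneck)
W2 = rng.standard_normal((32, 128))   # back up to 128 units

x = rng.standard_normal((1, 128))     # a single input vector
h = np.maximum(x @ W1, 0.0)           # compressed representation, shape (1, 32)
y = np.maximum(h @ W2, 0.0)           # reconstructed/expanded, shape (1, 128)

print(h.shape, y.shape)
```

During training, gradient descent adjusts W1 so that the 32 values in `h` retain whatever is most useful for reducing the loss.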

In a CNN (such as Google's Inception network), bottleneck layers are added to reduce the number of feature maps (a.k.a. channels), which otherwise tends to increase in each layer. This is achieved by using 1x1 convolutions with fewer output channels than input channels.
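A 1x1 convolution is just the same linear map across channels applied at every spatial position, which a short numpy sketch can show (the sizes 256 and 64 are illustrative, not from any specific network). It also shows why this cuts parameters compared to a 3x3 convolution with the same channel counts:

```python
import numpy as np

h, w, c_in, c_out = 8, 8, 256, 64

x = np.random.randn(h, w, c_in)        # input feature maps, 256 channels
kernel = np.random.randn(c_in, c_out)  # 1x1 conv weights: one channel-mixing matrix

# Apply the same channel mixing at every spatial position.
y = x @ kernel                         # shape (8, 8, 64): channels reduced 256 -> 64

# Parameter counts (ignoring biases) versus a 3x3 convolution:
params_1x1 = c_in * c_out              # 256 * 64 = 16,384
params_3x3 = 3 * 3 * c_in * c_out      # 9x as many = 147,456

print(y.shape, params_1x1, params_3x3)
```

This is why Inception-style blocks place a 1x1 bottleneck before their expensive 3x3 and 5x5 convolutions: the larger kernels then operate on far fewer channels.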

You don't usually calculate weights for bottleneck layers directly; the training process handles that, as it does for all other weights. Selecting a good size for a bottleneck layer is something you have to guess and then experiment with, in order to find architectures that work well. The goal is usually a network that generalises well to new images, and bottleneck layers help by reducing the number of parameters in the network whilst still allowing it to be deep and represent many feature maps.

Neil Slater
    Hi @NeilSlater, how is a bottleneck layer different from a bottleneck feature? I have read some kernels on Kaggle and here where people use a pretrained Resnet (after removing its last few layers). The model is made to generate the features without any further training. The features are then passed to a classifier of their choice, usually an SVM or a simpler CNN to learn from those features. Clearly in that case, the network will not learn to compress the feature representations in the available space. – Anshuman Kumar Jun 02 '20 at 14:56
  • Like here the person uses ResNet https://www.kaggle.com/paperboiii/one-class-classification-for-images or here https://towardsdatascience.com/building-the-hotdog-not-hotdog-classifier-from-hbos-silicon-valley-c0cb2317711f – Anshuman Kumar Jun 02 '20 at 14:59
  • @NeilSlater, How much of a difference does compressing feature representations make to the overall accuracy? – Anshuman Kumar Jun 02 '20 at 15:00
  • @AnshumanKumar: I suggest you ask a separate question about that on the site. In brief though, a feature vector (or embedding) that is good for one task is often good for a related task without re-training. It _might_ not be as good as if you trained from scratch, but the difference may not be important to you – Neil Slater Jun 02 '20 at 15:16
  • **in order to get the best loss during training**. How can a bottleneck layer be useful in getting the best loss during training? – hanugm Jan 05 '22 at 23:38
  • @hanugm It is the other way around: the drive to get lower loss (through whatever optimisation is being applied) encourages higher-fidelity compression, at least with regard to data from the original dataset that is meaningful in the context of whatever the network is predicting. – Neil Slater Jan 06 '22 at 08:05
  • It may be a good idea to comment on whether this terminology "bottlenecks" is standard. I've seen people use this word "bottleneck" to refer to many other similar concepts (e.g. as a synonym for "limitation") and probably I've also seen it being used in the context of neural networks, but I am not sure if this is standard terminology, so you may want to note that in your answer. For example, if the authors of the Inception papers use this term, it may be a good idea to provide a link to the paper(s). – nbro Jan 06 '22 at 21:56
  • By the way, this is the accepted answer, and [I deleted (now undeleted) one answer that didn't seem to be consistent with your answer](https://ai.stackexchange.com/a/5742/2444) (but was actually consistent with the blog's usage of this term), but the actual blog post that the OP was referring states "_'Bottleneck' is an **informal term** we often use for the **layer just before the final output layer** that actually does the classification. (TensorFlow Hub calls this an "image feature vector".)..._" – nbro Jan 06 '22 at 22:07
  • "_...This penultimate layer has been trained to output a set of values that's good enough for the classifier to use to distinguish between all the classes it's been asked to recognize._". So, it doesn't say that this layer has fewer neurons. That may not be the case or it might be the case. I didn't fully read the article. – nbro Jan 06 '22 at 22:07
  • If you think of architectures like u-net, then your usage/definition of the term "bottleneck" would be different than the definition provided in the article the OP was referring to. The u-net is an auto-encoder-like architecture where the layer with fewer neurons is not the layer just before the output layer. – nbro Jan 06 '22 at 22:12

Imagine you want to re-train the last layer of a pre-trained model:

Input->[Frozen-Layers]->[Last-Layer-To-Retrain]->Output

To train [Last-Layer-To-Retrain], you need to evaluate the outputs of [Frozen-Layers] multiple times for the same input data. To save time, you can compute these outputs only once:

Input#1->[Frozen-Layers]->Bottleneck-Features-Of-Input#1

Then, you store all the Bottleneck-Features-Of-Input#i and use them directly to train [Last-Layer-To-Retrain].
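The caching step can be sketched in plain numpy. Here a fixed random matrix stands in for the frozen part of the pre-trained network (purely a stand-in; in practice this would be the truncated pre-trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen layers: a fixed, "expensive" feature extractor.
W_frozen = rng.standard_normal((100, 16))

def frozen_layers(x):
    return np.maximum(x @ W_frozen, 0.0)  # fixed weights, ReLU

images = rng.standard_normal((500, 100))  # stand-in for the training images

# Compute the bottleneck features ONCE, up front, instead of re-running
# the frozen layers on every epoch.
bottleneck_features = frozen_layers(images)  # shape (500, 16)

# Training the last layer then reads the cache each epoch:
for epoch in range(3):
    batch = bottleneck_features[:32]  # cheap cache lookup, no forward pass
    # ... update last-layer weights on `batch` ...

print(bottleneck_features.shape)
```

The saving is that the frozen forward pass runs once per image in total, rather than once per image per epoch.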

Explanation from the "cache_bottlenecks" function of the "image_retraining" example:

Because we're likely to read the same image multiple times (if there are no distortions applied during training) it can speed things up a lot if we calculate the bottleneck layer values once for each image during preprocessing, and then just read those cached values repeatedly during training.

JC R