
In Deep Learning and Transfer Learning, does layer freezing offer benefits other than reducing computational time in gradient descent?

Assuming I train a neural network on task A to derive weights $W_{A}$, set these as the initial weights, and train on another task B (without layer freezing), does it still count as transfer learning?
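
For concreteness, here is a minimal PyTorch sketch of what I mean (make_model, the toy architecture, and the file name weights_a.pt are just placeholders; I assume both tasks share the same architecture and output size):

```python
import torch
import torch.nn as nn

# Placeholder architecture shared by task A and task B.
def make_model(num_outputs: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(32, 64),
        nn.ReLU(),
        nn.Linear(64, num_outputs),
    )

# Train on task A (training loop omitted) and keep the resulting weights W_A.
model_a = make_model(num_outputs=10)
# ... training on task A ...
torch.save(model_a.state_dict(), "weights_a.pt")

# Task B: load W_A as the initial weights and train every layer
# (no layer freezing) on the new task.
model_b = make_model(num_outputs=10)
model_b.load_state_dict(torch.load("weights_a.pt"))
optimizer = torch.optim.SGD(model_b.parameters(), lr=1e-3)
# ... training on task B ...
```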

In summary, how essential is layer freezing in transfer learning?

Enk9456
  • Welcome to this SE :) Be sure to look around on the SE for all the questions that others have posted. The following describes some of the things you are asking for: https://ai.stackexchange.com/questions/22963/what-is-layer-freezing-in-transfer-learning – Robin van Hoorn Feb 23 '23 at 09:49

2 Answers


From your post I gather that you have three sub-questions, and I will answer them one by one.

  • For the first question: yes, layer freezing reduces the computational cost a lot, and it also helps the model keep the many patterns it has already learned, so it can keep recognizing things well, separate negative from positive samples more easily, and avoid overfitting when the pretrained model was trained on a much bigger dataset. For example, say we have a model trained on COCO but now only want a person detector: without freezing the weights, backpropagation focuses on that single class, and the model may not generalize as well as it did before.
  • For the second question: yes! Training without freezing, but using the pretrained weights as the initial weights, still counts as transfer learning.
  • Finally, in summary, layer freezing is an essential trick for getting a good model, especially if you want it to work well on open-world data and generalize to as many domains as possible. It also saves time and makes fine-tuning easier, since you only have to train the final downstream task layers (see the sketch after this list). Hope this helps :D
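
As a rough sketch of the freezing part (using a torchvision ImageNet classifier instead of an actual COCO detector just to keep it short; the single-output head is a placeholder for the person class):

```python
import torch
import torch.nn as nn
from torchvision import models

def count_trainable(model: nn.Module) -> int:
    # Number of parameters that will actually receive gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Backbone pretrained on a large dataset (ImageNet here, via torchvision's weights API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print("trainable before freezing:", count_trainable(model))  # roughly 11.7M

# Freeze the pretrained layers so their learned features are kept,
# then attach a new head for the single downstream class.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1)
print("trainable after freezing:", count_trainable(model))   # only the new head

# Only the new head is passed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```
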
GunFire

Both are transfer learning approaches, which this PyTorch tutorial explains very well:

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

These two major transfer learning scenarios look as follows:

Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, like one trained on the ImageNet 1000 dataset. The rest of the training looks as usual.

ConvNet as fixed feature extractor: Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.
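
A minimal sketch of the two scenarios, along the lines of that tutorial (num_classes is a placeholder for your downstream task; torchvision's weights API is assumed):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # placeholder for the downstream task

# Scenario 1: finetuning the convnet.
# The pretrained weights are only the initialization; all layers are trained.
model_ft = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model_ft.fc = nn.Linear(model_ft.fc.in_features, num_classes)
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Scenario 2: convnet as a fixed feature extractor.
# Freeze every layer, replace the final fully connected layer,
# and optimize only the parameters of that new layer.
model_conv = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model_conv.parameters():
    param.requires_grad = False
model_conv.fc = nn.Linear(model_conv.fc.in_features, num_classes)
optimizer_conv = torch.optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
```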

In summary, layer freezing is faster, but given enough training it is usually less accurate than finetuning the whole network.

Rexcirus