
In Deep Learning and Transfer Learning, does layer freezing offer benefits other than reducing computational time in gradient descent?

Assuming I train a neural network on task A to derive weights $W_{A}$, set these as the initial weights, and train on another task B (without layer freezing), does it still count as transfer learning?
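
For concreteness, here is a minimal PyTorch sketch of what I mean (make_model, the toy architecture, and the file name weights_a.pt are just placeholders; I assume both tasks share the same architecture and output size):

```python
import torch
import torch.nn as nn

# Placeholder architecture shared by task A and task B.
def make_model(num_outputs: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(32, 64),
        nn.ReLU(),
        nn.Linear(64, num_outputs),
    )

# Train on task A (training loop omitted) and keep the resulting weights W_A.
model_a = make_model(num_outputs=10)
# ... training on task A ...
torch.save(model_a.state_dict(), "weights_a.pt")

# Task B: load W_A as the initial weights and train every layer
# (no layer freezing) on the new task.
model_b = make_model(num_outputs=10)
model_b.load_state_dict(torch.load("weights_a.pt"))
optimizer = torch.optim.SGD(model_b.parameters(), lr=1e-3)
# ... training on task B ...
```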

In summary, how essential is layer freezing in transfer learning?

Enk9456
  • Welcome to this SE :) Be sure to look around on the SE for all the questions that others have posted. The following describes some of the things you are asking for: https://ai.stackexchange.com/questions/22963/what-is-layer-freezing-in-transfer-learning – Robin van Hoorn Feb 23 '23 at 09:49

2 Answers


From your post I gather that you have three sub-questions, and I will answer them one by one.

  • For the first question: yes, layer freezing reduces the computational cost a lot, and it also helps the model keep the many patterns it has already learned, so it can keep recognizing things well, separate negative from positive samples more easily, and avoid overfitting when the pretrained model was trained on a much bigger dataset. For example, say we have a model trained on COCO but now only want a person detector: without freezing the weights, backpropagation focuses on that single class, and the model may not generalize as well as it did before.
  • For the second question: yes! Training without freezing, but using the pretrained weights as the initial weights, still counts as transfer learning.
  • Finally, in summary, layer freezing is an essential trick for getting a good model, especially if you want it to work well on open-world data and generalize to as many domains as possible. It also saves time and makes fine-tuning easier, since you only have to train the final downstream task layers (see the sketch after this list). Hope this helps :D
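
As a rough sketch of the freezing part (using a torchvision ImageNet classifier instead of an actual COCO detector just to keep it short; the single-output head is a placeholder for the person class):

```python
import torch
import torch.nn as nn
from torchvision import models

def count_trainable(model: nn.Module) -> int:
    # Number of parameters that will actually receive gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Backbone pretrained on a large dataset (ImageNet here, via torchvision's weights API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print("trainable before freezing:", count_trainable(model))  # roughly 11.7M

# Freeze the pretrained layers so their learned features are kept,
# then attach a new head for the single downstream class.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1)
print("trainable after freezing:", count_trainable(model))   # only the new head

# Only the new head is passed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```
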
GunFire

Both are transfer learning approaches, which this PyTorch tutorial explains very well:

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

These two major transfer learning scenarios look as follows:

Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, like one trained on the ImageNet 1000 dataset. The rest of the training looks as usual.

ConvNet as fixed feature extractor: Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.
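
A minimal sketch of the two scenarios, along the lines of that tutorial (num_classes is a placeholder for your downstream task; torchvision's weights API is assumed):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # placeholder for the downstream task

# Scenario 1: finetuning the convnet.
# The pretrained weights are only the initialization; all layers are trained.
model_ft = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model_ft.fc = nn.Linear(model_ft.fc.in_features, num_classes)
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Scenario 2: convnet as a fixed feature extractor.
# Freeze every layer, replace the final fully connected layer,
# and optimize only the parameters of that new layer.
model_conv = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model_conv.parameters():
    param.requires_grad = False
model_conv.fc = nn.Linear(model_conv.fc.in_features, num_classes)
optimizer_conv = torch.optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
```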

In summary, layer freezing is faster, but given enough training it is usually less accurate than finetuning the whole network.

Rexcirus