
All examples of transfer learning I have seen for classification use initial weights of a network trained on a larger number of classes (say 1000 in the case of networks trained on ImageNet data) to address a new task that has a smaller number of classes. Can transfer learning be effectively used when the new task has more classes than the original? For example, can I effectively build on ResNet50 or parts of it for a new task that has 1500 classes? Thanks.

1 Answer


Yes, transfer learning can usually still be used when the number of classes differs from the original task. Your model will, however, be more 'transferable' if it has been trained on a wide variety of data or on data that is somewhat similar to your new data. What one would usually do is freeze the weights of the lower layers of the network and only retrain the upper part (e.g. the fully connected part), replacing the final classification layer with one that has the new number of outputs. The most important question is therefore not how many classes the network was originally trained on, but rather: how large is the representational distance between the old and the new dataset?
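To make this concrete, here is a minimal PyTorch/torchvision sketch of that setup, assuming the 1500-class task from the question (the layer and weight names follow torchvision's ResNet50; adapt them to your framework):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet50 pre-trained on ImageNet (1000 classes).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the lower (convolutional) layers so they act as a fixed feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-class fully connected head with a new 1500-class one.
# The new layer's parameters are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 1500)

# Only the new head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

From here you train as usual; only the new head is updated while the frozen backbone supplies the features.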

Recently, self-supervised pretraining has increased in popularity, which shows that general features are useful somewhat independently of how they were acquired.

Snehal Patel
Mariusmarten
  • The emphasis on representational distance is apropos. For example, it has been shown that ImageNet-trained models do not perform well on medical images. – Snehal Patel Nov 26 '22 at 16:19
  • Also, how many layers in the pre-trained model you decide to freeze could depend on how well you believe the convolutional layers are able to extract the features of your new images. If your images are a subset of the pre-training images, for example, then you may not need to train any of the convolutional layers. If you are adding novel images, you may also have to train some of the deeper convolutional layers. This can be a "hyperparameter" in your training. – Snehal Patel Nov 26 '22 at 16:29
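Illustrating the comment above, one possible way to treat the freezing depth as a hyperparameter, again assuming torchvision's ResNet50 (whose deepest residual block is named layer4) and the 1500-class head from the earlier sketch:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 1500)

# Freeze everything, then unfreeze only the deepest residual block
# plus the new head; how deep to unfreeze is the hyperparameter.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# Hand every still-trainable parameter to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)
```

How many blocks to unfreeze (none, only layer4, layer3 and layer4, ...) can then be tuned like any other hyperparameter, with more unfreezing generally warranted the further your images are from the pre-training data.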