
I am not sure if the title of this post uses the correct terminology, so suggestions are welcome.

I have been following a lot of the ideas around using pre-training methods on neural networks to improve the accuracy of those networks on subsequent tasks. For example, the Word2Vec paper and others in the same line developed the idea of pretraining word embeddings as a way to improve subsequent tasks like text translation. Transformer networks themselves begin by converting raw data through content and positional embedding layers.
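To make that pattern concrete, here is a minimal sketch of what I mean (assuming gensim for the Word2Vec step and PyTorch for the downstream model; the toy corpus and layer sizes are just placeholders I made up):

```python
# Sketch: pretrain word embeddings, then reuse them downstream.
# Assumes gensim and PyTorch; the corpus below is a toy stand-in.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]  # toy tokenized corpus

# Stage 1: unsupervised pretraining of word vectors (skip-gram).
w2v = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

# Stage 2: initialize a downstream model's embedding layer with the pretrained vectors.
vocab = w2v.wv.index_to_key                      # token order matching the rows of `weights`
weights = torch.tensor(w2v.wv.vectors)           # shape: (len(vocab), 50)
embedding = nn.Embedding.from_pretrained(weights, freeze=False)

# The embedding layer would then feed a task-specific head (e.g. a classifier)
# and be fine-tuned on labeled data for the downstream task.
```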

There are other examples of this idea as well. Chris Ré at Stanford has explored the idea of "weak supervision," where a neural network is trained on a set of weak or noisy labels that are cheap to obtain. After training on the weak labels, the network is trained on some higher-quality labeled data to gain greater accuracy.
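Roughly, the two-stage setup I have in mind looks like the sketch below (plain PyTorch; the model, the weak/clean label tensors, and the learning rates are placeholder assumptions, not anything specific from the weak-supervision papers):

```python
# Sketch: train on cheap/noisy labels first, then fine-tune on a small clean set.
# All tensors below are random placeholders standing in for real data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

x_weak, y_weak = torch.randn(1000, 20), torch.randint(0, 2, (1000,))   # large, noisy labels
x_clean, y_clean = torch.randn(50, 20), torch.randint(0, 2, (50,))     # small, high quality

# Stage 1: fit to the weak labels (cheap to obtain, noisy).
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):
    opt.zero_grad()
    loss_fn(model(x_weak), y_weak).backward()
    opt.step()

# Stage 2: fine-tune on the clean labels, typically with a smaller learning rate.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(20):
    opt.zero_grad()
    loss_fn(model(x_clean), y_clean).backward()
    opt.step()
```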

With images, this approach has had somewhat less success, but it has still been attempted. There has been work on pretraining a neural network to in-paint masked sections of an image. Once the network is pre-trained, ML engineers use that model to generate new images or to improve performance on image segmentation tasks.
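As a rough sketch of that inpainting-style pretext task (again my own simplification in PyTorch, not code from any particular paper): mask part of each image, train an encoder-decoder to reconstruct it, then keep the encoder for the downstream task.

```python
# Sketch: inpainting as a pretext task, then reuse of the encoder.
# Images here are random tensors standing in for a real dataset.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 3, 3, padding=1))

images = torch.rand(8, 3, 32, 32)               # placeholder batch
masked = images.clone()
masked[:, :, 8:24, 8:24] = 0.0                  # blank out a central region

# Pretraining: reconstruct the original image from its masked version.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(10):
    opt.zero_grad()
    recon = decoder(encoder(masked))
    nn.functional.mse_loss(recon, images).backward()
    opt.step()

# Afterwards, `encoder` can be reused as the backbone of a segmentation model
# and fine-tuned on labeled masks.
```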

So in each of these cases, the model is trained on one task as a way to embed domain knowledge into the network for subsequent tasks. However, I have never really read an explicit explanation or elaboration of this theory. I was wondering if anyone knew of a good reference--paper, article, book, etc.--that discusses this theory, its approaches, its limitations, etc.

krishnab
  • We call this *transfer learning*. Your search may start here. – lpounng Feb 28 '23 at 03:43
  • @lpounng I can see the logic of your answer. I am not sure if transfer learning is really the right word, though. I always think of transfer learning as training a supervised model to classify a set of labels on one dataset, and then using that same model to learn a different set of labels on a different dataset. So it is a supervised -> supervised kind of training. I was thinking more of the relation between unsupervised -> supervised training. But I can include references to transfer learning in my search. – krishnab Feb 28 '23 at 06:45
  • If the pretraining phase has no supervised objective, we call it unsupervised learning or dimension reduction. See, isn't that like taking the PCA of a dataset, then using the transformed features to do different predictions? – lpounng Feb 28 '23 at 09:11

0 Answers