
Weak supervision is supervised learning with uncertainty in the labeling, e.g. because the labels were produced automatically or by non-experts [1].
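To make this concrete, here is a minimal sketch of weak supervision in the labeling-function style: several noisy heuristics vote on each example, and the majority vote becomes the (uncertain) weak label. The heuristics are hypothetical illustrations, not taken from [1].

```python
# Weak supervision sketch: noisy heuristic labelers vote on each example;
# the majority becomes the weak label. All heuristics here are hypothetical.
from collections import Counter

def lf_contains_exclamation(text):  # noisy heuristic 1
    return "spam" if "!" in text else "ham"

def lf_mentions_money(text):        # noisy heuristic 2
    return "spam" if "$" in text else "ham"

def lf_all_caps_word(text):         # noisy heuristic 3
    return "spam" if any(w.isupper() and len(w) > 2 for w in text.split()) else "ham"

def weak_label(text, labeling_functions):
    """Combine the noisy votes into a single weak label by majority."""
    votes = Counter(lf(text) for lf in labeling_functions)
    return votes.most_common(1)[0][0]

lfs = [lf_contains_exclamation, lf_mentions_money, lf_all_caps_word]
print(weak_label("WIN $1000 now!", lfs))    # -> spam
print(weak_label("see you at lunch", lfs))  # -> ham
```

The resulting labels are "weak" precisely because each heuristic can be wrong; a downstream model is then trained on them as if they were ground truth.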

Distant supervision [2, 3] is a type of weak supervision that uses an auxiliary automatic mechanism (rather than non-expert human labelers) to produce weak labels / reference outputs.

According to this answer:

Self-supervised learning (or self-supervision) is a supervised learning technique where the training data is automatically labelled.

In the examples for self-supervised learning, I have seen so far, the labels were extracted from the input data.

What is the difference between distant supervision and self-supervision?


(Setup referred to in the discussion below: a diagram distinguishing (1) external source data, (2) the input of the auxiliary task, and (3) the input of the target task.)

Make42
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/109965/discussion-on-question-by-make42-what-is-the-difference-between-distant-supervis). – nbro Jun 28 '20 at 15:34
  • I have updated [my answer](https://ai.stackexchange.com/a/10624/2444). Anyway, after having read something about distant supervision, it seems like it refers to learning where the data comes from a remote database rather than a local dataset (and so the term "distant supervision" makes sense). In any case, this term is definitely less common than SSL. – nbro Aug 04 '20 at 18:03
  • @nbro: Which articles have you read regarding distant supervision? Maybe I can also have a look. – Make42 Aug 05 '20 at 08:26

1 Answer


The main difference between distant supervision (as described in the link you provided) and self-supervision lies in the task the network is trained on.

Distant supervision focuses on generating weak labels for the very task that would otherwise be tackled with supervised labels, so the final result can be used directly for that task.
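A hedged sketch of this idea, in the classic relation-extraction setting: a knowledge base of known facts automatically labels any sentence mentioning both entities of a fact with that fact's relation. The tiny knowledge base and sentences here are hypothetical.

```python
# Distant supervision sketch: a knowledge base of (subject, object) -> relation
# facts weakly labels any sentence that mentions both entities.
KB = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
}

def distant_label(sentence):
    """Return weak (sentence, subject, object, relation) tuples implied by the KB."""
    labels = []
    for (e1, e2), rel in KB.items():
        if e1 in sentence and e2 in sentence:
            labels.append((sentence, e1, e2, rel))
    return labels

print(distant_label("Paris is the capital of France."))
# -> [('Paris is the capital of France.', 'Paris', 'France', 'capital_of')]
```

Note that the weak labels address the target task itself (relation classification); the labeling heuristic is simply noisy, e.g. a sentence can mention Paris and France without expressing the relation.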

Self-supervision is a means for learning a data representation. It does so by learning a surrogate task, which is defined by inputs and labels derived exclusively from the original input data.
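As an illustration of such a surrogate task, here is a minimal sketch of masked-word prediction: one word is hidden and used as the label, so both inputs and labels come entirely from the raw text, with no external annotation. The function and mask token are illustrative, not a specific library's API.

```python
# Self-supervision sketch: build a surrogate task from raw text alone.
# Each training pair masks one word (the input) and uses that word as the label.
def make_masked_pairs(sentence, mask_token="[MASK]"):
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), word))  # (surrogate input, label)
    return pairs

pairs = make_masked_pairs("the cat sat down")
print(pairs[1])  # -> ('the [MASK] sat down', 'cat')
```

A model trained on these pairs is not useful for masked-word prediction per se; the point is the representation it learns along the way, which is then reused for the actual target task.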

I can imagine cases in which an implementation of self-supervision could be considered distant supervision (if the surrogate task happens to match the target task). Conversely, if external data sources were employed for training on a surrogate task, that would be a case of representation learning (which could incorporate self-supervision too).

David
  • Ok, taking https://ai.stackexchange.com/questions/10623/what-is-self-supervised-learning-in-machine-learning/13749?noredirect=1#comment34138_13749 into account, the surrogate task is not the requirement, but learning from the input data is. While distant learning seems to be about learning from external data. I got that much. Now, the example of https://ai.stackexchange.com/a/10624/38174 would then *not* be self-supervised... unless you say that the proximity sensor is part of the "input", but then anything could be "part of the input". – Make42 Jul 04 '20 at 12:16
  • I would say the input must be the input of the target task, or not? But then, with a surrogate task, the "input" the labels are extracted from is *not* the input of the target task. E.g. the documents of word2vec would be the input where the labels are extracted from, but the input of the target task is the (one-hot) word vector. So the "input of the target task" has its *origin* in the dataset from which labels are extracted, but it is not the same. Or am I missing something? In variational autoencoders they aren't the same either. In "normal" autoencoders, however, they are the same. – Make42 Jul 04 '20 at 12:18
  • Also, a lot of the publications mentioned in https://ai.stackexchange.com/questions/22184/does-self-supervised-learning-require-auxiliary-tasks claim that the auxiliary task is the deciding factor. (The more I look into the naming in the machine learning literature, the more frustrating it gets.) – Make42 Jul 04 '20 at 12:21
  • 1) You got it right in the first comment. The robotics example is a bit far-fetched, but it's correct insofar as the sensor data is part of the input, but only trustworthy in some cases. 2) The input is usually modified in order to generate the auxiliary task, e.g. masking a word in order to predict it afterwards. 3) The auxiliary task is key due to its implications for the representation learned, e.g. predicting the mean color of an image is probably a very poor choice if you aim to perform facial recognition. – David Jul 04 '20 at 12:36
  • 1) When we say that "self-supervision requires that data labels are extracted from the input data", we need to define what the input data is. Look at the new graphic in the question: Is it the input of the target task (3), the input of the aux. task (2), or data that lies even further "outside" (1)? For word2vec it is (1). For variational autoencoders used for image denoising, and a similar task of mine, we are actually extracting (3) from (1), and the off-the-shelf data is our *target* data. But: if (1) is the input data for the definition, then *anything* could be part of that input data. – Make42 Jul 04 '20 at 15:12
  • 2) That supports my suspicion that (1) is the input data. In that case the robotics case would also be self-supervised (if having an auxiliary task is no requirement), because we just say any data is input data in the (1) sense. 3) The strong connection between the representation learned and having an aux. task is obvious. However, if we want to define self-supervision and the aux. task is not supposed to be a requirement, we cannot use it in a (mathematical) definition of self-supervision. And yes, I am searching for a definition that is good enough that it could be made into a math. def. – Make42 Jul 04 '20 at 15:19
  • @Make42 I really think you're overthinking it. Self-supervised learning is relatively new, and there's probably not a consensus on the definition of SSL. There's also the difference between SSL in robotics and in other contexts, as I emphasize in my answer. In the last edit of my answer, I attempt to give a more general definition of SSL: a machine learning technique that _automatically_ generates a supervisory signal (and I am emphasizing the word _automatically_). – nbro Jul 04 '20 at 20:13