I am very confused, but this is what is in my head right now:

  1. The skip-gram algorithm just multiplies one-hot encoded words with a weight matrix W,
  2. and since the word is one-hot encoded, this amounts to selecting a single row of the matrix (see the sketch below),
  3. and so this matrix W will end up holding the probabilities we are predicting, or at least will be ordered in the same way.
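To make point 2 concrete, here is a minimal NumPy sketch of what I mean (the vocabulary size and embedding dimension are made-up numbers):

```python
import numpy as np

V, d = 5, 3                    # hypothetical vocabulary size and embedding dimension
W = np.random.randn(V, d)      # the weight matrix being learned

word_index = 2                 # some word in the vocabulary
one_hot = np.zeros(V)
one_hot[word_index] = 1.0

# Multiplying the one-hot vector by W just picks out row `word_index` of W
hidden = one_hot @ W
assert np.allclose(hidden, W[word_index])
```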

So what is the advantage of this algorithm? What exactly is it figuring out?

Wouldn't just a probability vector for each word, computed from its neighbours over a huge pool of words, serve the same purpose without any optimization (something like the sketch below)?
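For instance, a co-occurrence count normalised per word already gives such a vector (toy corpus and window size made up):

```python
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()  # toy corpus
window = 2                                                       # context window size

# Count how often each neighbour appears within the window of each word
cooc = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[word][corpus[j]] += 1

# Normalise the counts into a probability vector per word
probs = {
    word: {nbr: n / sum(ctr.values()) for nbr, n in ctr.items()}
    for word, ctr in cooc.items()
}
print(probs["fox"])  # {'quick': 0.25, 'brown': 0.25, 'jumps': 0.25, 'over': 0.25}
```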

One of the blogs I read is this one, for example.
