I am very confused, but this is what is in my head right now:

  1. The skip-gram algorithm just multiplies one-hot encoded words with a weight matrix W,
  2. and since the word is one-hot encoded, this amounts to selecting a single row of the matrix (see the sketch below),
  3. and so this matrix W will end up holding the probabilities we are predicting, or at least will be ordered in the same way.
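To make point 2 concrete, here is a minimal NumPy sketch of what I mean (the vocabulary size and embedding dimension are made-up numbers):

```python
import numpy as np

V, d = 5, 3                    # hypothetical vocabulary size and embedding dimension
W = np.random.randn(V, d)      # the weight matrix being learned

word_index = 2                 # some word in the vocabulary
one_hot = np.zeros(V)
one_hot[word_index] = 1.0

# Multiplying the one-hot vector by W just picks out row `word_index` of W
hidden = one_hot @ W
assert np.allclose(hidden, W[word_index])
```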

So what is the advantage of this algorithm? What exactly is it figuring out?

Wouldn't just a probability vector for each word, computed from its neighbours over a huge pool of words, serve the same purpose without any optimization (something like the sketch below)?
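For instance, a co-occurrence count normalised per word already gives such a vector (toy corpus and window size made up):

```python
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()  # toy corpus
window = 2                                                       # context window size

# Count how often each neighbour appears within the window of each word
cooc = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[word][corpus[j]] += 1

# Normalise the counts into a probability vector per word
probs = {
    word: {nbr: n / sum(ctr.values()) for nbr, n in ctr.items()}
    for word, ctr in cooc.items()
}
print(probs["fox"])  # {'quick': 0.25, 'brown': 0.25, 'jumps': 0.25, 'over': 0.25}
```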

One of the blogs I read is this one, for example.
