Does anyone have experience with using Cosine Similarity for text classification? I see a number of articles on how to find cosine similarity between documents using Doc2Vec, Gensim, etc.
I have a binary classification problem where I want to try out cosine similarity. I know how to calculate it, but all the articles I have seen stop at computing it between two documents.
Right now, I am planning to do this:
1. Calculate the cosine similarity of 'my paragraph' (the one that I want to classify) with all samples in class_i (their class is known). Then take the average (call that avg_i).
2. Calculate the cosine similarity of 'my paragraph' with all samples in class_o (their class is known). Then take the average (call that avg_o).
3. Compare avg_i and avg_o, and then predict the class for 'my paragraph'.
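To make the plan concrete, here is a minimal sketch of what I have in mind. It assumes plain bag-of-words term counts as the document vectors (a stand-in for Doc2Vec or any other embedding), and the function names, the toy sentences, and the tie-breaking rule are all made up for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; an assumed stand-in for Doc2Vec vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def average_similarity(query, samples):
    """Mean cosine similarity between the query and every sample of one class."""
    if not samples:
        raise ValueError("each class needs at least one labelled sample")
    query_vec = vectorize(query)
    sims = [cosine_similarity(query_vec, vectorize(s)) for s in samples]
    return sum(sims) / len(sims)


def predict(query, class_i_samples, class_o_samples):
    """Steps 1-3: average per class, then pick the class with the larger average."""
    avg_i = average_similarity(query, class_i_samples)
    avg_o = average_similarity(query, class_o_samples)
    label = "i" if avg_i >= avg_o else "o"  # ties go to class i (arbitrary choice)
    return label, avg_i, avg_o


# Hypothetical toy data, only to show the call pattern.
class_i_samples = ["the cat sat on the mat", "a cat chased the mouse"]
class_o_samples = ["stocks fell sharply today", "the market closed lower"]

label, avg_i, avg_o = predict(
    "the cat slept on the mat", class_i_samples, class_o_samples
)
print(label, round(avg_i, 3), round(avg_o, 3))
```

One thing I noticed while writing this: if every vector is first scaled to unit length, averaging the cosine similarities gives the same number as a single dot product between the normalized query and the mean of the normalized class vectors, so the per-sample loop could be replaced by one precomputed vector per class.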
That sounds like a very manual way of doing it. Is there some better/widely used way of doing it?