
I am working on a document search task and have used two vectorization techniques, Bag of Words (BOW) and TF-IDF. My observations after going through some sample searches are:

  • Both seem to give similar results when I look at the top X results for a given search term.
  • However, in some cases BOW gives slightly better top X results than TF-IDF, and vice versa.
  • The cases in which TF-IDF is slightly better outnumber the cases in which BOW is slightly better.
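For context, the comparison above roughly amounts to ranking every document by its similarity to the query under each scheme and inspecting the top k. The sketch below is a minimal pure-Python version under stated assumptions: whitespace tokenization, cosine similarity, and one common smoothed IDF variant. The helper names (`tokenize`, `top_k`, etc.) are hypothetical, and the real setup may differ (for example, scikit-learn's `CountVectorizer`/`TfidfVectorizer`).

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: naive lowercase + whitespace split; replace with a real tokenizer
    return text.lower().split()

def bow_vector(tokens):
    # Bag of Words: raw term counts
    return Counter(tokens)

def idf_table(docs_tokens):
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1 (one common variant)
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def tfidf_vector(tokens, idf):
    # Term count times IDF; terms never seen in the corpus get weight 0
    counts = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, docs, k, scheme):
    # Return indices of the k documents most similar to the query
    docs_tokens = [tokenize(d) for d in docs]
    q_tokens = tokenize(query)
    if scheme == "bow":
        doc_vecs = [bow_vector(t) for t in docs_tokens]
        q_vec = bow_vector(q_tokens)
    else:
        idf = idf_table(docs_tokens)
        doc_vecs = [tfidf_vector(t, idf) for t in docs_tokens]
        q_vec = tfidf_vector(q_tokens, idf)
    scores = [cosine(q_vec, v) for v in doc_vecs]
    # Descending score; sorted() is stable, so ties keep corpus order
    ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
    return ranked[:k]
```

Calling `top_k("banana cherry", ["apple banana", "apple apple apple", "banana cherry"], 2, "bow")` and the same with `"tfidf"` produces the two top-k lists that would otherwise be compared by eye.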

I want to select one of the two, and based on the above eyeballing I decided to go with TF-IDF. However, this choice is hard to justify, because it rests on my personal judgment after looking at a few sample cases, and eyeballing can lead to a biased decision. Is there a metric I can use to make this decision more objectively?
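As an illustration of what such a metric could look like, here is a minimal sketch, assuming relevant documents can be hand-labelled for a sample of queries. It uses two standard ranking metrics, precision@k and binary-relevance nDCG@k, averaged over queries. The function names and the toy `judgments`/runs in the usage note are hypothetical, not real data.

```python
import math

def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results judged relevant
    return sum(1 for d in ranked[:k] if d in relevant) / k

def ndcg_at_k(ranked, relevant, k):
    # Binary-relevance nDCG: relevant docs count more the higher they rank
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]) if d in relevant)
    # Ideal DCG: all relevant docs (up to k) placed at the top
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

def mean_metric(metric, runs, judgments, k):
    # runs: query -> ranked doc ids; judgments: query -> set of relevant doc ids
    return sum(metric(runs[q], judgments[q], k) for q in judgments) / len(judgments)
```

With toy labels `judgments = {"q1": {"d1", "d3"}, "q2": {"d2"}}`, a run `{"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1", "d3"]}` scores a mean precision@2 of 0.75, while `{"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}` scores 0.5. Computing the same numbers for the BOW and TF-IDF rankings over the same labelled queries would replace the eyeballing with a comparable score.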

Amit Pathak
