Questions tagged [tf-idf]

For questions related to TF-IDF (term frequency-inverse document frequency), a technique for weighting how important a word is to a document within a collection.

TF-IDF (term frequency-inverse document frequency) is a statistical measure of how relevant a word is to a document within a collection of documents. It is the product of two quantities: the term frequency (TF), which reflects how often the word appears in the document, and the inverse document frequency (IDF), which down-weights words that occur in many documents of the collection. Terms with higher TF-IDF weights are considered more important to that document.
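As a rough illustration of the product described above, here is a minimal sketch in plain Python. It assumes one common variant: TF is the raw count divided by the document length, and IDF is the unsmoothed natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and the toy documents are made up for this example; libraries often use smoothed or otherwise adjusted formulas, so their numbers will generally differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (one of several in use):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = ln(N / df(t)), with no smoothing
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, tokenized by whitespace (hypothetical data).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document, highest weight first.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}  {weight:.4f}")
```

In this sketch, a word that occurs in every document (such as "the") gets an IDF of ln(1) = 0 and therefore a weight of zero, while a word unique to one document (such as "mat") gets the largest weight among terms with the same count.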

5 questions

1 answer

Why are documents kept separated when training a text classifier?

Most of the literature considers text classification as the classification of documents. When using the bag-of-words and Bayesian classification, they usually use the statistic TF-IDF, where TF normalizes the word count with the number of words per…

asked Jul 24 '18 at 23:03

freesoul

2 answers

Why do we commonly use the $\log$ to squash frequencies?

Term frequency and inverse document frequency are well-known terms in information retrieval. I am presenting the definitions for both from pp. 12-13 of Vector Semantics and Embeddings. On term frequency: Term frequency is the frequency of the word $t$…

natural-language-processing definitions books tf-idf logarithm

asked Jun 13 '21 at 00:04

hanugm

0 answers

Is there a metric to compare BOW vs TFIDF results?

I am working on a document search task and have used Bag of Words (BOW) and TFIDF vectorization techniques. My observation, after going through some sample searches, is that both of them seem to provide similar results when we look at top X results for…

natural-language-processing bag-of-words tf-idf

asked Feb 15 '23 at 06:28

Amit Pathak

1 answer

Distinguishing text with opposite meanings in SVM (False Information Detection)

I am currently working on a binary text classification model (False Information Detection) using a Support Vector Machine, with TF-IDF as the text vectorizer, in Python. I have already tried training the model but upon testing, I have encountered a…

machine-learning supervised-learning support-vector-machine text-classification tf-idf

asked Mar 02 '22 at 11:17

alexand88r

1 answer

Which data representation of text as input for NLP Deep Learning models?

I have been given a data set with 30,000 text documents (each text file is rather short, in most cases around 20 sentences), which are labelled with 0 or 1. Using this data set, I want to train machine…

machine-learning deep-learning natural-language-processing bag-of-words tf-idf

asked Jan 10 '22 at 19:18

MiFischer22