For questions about text classification, the task of assigning predefined categories (or classes) to free-text documents.
Questions tagged [text-classification]
60 questions
8 votes, 1 answer
Why are documents kept separated when training a text classifier?
Most of the literature treats text classification as the classification of documents. When using bag-of-words and Bayesian classification, the TF-IDF statistic is usually used, where TF normalizes the word count by the number of words per…

freesoul
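A minimal Python sketch of one common TF-IDF variant may show why document boundaries matter. It assumes TF = term count divided by document length and IDF = log(N / df), where df counts how many *separate* documents contain a term. If everything is merged into one document, every IDF becomes zero. The toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.

    Assumed variant: TF = count / document length, IDF = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many separate documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the ball was kicked".split(),
    "the vote was counted".split(),
    "the ball went out".split(),
]
weights = tf_idf(docs)
# "the" occurs in every document, so its IDF (and therefore its weight) is 0.

# Concatenating all documents into one removes the IDF signal entirely.
merged = tf_idf([[term for doc in docs for term in doc]])
```

Other IDF variants (for example smoothed ones) exist and change the numbers, but not the point that IDF needs per-document counts.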
4 votes, 2 answers
Does summing up word vectors destroy their meaning?
For example, I have a paragraph that I want to classify in a binary manner. But because the inputs have to have a fixed length, I need to ensure that every paragraph is represented by a uniform quantity.
One thing I've done is take every word in…

Arnav Das
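A small sketch with made-up 3-dimensional vectors (hypothetical, not taken from any trained embedding) shows one concrete thing summing loses: word order. "dog bites man" and "man bites dog" get the same summed vector. Averaging instead of summing also gives a fixed length and keeps the scale independent of paragraph length, but it is equally order-blind.

```python
# Hypothetical 3-dimensional word vectors, made up for illustration only.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "bites": [0.0, 0.8, 0.2],
    "man": [0.7, 0.0, 0.3],
}

def sum_vectors(words):
    """Element-wise sum of the vectors for a list of words."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(vectors[word]):
            total[i] += value
    return total

def mean_vectors(words):
    """Element-wise average, so the magnitude does not grow with length."""
    return [x / len(words) for x in sum_vectors(words)]

a = sum_vectors("dog bites man".split())
b = sum_vectors("man bites dog".split())
# a and b are identical: the sum cannot tell who bit whom.
```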
4 votes, 2 answers
Is there any classifier that works best in general for NLP based projects?
I've written a program to analyse a given piece of text from a website and make conclusory classifications as to its validity. The code basically vectorizes the description (taken from the HTML of a given webpage in real time) and takes in a few…

Arnav Das
3 votes, 4 answers
How do the most frequently occurring words affect model efficiency?
Assume that I have a DataFrame with a text column.
Problem: Classification / Prediction
sms_text
0 Go until jurong point, crazy.. Available only ...
1 Ok lar... Joking wif u oni...
2 Free entry in 2 a wkly comp to win FA Cup fina...
3 …

Pluviophile
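One way to study this empirically is to count how often each token occurs across the column and then compare models trained with and without the top-k words. The sketch below covers only the counting and filtering step. It assumes a naive lowercase regex tokenizer, which is an illustrative choice rather than a recommendation. The three messages are the truncated ones from the excerpt.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")

def tokenize(text):
    # Naive tokenizer: lowercase, keep runs of letters, digits and apostrophes.
    return TOKEN.findall(text.lower())

def top_words(texts, k):
    """Return the k most frequent tokens across all texts as (token, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(k)

def remove_words(text, words):
    """Drop the given tokens (e.g. the top-k frequent ones) before vectorizing."""
    return " ".join(t for t in tokenize(text) if t not in words)

sms_text = [
    "Go until jurong point, crazy.. Available only ...",
    "Ok lar... Joking wif u oni...",
    "Free entry in 2 a wkly comp to win FA Cup fina...",
]
top = top_words(sms_text, 3)
frequent = {word for word, _ in top}
filtered = [remove_words(t, frequent) for t in sms_text]
```

On a real dataset, the filtered column would then be vectorized and used to train the same model as the unfiltered one, and the two scores compared.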
3 votes, 1 answer
When is it time to switch to deep neural networks from simple networks in text classification problems?
I did an out-of-domain detection task (as a binary classification problem) and tried LR, Naive Bayes and BERT, but the deep neural network didn't perform better than LR and NB. For LR I just used BOW, and it beat the 12-layer BERT.
In a…

Lerner Zhang
3 votes, 1 answer
How can a system recognize if two strings have the same or similar meaning?
For example, consider the following two strings:
Wikipedia provides good information.
Wikipedia is a good source of information.
What methods are available to do this?

John Hank
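A common lexical baseline is cosine similarity between bag-of-words vectors. The sketch below assumes a simple lowercase word tokenizer. It only measures word overlap, so it misses that "provides" and "source of" express similar ideas. Methods based on word or sentence embeddings are the usual next step for that.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase word counts; punctuation is ignored.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between the bag-of-words vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

s1 = "Wikipedia provides good information."
s2 = "Wikipedia is a good source of information."
score = cosine_similarity(s1, s2)  # 3 shared words: 3 / (2 * sqrt(7)) ≈ 0.57
```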
3 votes, 2 answers
How to use an LSTM to generate a paragraph?
An LSTM model can be trained to generate text sequences by feeding it the first word. After the first word is fed in, the model generates a sequence of words (a sentence): feed the first word to get the second word, feed the first word + the second…

Dee
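The sketch below shows the feed-back loop described in the question. It uses a hypothetical `predict_next` function as a stand-in for the trained LSTM; here that function is a toy bigram table, not an LSTM. To get a paragraph rather than a sentence, one option is simply not to stop at sentence boundaries. Instead, keep feeding the growing sequence back in until an end token (assumed here to be `<eos>`) appears or a length cap is reached.

```python
from typing import Callable, List

def generate(seed: List[str], predict_next: Callable[[List[str]], str],
             max_words: int = 20, end_token: str = "<eos>") -> List[str]:
    """Autoregressive generation: each predicted word is appended and fed back."""
    words = list(seed)
    for _ in range(max_words):
        nxt = predict_next(words)  # the model sees the whole prefix so far
        if nxt == end_token:
            break
        words.append(nxt)
    return words

# Hypothetical stand-in for a trained LSTM: a bigram lookup table.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}

def toy_predict_next(prefix: List[str]) -> str:
    return BIGRAMS.get(prefix[-1], "<eos>")

print(generate(["the"], toy_predict_next))  # ['the', 'cat', 'sat', 'down']
```

With a real LSTM, one would usually carry the hidden state forward and feed only the newest word at each step instead of re-running the whole prefix. For a recurrent model the two give the same result, but carrying the state is cheaper.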
2 votes, 1 answer
How to go about classifying 1000 classes?
I am trying to find a research paper with theory (preferably with an implementation) about classifying 1000 (or more) classes. I have heard of an implementation in which clustering is done first, followed by classification with something like…

Naveen Reddy Marthala
2 votes, 0 answers
NLP Bible verse division problem: what's the best model/method?
I'm working on a project compiling various versions of the Bible into a dataset. For the most part, versions separate verses discretely. In some versions, however, verses are combined: instead of verse 16, the marker will say 16-18. I wonder if,…

rwreed
2 votes, 1 answer
How do RNNs for sentiment classification deal with different sentence lengths?
I have been taking a course on deep neural networks. During one of the exercises I had to build an RNN for sentiment classification, which I did, but I did not understand how an RNN is able to deal with sentences of different…

jr123456jr987654321
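A minimal pure-Python sketch of a vanilla (Elman) RNN forward pass illustrates the usual answer. The weights are random and untrained, assumed purely for illustration. The same weights are reused at every time step, so the loop simply runs once per token. The final hidden state therefore has a fixed size regardless of sentence length, and a classifier layer can read from it. In batched training, frameworks typically pad sequences to a common length and mask or pack the padding; this sketch leaves that out.

```python
import math
import random

def rnn_final_state(sequence, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of any length; return the last hidden state.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size
    for x in sequence:  # one step per token, whatever the length
        h = [
            math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
    return h

random.seed(0)
input_size, hidden_size = 3, 4
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

short_seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 "word vectors"
long_seq = short_seq * 5                        # 10 "word vectors"
h_short = rnn_final_state(short_seq, W_xh, W_hh, b_h)
h_long = rnn_final_state(long_seq, W_xh, W_hh, b_h)
# Both final states have hidden_size entries, so one classifier fits both.
```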
2 votes, 2 answers
Is it possible that every class has a higher recall than precision for multi-class classification?
I am a student who recently started learning machine learning, and one thing keeps confusing me; I have tried multiple sources and failed to find an answer.
As the following table shows (it is taken from a paper):
Is it possible that every class has a higher…

Cheleeger Ken
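Assuming single-label multi-class predictions, a quick check from the confusion matrix suggests why "every class" is problematic. Each misclassified sample counts as one false negative for its true class and one false positive for its predicted class, so total FP equals total FN. For a class with TP > 0, recall > precision exactly when FN < FP, and that cannot hold for every class at once. The confusion matrix below is made up. A paper's table may report averaged or multi-label results, where this argument does not directly apply.

```python
def per_class_precision_recall(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp                        # truly c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append({"tp": tp, "fp": fp, "fn": fn,
                        "precision": precision, "recall": recall})
    return results

# Made-up 3-class confusion matrix (rows = true class, columns = predicted class).
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 3, 9],
]
stats = per_class_precision_recall(confusion)
# Total FP equals total FN, so recall > precision (FN < FP) cannot hold for all classes.
```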
2 votes, 0 answers
Are Bayesian neural networks suited for text (or document) classification?
I've tried to do my research on Bayesian neural networks online, but I find most of them are used for image classification. This is probably due to the nature of Bayesian neural networks, which may be significantly slower than traditional artificial…

Nicole
2 votes, 0 answers
Language Learning feedback with AI
Is there a program under development that uses AI technology, like Siri, to "hold hands", so to speak, with a language learner and coach them on accent and colloquial expressions, or to let them guide the language-learning process using an archive of…

Tristan Beckwith
2 votes, 1 answer
How does the weight update formula for logistic regression work?
I am trying to use Logistic Regression to make a spam filter, but I am having trouble understanding the weight update part. I have processed my email dataset, and I have an attribute vector of the top n words that are most likely to be contained…

kostas
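A minimal sketch of the standard stochastic gradient descent update for logistic regression, in the setting the question describes. The features are binary "word is present" indicators for the top-n words, and the labels are 1 = spam, 0 = not spam. The three-word vocabulary, learning rate, epoch count and data are hypothetical. The update w_j ← w_j + lr · (y − p) · x_j is gradient descent on the log-loss, because the gradient of the log-loss with respect to w_j is (p − y) · x_j.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, x):
    """Estimated probability that the email with feature vector x is spam."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sgd_epoch(weights, bias, data, lr):
    """One stochastic-gradient pass over (x, y) pairs with y in {0, 1}.

    Per example: w_j <- w_j + lr * (y - p) * x_j and b <- b + lr * (y - p),
    where p = sigmoid(w . x + b).
    """
    for x, y in data:
        error = y - predict(weights, bias, x)
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error
    return weights, bias

# Hypothetical 3-word vocabulary ["free", "win", "meeting"]; 1 = word present.
data = [([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 0, 1], 0), ([0, 1, 1], 0)]
weights, bias = [0.0, 0.0, 0.0], 0.0
for _ in range(200):
    weights, bias = sgd_epoch(weights, bias, data, lr=0.5)
```

After training, "free" should end up with a positive weight and "meeting" with a negative one. An email whose probability exceeds 0.5 would be flagged as spam.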
2 votes, 1 answer
Is a dataset of roughly 700 sentences of an average length of 15 words enough for text classification?
I'm building a customer assistant chatbot in Python, so I am modelling this problem as a text classification task. I have roughly 700 sentences available, with an average length of 15 words (the classes are unbalanced).
What do you think, knowing…

Alfonso