Questions tagged [text-classification]

For questions about text classification, the task of assigning predefined categories (or classes) to free-text documents.

60 questions
8
votes
1 answer

Why are documents kept separated when training a text classifier?

Most of the literature treats text classification as the classification of documents. With bag-of-words and Bayesian classification, authors usually use the TF-IDF statistic, where TF normalizes the word count by the number of words per…
4
votes
2 answers

Does summing up word vectors destroy their meaning?

For example, I have a paragraph that I want to classify in a binary manner. But because the inputs have to have a fixed length, I need to ensure that every paragraph is represented by a uniform quantity. One thing I've done is taken every word in…
4
votes
2 answers

Is there any classifier that works best in general for NLP based projects?

I've written a program to analyse a given piece of text from a website and make conclusory classifications as to its validity. The code basically vectorizes the description (taken from the HTML of a given webpage in real time) and takes in a few…
3
votes
4 answers

Effect of the most frequently occurring words on model efficiency?

Assume that I have a DataFrame with a text column. Problem: Classification / Prediction

sms_text
0 Go until jurong point, crazy.. Available only ...
1 Ok lar... Joking wif u oni...
2 Free entry in 2 a wkly comp to win FA Cup fina...
3 …
3
votes
1 answer

When is it time to switch to deep neural networks from simple networks in text classification problems?

I did an out-of-domain detection task (as a binary classification problem) and tried LR, Naive Bayes, and BERT, but the deep neural network didn't perform better than LR and NB. For the LR I just used BOW and it beats the 12-layer BERT. In a…
3
votes
1 answer

How can a system recognize if two strings have the same or similar meaning?

How can a system recognize if two strings have the same or similar meaning? For example, consider the following two strings: "Wikipedia provides good information." and "Wikipedia is a good source of information." What methods are available to do this?
3
votes
2 answers

How to use LSTM to generate a paragraph

An LSTM model can be trained to generate text sequences by feeding the first word. After feeding the first word, the model will generate a sequence of words (a sentence). Feed the first word to get the second word, feed the first word + the second…
2
votes
1 answer

How to go about classifying 1000 classes?

I am trying to find a research paper with theory (preferably with an implementation) about classifying 1000 (or more) classes. I have heard of an implementation in which clustering is done first, followed by classification with something like…
2
votes
0 answers

NLP Bible verse division problem: What's the best model/method?

I'm working on a project compiling various versions of the Bible into a dataset. For the most part, versions separate verses discretely. In some versions, however, verses are combined. Instead of verse 16, the marker will say 16-18. I wonder if,…
2
votes
1 answer

How do RNNs for sentiment classification deal with different sentence lengths?

I have been doing a course that teaches Deep Neural Networks. During one of the exercises I had to build an RNN for sentiment classification, which I did, but I did not understand how an RNN is able to deal with sentences of different…
2
votes
2 answers

Is it possible that every class has a higher recall than precision for multi-class classification?

I am a student who recently started learning machine learning, and one thing keeps confusing me. I tried multiple sources and failed to find a related answer. As the following table shows (this is from some paper): Is it possible that every class has a higher…
2
votes
0 answers

Are Bayesian neural networks suited for text (or document) classification?

I've tried to do my research on Bayesian neural networks online, but I find most of them are used for image classification. This is probably due to the nature of Bayesian neural networks, which may be significantly slower than traditional artificial…
2
votes
0 answers

Language Learning feedback with AI

Is there a program under development that uses AI technology, like Siri, to "hold hands" so to speak with a language learner and coach them on accent, colloquial expressions, or to let them guide the language learning process using an archive of…
2
votes
1 answer

How does the weight update formula for logistic regression work?

I am trying to use Logistic Regression to make a spam filter, but I am having trouble understanding the weight update part. I have processed my email dataset, and I have an attribute vector of the top n words that are most likely to be contained…
2
votes
1 answer

Is a dataset of roughly 700 sentences of an average length of 15 words enough for text classification?

I'm building a customer assistant chatbot in Python, and I am modelling the problem as a text classification task. I have roughly 700 sentences with an average length of 15 words (with unbalanced classes). What do you think, knowing…