Questions tagged [imbalanced-datasets]

For questions that involve imbalanced (or unbalanced) datasets.

25 questions
7
votes
2 answers

Does data skew matter in classification problem?

I'm working on an image classification problem using a neural network. In the training data set, 90% of the samples fall into 10% of all categories, while 10% of the samples fall into the other 90% of categories. So the examples are not evenly distributed…
3
votes
1 answer

How do I select the (number of) negative cases, if I'm given a set of positive cases?

We were given a list of around 100 labeled, known positive cases, i.e. people who have a certain disease, so all of them are labeled with the same class (disease). We also have a much larger amount of data that we can label as…
2
votes
1 answer

How can I use Generative Adversarial Networks to solve the imbalanced class problem?

Problem setting: We have to do a binary classification of data given a training dataset $D$, where most items belong to class $A$ and some items belong to class $B$, so the classes are heavily imbalanced. Approach: We wanted to use a GAN to produce…
2
votes
1 answer

What are the possible ways to handle imbalance in multi-class image datasets?

Image imbalance is one of the major factors in the performance of a DL model. Some of the methods that I found to tackle this are oversampling, undersampling, and SMOTE. Oversampling has the drawback that it makes the model overfit. Undersampling results in loss…
2
votes
1 answer

How do you handle unbalanced image datasets?

I have an image data set on which I am training a CNN. The data set is slightly unbalanced. So, my solution up till now was to delete some images of the majority class. But I now realize that there are cleaner ways to deal with this. But I haven't…
2
votes
1 answer

How to handle class imbalance when the actual data are that way

My supervised learning training data are obtained from actual data, and in real cases there is one class that occurs less often than the other classes, in just around 5% of all cases. To be precise, the first 2 classes make up 95% of the training data and the…
2
votes
1 answer

How robust are deep networks to class imbalance?

Before deep learning, I worked with machine learning problems where the data had a large class imbalance (30:1 or worse ratios). At that time, all the classifiers struggled, even after under-sampling the represented classes and creating synthetic…
2
votes
2 answers

How to perform binary classification when one class is more predominant than the other?

Assume we have a big $m \times n$ input dataset with an $m \times 1$ output vector. It's a classification problem with only two possible values: either $1$ or $0$. Now, the problem is that almost all elements of the output vector are $0$s with a very…
1
vote
1 answer

Fine tuning a Deep Learning model post training

I have trained a CNN on a binary classification problem. However, the original problem has 6 different classes, of which I am only interested in classifying one, i.e. whether a sample belongs to that class or not. In this case, let's say class 2. After looking…
1
vote
0 answers

Training with extremely imbalanced Dataset

I have an object detection problem with an extremely imbalanced dataset. Let's say there is only one class to detect, say apple or not apple. This detection network will be used in a real case including IP camera streaming where positive/negative…
1
vote
0 answers

Data Imbalance in Contextual Bandit with Thompson Sampling

I'm working with the Online Logistic Regression Algorithm (Algorithm 3) of Chapelle and Li in their paper, "An Empirical Evaluation of Thompson Sampling" (https://papers.nips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf). It's a…
1
vote
1 answer

Handling imbalanced data with multiple targets

I have a model with 3 outputs (it is a regression task: the angle of the steering wheel, brake, and acceleration). I can divide my values into smaller bins and in this way turn this into a classification problem. I can balance…
1
vote
1 answer

Multi class text classification when having only one sample for classes

I have a dataset of texts, each identified by an ID number. I would like to predict the best-matching ID number for new incoming texts. I am not sure whether multi-class text classification is the right approach since…
1
vote
1 answer

Is it possible to combine k-fold cross-validation and oversampling for a multi-class text classification task with imbalanced data?

I am dealing with an intent classification task on an Italian customer service data set. I have roughly 1.5k sentences and 29 classes (imbalanced). According to the literature, a good choice is to generate synthetic data, oversampling, or…
0
votes
0 answers

How to balance classes for YOLO?

The problem I am having is that, to my understanding, we need to annotate all objects of all classes in the images we want to train (or fine-tune) our YOLO on. This is because YOLO compares labeled classes against other parts of the image, so if the…