Questions tagged [test-datasets]

For questions related to test (or testing) datasets in the context of machine learning. A test dataset is any dataset that is not used for training the model but just to evaluate it, in particular, its ability to generalize to unseen data.

16 questions
4 votes · 1 answer

What are "development test sets" used for?

This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get…
3 votes · 1 answer

How do I select the (number of) negative cases, if I'm given a set of positive cases?

We were given a list of labeled data (around 100 examples) of known positive cases, i.e. people who have a certain disease, so all these people are labeled with the same class (disease). We also have a much larger amount of data that we can label as…
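
A minimal sketch (plain NumPy, with hypothetical pool sizes) of one common starting point for this situation: pick a negative-to-positive ratio and sample that many negatives at random from the unlabeled pool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 known positives and a large pool of records
# we are willing to treat as negatives.
n_positives = 100
negative_pool = np.arange(50_000)        # indices of candidate negative cases

# A common heuristic is a fixed negative:positive ratio (e.g. 1:1 or a
# few-to-one); sample that many negatives uniformly from the pool.
ratio = 1
sampled_negatives = rng.choice(negative_pool,
                               size=ratio * n_positives,
                               replace=False)
print(sampled_negatives.shape)           # (100,) negatives to pair with the positives
```
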
3 votes · 1 answer

What is the reason behind using a test batch size?

If one examines the SSD: Single Shot MultiBox Detector code from this GitHub repository, it can be seen that, for the testing phase (evaluating the network on the test data set), there is a test batch size parameter. It is not mentioned in the paper. I am not…
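
A minimal sketch (plain NumPy, with a hypothetical `predict_fn`) of what a test batch size usually controls: it only determines how many images go through the network per forward pass, a memory/speed trade-off that does not change the predictions, which would explain why the paper never mentions it.

```python
import numpy as np

def evaluate_in_batches(predict_fn, images, test_batch_size=8):
    """Run inference over a test set in chunks of `test_batch_size`.

    The batch size only limits how many images are pushed through the
    network at once; the concatenated predictions are identical for any
    batch size, only memory use and speed differ.
    """
    predictions = []
    for start in range(0, len(images), test_batch_size):
        batch = images[start:start + test_batch_size]
        predictions.append(predict_fn(batch))   # one forward pass per chunk
    return np.concatenate(predictions, axis=0)

# Example with a dummy "network" that just averages pixel values per image.
images = np.random.rand(37, 64, 64, 3)
preds = evaluate_in_batches(lambda batch: batch.mean(axis=(1, 2, 3)), images)
print(preds.shape)   # (37,)
```
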
1 vote · 1 answer

How to perform PCA on the validation/test set?

I was using PCA on my whole dataset (and, after that, I would split it into training, validation, and test datasets). However, after a little bit of research, I found out that this is the wrong way to do it. I have a few questions: Are there some…
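
A minimal sketch (scikit-learn, with toy stand-in data) of the leak-free order of operations: split first, fit PCA on the training portion only, then reuse the fitted projection on the held-out rows.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset (hypothetical shapes).
X = np.random.rand(500, 100)
y = np.random.randint(0, 2, size=500)

# Split first, so no information from the validation/test rows leaks into
# the learned projection.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

pca = PCA(n_components=20)
X_train_pca = pca.fit_transform(X_train)   # learn the components from training data only
X_test_pca = pca.transform(X_test)         # apply the same components to the test data
```
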
1 vote · 0 answers

Why does the SVM perform poorly on test data that has a different class distribution than the training data?

Do you know why the SVM performs poorly on test data that has a different class distribution than the training data? The training data has around 15 classes, and the additional testing data has around 6 classes (a subset of the 15 classes). I found that…
1 vote · 1 answer

What does it mean to overfit the test set?

Consider the following statement from p. 14 of Naive Bayes and Sentiment Classification: While the use of a devset avoids overfitting the test set, having a fixed training set, devset, and test set creates another problem: in order to save lots of…
1 vote · 2 answers

Are the held-out datasets used for testing, validation or both?

I came across a new term, "held-out corpora", and I am confused about its usage in the NLP domain. Consider the following three paragraphs from N-gram Language Models. #1: held-out corpora as non-training data: For an intrinsic evaluation of a language…
1 vote · 0 answers

Is there a way to find the test accuracy while training the embedding network (with contrastive learning)?

I aim to do action recognition in videos on a private dataset. To compare with existing state-of-the-art implementations, other authors have published their code on GitHub, like the one here (for the paper Self-supervised Video Representation Learning…
1 vote · 1 answer

How to build a test set for a model in industry?

Most of the tutorials only teach us to split the whole dataset into three parts: a training set, a development set, and a test set. But in industry we are, in a sense, doing test-driven development, and what matters most is the building of our test…
1 vote · 0 answers

Wouldn't training the model with this data lead to inaccuracies since the testing data would not be normalized in a similar way?

I was trying to normalize my input images for feeding to my convolutional neural network and wanted to standardize my input data. I referred to this post, which says that featurewise_center and featurewise_std_normalization scale the images…
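
A minimal sketch (plain NumPy, hypothetical image shapes) of the idea behind featurewise_center and featurewise_std_normalization: compute the statistics on the training images only and reuse exactly those statistics to standardize the test images, so both sets are normalized in the same way.

```python
import numpy as np

# Hypothetical image tensors: (num_images, height, width, channels), float32.
x_train = np.random.rand(1000, 64, 64, 3).astype("float32")
x_test = np.random.rand(200, 64, 64, 3).astype("float32")

# Compute the per-channel mean and std from the TRAINING images only
# (this mirrors fitting the generator on the training set).
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True) + 1e-7

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std   # reuse the training statistics on the test set
```
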
0 votes · 0 answers

How can validation accuracy be more than test accuracy?

I have been trying to implement DenseNet on a small dataset using k-fold cross-validation. Training accuracy is 94%, validation accuracy is 73%, whereas test accuracy is 90%. I have taken 10% of my total dataset as the test set. I know some overfitting is…
0 votes · 2 answers

Why is the WMT16 dataset favoured for evaluating machine translation models?

The Workshop on Statistical Machine Translation has released translation challenges each year from 2004 on, which feature a dataset of sentence pairs in a variety of languages. Even though the conference has been taking place each year, with ever…
0 votes · 1 answer

Given a dataset of people with and without cancer, should I split it into training and test datasets such that the same person is not in both?

I have a database that contains healthy persons and lung cancer patients. I need to design a deep neural network for the binary classification problem (cancer/no cancer). I need to split the dataset into 70% train and 30% test. How can I do the…
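
A minimal sketch (scikit-learn's GroupShuffleSplit, with hypothetical per-patient data) of a grouped 70/30 split that keeps every person entirely in either the training or the test partition.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical setup: several records/scans per person, so rows sharing a
# patient_id must end up on the same side of the split.
X = np.random.rand(1000, 32)                  # per-record features
y = np.random.randint(0, 2, size=1000)        # 1 = cancer, 0 = healthy
patient_ids = np.random.randint(0, 200, size=1000)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient appears in both sets.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```
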
0 votes · 1 answer

What are possible ways to combat overfitting or improve the test accuracy in my case?

I have asked a question here, and one of the comments suggested that this is a case of severe overfitting. I made a neural network, which uses residual boosting (done via a KNN), and I am still only able to get < 50% accuracy on the test…
0 votes · 0 answers

Why doesn't U-Net work with images different from the dataset?

I have implemented a U-Net, similar to this implementation, but for a different dataset, this one, to segment roads. It works fine using the test folder images, but, for example, when I pick a screenshot from Bing Maps and try to infer with the trained…