Questions tagged [test-datasets]
For questions related to test (or testing) datasets in the context of machine learning. A test dataset is any dataset that is not used to train the model but only to evaluate it, in particular its ability to generalize to unseen data.
16 questions
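Before the list, a minimal sketch of the idea in the tag description: hold part of the data out of training and use it only to estimate generalization. The dataset, model, and 80/20 split below are illustrative choices, not tied to any particular question.

```python
# Minimal sketch: hold out part of the data purely for evaluation.
# The 80/20 split and random_state are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The test set is never shown to the model during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))  # estimate of generalization
```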
4 votes · 1 answer
What are "development test sets" used for?
This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get…

little_mice
3 votes · 1 answer
How do I select the (number of) negative cases, if I'm given a set of positive cases?
We were given a list of labeled data (around 100) of known positive cases, i.e. people that have a certain disease, so all these people are labeled with the same class (disease). We also have a much larger amount of data that we can label as…

Otto
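One common starting point for this kind of setup, sketched below with made-up data, is to subsample the large "assumed negative" pool to a chosen positive-to-negative ratio and compare a few ratios on a validation set. The 1:1 ratio here is just one heuristic, not a recommendation from the question.

```python
# Hypothetical sketch: subsample the large pool treated as negative to a chosen
# positive-to-negative ratio. Data and the 1:1 ratio are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

X_pos = rng.normal(loc=1.0, size=(100, 5))        # ~100 known positive cases
X_neg_pool = rng.normal(loc=0.0, size=(5000, 5))  # much larger pool labeled as negative

neg_per_pos = 1  # illustrative ratio; try several and compare validation metrics
n_neg = neg_per_pos * len(X_pos)

idx = rng.choice(len(X_neg_pool), size=n_neg, replace=False)
X_neg = X_neg_pool[idx]

X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
print(X.shape, y.mean())  # fraction of positives in the assembled training set
```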
3 votes · 1 answer
What is the reason behind using a test batch size?
If one examines the SSD: Single Shot MultiBox Detector code from this GitHub repository, it can be seen that, for the testing phase (evaluating the network on the test data set), there is a parameter called test batch size. It is not mentioned in the paper.
I am not…

carobnodrvo
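One plausible reading, sketched below, is that a test batch size only controls how many images are pushed through the network at once during evaluation, a memory/speed trade-off that does not change the computed metric. The model and tensor sizes are placeholders, not the SSD code from the question.

```python
# Sketch: evaluating in batches so the whole test set never has to fit in
# memory at once. test_batch_size affects memory/speed, not the final result.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
model.eval()

# Placeholder "test set" of 1000 random images with 10 classes.
test_ds = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))

test_batch_size = 64  # illustrative; larger is faster but uses more memory
loader = DataLoader(test_ds, batch_size=test_batch_size, shuffle=False)

correct = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()

print("test accuracy:", correct / len(test_ds))
```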
1 vote · 1 answer
How to perform PCA on the validation/test set?
I was using PCA on my whole dataset (and, after that, I would split it into training, validation, and test datasets). However, after a little bit of research, I found out that this is the wrong way to do it.
I have a few questions:
Are there some…

LVoltz
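A hedged illustration of the usual remedy (not taken from the question's answers): fit PCA, and any scaler, on the training split only, then merely transform the validation/test splits so no information leaks from them. The dataset and component count below are placeholders.

```python
# Sketch: fit PCA on the training data only, then apply the learned projection
# (and any scaling) to the held-out data to avoid leakage.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)                 # statistics from the training set only
pca = PCA(n_components=5).fit(scaler.transform(X_train))

X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))   # test set is only transformed
print(X_train_pca.shape, X_test_pca.shape)
```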
1 vote · 0 answers
Why does the SVM perform poorly on test data that has a different class distribution than the training data?
Do you know why the SVM performs poorly on test data that has a different class distribution than the training data? The training data has around 15 classes, and the additional testing data has around 6 classes (a subset of the 15 classes). I found that…

Allie
1 vote · 1 answer
What is meant by overfitting the test set?
Consider the following statement from p. 14 of Naive Bayes and Sentiment Classification:
While the use of a devset avoids overfitting the test set, having a
fixed training set, devset, and test set creates another problem: in
order to save lots of…

hanugm
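The quoted passage contrasts a devset with the test set. As a hedged illustration of that protocol (not taken from the textbook): hyperparameters are chosen on the dev set, and the test set is scored only once at the end. Dataset and classifier are placeholders.

```python
# Sketch: a fixed train / dev (validation) / test split. Choices are made on
# the dev set; the test set is scored only once, so it is not "overfit".
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# First carve off the test set, then split the rest into train and dev.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# "Tune" a hyperparameter on the dev set only.
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_dev, y_dev)
    if acc > best_acc:
        best_k, best_acc = k, acc

final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("dev accuracy:", best_acc, "| test accuracy:", final.score(X_test, y_test))
```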
1 vote · 2 answers
Are the held-out datasets used for testing, validation or both?
I came across the new term "held-out corpora" and I am confused about its usage in the NLP domain.
Consider the following three paragraphs from N-gram Language Models
#1: held-out corpora as non-training data
For an intrinsic evaluation of a language…

hanugm
1 vote · 0 answers
Is there a way, while training (with contrastive learning) the embedding network, to find the test accuracy?
I aim to do action recognition in videos on a private dataset.
To compare with the existing state-of-the-art implementations, other authors have published their code on GitHub, like the one here (for the paper Self-supervised Video Representation Learning…

krishna chaitanya
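One common way to get a test-time signal while training an embedding network is a periodic linear probe: freeze the encoder, fit a simple classifier on training embeddings, and score it on test embeddings. The sketch below uses scikit-learn and stand-in embeddings; it is not taken from the repository mentioned in the question.

```python
# Sketch of a linear probe: the random arrays stand in for the output of a
# frozen contrastive encoder; a simple classifier gives a proxy test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs; in practice these come from the frozen network.
train_emb = rng.normal(size=(500, 128))
train_labels = rng.integers(0, 10, size=500)
test_emb = rng.normal(size=(100, 128))
test_labels = rng.integers(0, 10, size=100)

probe = LogisticRegression(max_iter=1000).fit(train_emb, train_labels)
print("probe test accuracy:", probe.score(test_emb, test_labels))
```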
1 vote · 1 answer
How to build a test set for a model in industry?
Most of the tutorials only teach us to split the whole dataset into three parts: training set, development set, and test set. But in industry, we are essentially doing test-driven development, and what matters most is the building of our test…

Lerner Zhang
1 vote · 0 answers
Wouldn't training the model with this data lead to inaccuracies since the testing data would not be normalized in a similar way?
I was trying to normalize my input images for feeding to my convolutional neural network and wanted to standardize my input data.
I referred to this post, which says that featurewise_center and featurewise_std_normalization scale the images…

user33681
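A hedged sketch of the usual remedy: compute the normalization statistics on the training images only and reuse exactly those statistics on the test images (plain NumPy here rather than the Keras generator the question refers to).

```python
# Sketch: per-channel mean/std computed on the training images only, then the
# same statistics are applied to the test images.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 255, size=(200, 32, 32, 3)).astype("float32")
x_test = rng.uniform(0, 255, size=(50, 32, 32, 3)).astype("float32")

mean = x_train.mean(axis=(0, 1, 2))        # training-set statistics
std = x_train.std(axis=(0, 1, 2)) + 1e-7

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std        # test data normalized with the *training* statistics

print(x_train_norm.mean(axis=(0, 1, 2)), x_test_norm.mean(axis=(0, 1, 2)))
```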
0 votes · 0 answers
How can validation accuracy be more than test accuracy?
I have been trying to implement DenseNet on a small dataset using k-fold cross-validation. Training accuracy is 94%, validation accuracy is 73%, whereas test accuracy is 90%. I have taken 10% of my total dataset as the test set. I know some overfitting is…

srij
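For context on how the three numbers can differ, here is a hedged sketch of the usual protocol: carve off the test set first, run k-fold cross-validation on the remainder to get the validation estimate, then score the final model once on the test set. The classifier and data are placeholders, not the DenseNet setup from the question.

```python
# Sketch: held-out test set + k-fold cross-validation on the remaining data.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)

# 10% held out as a test set, as in the question.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = LogisticRegression(max_iter=2000)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5)   # validation estimate
print("mean validation accuracy:", cv_scores.mean())

model.fit(X_dev, y_dev)
print("test accuracy:", model.score(X_test, y_test))     # single final test score
```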
0 votes · 2 answers
Why is the WMT16 dataset favoured for evaluating machine translation models?
The Workshop on Statistical Machine Translation has released translation challenges each year from 2004 on, which feature a dataset of sentence pairs in a variety of languages.
Even though the conference has been taking place each year, with ever…

Zwiebak
0 votes · 1 answer
Given a dataset of people with and without cancer, should I split it into training and test datasets such that the same person is not in both?
I have a database that contains healthy persons and lung cancer patients. I need to design a deep neural network for the binary classification problem (cancer/no cancer). I need to split the dataset into 70% train and 30% test.
How can I do the…

Noha
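A minimal sketch of a patient-level split with scikit-learn's GroupShuffleSplit, so that all samples from one person land on the same side of the 70/30 split; the patient IDs and features here are made up.

```python
# Sketch: split by patient ID so no person appears in both train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 20))                 # placeholder features (e.g. per scan)
y = rng.integers(0, 2, size=n_samples)               # cancer / no cancer
patient_ids = rng.integers(0, 60, size=n_samples)    # several samples per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient ID should appear on both sides of the split.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
print(len(train_idx), "train samples,", len(test_idx), "test samples")
```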
0 votes · 1 answer
What are possible ways to combat overfitting or improve the test accuracy in my case?
I asked a question here, and one of the comments suggested that this is a case of severe overfitting. I made a neural network that uses residual boosting (done via a KNN), and I am still only able to get < 50% accuracy on the test…

jr123456jr987654321
0 votes · 0 answers
Why doesn't U-Net work with images different from the dataset?
I have implemented a U-Net, similar to this implementation, but for a different dataset, this one, to segment roads.
It works fine on the images in the test folder, but, for example, when I take a screenshot from Bing Maps and try to run inference with the trained…

FourZeroFive
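One frequent cause of this behavior, sketched below, is that the new image is not preprocessed the way the training images were (size, channel order, value range, normalization), so matching the training pipeline at inference time is the first thing to check. The target size, statistics, file name, and model object below are placeholders, not the actual pipeline of the linked implementation.

```python
# Sketch: apply the *same* preprocessing at inference that was used in training.
# The target size, value range, and normalization below are placeholders.
import numpy as np
from PIL import Image

TARGET_SIZE = (256, 256)            # must match the training input size
TRAIN_MEAN, TRAIN_STD = 0.5, 0.25   # placeholder training statistics

def preprocess(path):
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)
    x = np.asarray(img, dtype="float32") / 255.0   # same value range as training
    x = (x - TRAIN_MEAN) / TRAIN_STD               # same normalization as training
    return x[None, ...]                            # add batch dimension

# x = preprocess("bing_maps_screenshot.png")  # hypothetical file name
# mask = unet_model.predict(x)                # assumes a Keras-style trained U-Net
```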