Questions tagged [validation-datasets]
For questions related to validation datasets, which are used for hyper-parameter optimization, early stopping, or cross-validation. They are sometimes referred to as held-out datasets, but the latter term can also refer to test datasets (i.e. the datasets used to assess the generalisation of the model).
9 questions
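As a quick illustration of what this tag covers, below is a minimal sketch of hyper-parameter selection via cross-validated validation scores, assuming a scikit-learn workflow; the dataset, model, and candidate values are illustrative and not drawn from any question below.

# Minimal sketch (illustrative names): pick a hyper-parameter by its
# cross-validated score, i.e. by performance on held-out validation folds.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:  # candidate hyper-parameter values
    model = LogisticRegression(C=C, max_iter=5000)
    score = cross_val_score(model, X, y, cv=5).mean()  # mean validation accuracy
    if score > best_score:
        best_C, best_score = C, score

print(f"best C = {best_C}, mean validation accuracy = {best_score:.3f}")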
4
votes
1 answer
What are "development test sets" used for?
This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get…

little_mice
- 143
- 2
2
votes
3 answers
Why does MNIST provide only a training and a test set and not a validation set as well?
I was taught that, usually, a dataset has to be divided into three parts:
Training set - for learning purposes
Validation set - for picking the model which minimizes the loss on this set
Test set - for testing the performance of the model picked…

tail
- 147
- 6
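Since this question comes up often, here is a minimal sketch of the usual workaround, assuming a scikit-learn workflow: hold part of the official training split out as a validation set yourself. The shapes and names below are illustrative stand-ins, not real MNIST data.

# Minimal sketch (illustrative stand-in data): carve a validation set
# out of the official training split when a dataset only ships train/test.
import numpy as np
from sklearn.model_selection import train_test_split

X_train_full = np.random.rand(60000, 784)       # stand-in for MNIST images
y_train_full = np.random.randint(0, 10, 60000)  # stand-in for MNIST labels

# Hold 10,000 examples out of the training set as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=10000, random_state=0)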
2
votes
2 answers
Why not make the training set and validation set one if their roles are similar?
If the validation set is used to tune the hyperparameters and the training set adjusts the weights, why not combine them into one set, since both serve a similar role in improving the model?

Omar Zayed
- 43
- 4
2
votes
1 answer
What is the difference between validation percentage and batch size?
I'm doing transfer learning using Inception on Tensorflow. The code that I used for training is https://raw.githubusercontent.com/tensorflow/hub/master/examples/image_retraining/retrain.py
If you take a look at the Argument Parser section at the…

gameon67
- 215
- 3
- 12
1
vote
1 answer
How to perform PCA in the validation/test set?
I was using PCA on my whole dataset (and, after that, I would split it into training, validation, and test datasets). However, after a little bit of research, I found out that this is the wrong way to do it.
I have a few questions:
Are there some…

LVoltz
- 121
- 1
- 5
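For reference, the leakage-free recipe this question is asking about looks roughly like the sketch below (illustrative data, assuming scikit-learn): fit PCA on the training set only, then reuse the fitted transformation on the validation and test sets.

# Minimal sketch (illustrative data): fit PCA on the training set only,
# then apply the same fitted transformation to the validation and test sets.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 50)
X_train, X_rest = train_test_split(X, test_size=0.3, random_state=0)
X_val, X_test = train_test_split(X_rest, test_size=0.5, random_state=0)

pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train)  # components learned from training data only
X_val_pca = pca.transform(X_val)          # reuse the fitted components
X_test_pca = pca.transform(X_test)        # no refitting on held-out data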
1
vote
1 answer
Dataset inputs to model.fit produce unexpected results for training loss vs validation loss
I'm trying to train a neural network (a VAE) using TensorFlow, and I'm getting different results depending on the type of input passed to model.fit.
When I input arrays, I get a normal difference between the validation loss and the total loss.
When I input a…
user56546
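Without the asker's code, the two input styles being compared look roughly like this minimal Keras sketch (placeholder model and data, not the asker's VAE). Note that a tf.data.Dataset must be batched by the caller, and validation_split only works with array inputs.

# Minimal sketch (placeholder model/data, not the asker's VAE) of the two
# input styles model.fit accepts.
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 8).astype("float32")
model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
model.compile(optimizer="adam", loss="mse")

# 1) NumPy arrays: Keras batches the data and can split off validation data.
model.fit(x, x, batch_size=32, validation_split=0.2, epochs=1)

# 2) tf.data.Dataset: batch it yourself and pass a separate, batched
#    validation dataset; validation_split is not supported for datasets.
train_ds = tf.data.Dataset.from_tensor_slices((x, x)).batch(32)
val_ds = tf.data.Dataset.from_tensor_slices((x[:64], x[:64])).batch(32)
model.fit(train_ds, validation_data=val_ds, epochs=1)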
1
vote
2 answers
Are the held-out datasets used for testing, validation or both?
I came across the new term "held-out corpora" and I am confused about its usage in the NLP domain.
Consider the following three paragraphs from N-gram Language Models
#1: held-out corpora as non-training data
For an intrinsic evaluation of a language…

hanugm
- 3,571
- 3
- 18
- 50
1
vote
1 answer
What is the theoretical basis for the use of a validation set?
Let's say we use an MLE estimator (implementation doesn't matter) and we have a training set. We assume that we have sampled the training set from a Gaussian distribution $\mathcal N(\mu, \sigma^2)$.
Now, we split the dataset into training,…
user9947
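A sketch of one standard justification (hedged; not necessarily the accepted answer to this MLE setup): for a model $\hat{f}$ fit on the training set and a validation set $\{(x_i, y_i)\}_{i=1}^{m}$ drawn i.i.d. from the same distribution $D$, the fitted model does not depend on the validation samples, so
$$\mathbb{E}\left[\frac{1}{m}\sum_{i=1}^{m} \ell\big(\hat{f}(x_i), y_i\big)\right] = \mathbb{E}_{(x,y)\sim D}\big[\ell(\hat{f}(x), y)\big],$$
i.e. the validation loss is an unbiased estimate of the true risk, which is why it can be used to compare models or hyper-parameter settings.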
-1
votes
1 answer
How to decide on the optimal model?
I have split the available dataset into 70% training, 15% validation, and 15% test, using holdout validation. I have trained the model and got the following results: training accuracy 100%, validation accuracy 97.83%, test accuracy 96.74%
In…

user50778
- 1
- 1